Crawler stuck on nutch InjectorJob #4

Open
havardthom opened this issue Oct 23, 2018 · 1 comment

@havardthom

Hi, I just installed this crawler and ran into a problem. When I test it with a single URL, it appears to get stuck in the Nutch InjectorJob; nothing happens after the following output:

[nutch-indexer-discovery]$ ./crawl
Injecting urls from ./seed/urls.txt
./build/apache-nutch-2.3.1/runtime/local/bin/nutch inject ./seed/urls.txt
InjectorJob: starting at 2018-10-23 13:13:36
InjectorJob: Injecting urlDir: seed/urls.txt

Installation and setup went fine, apart from a warning when I ran ./gradlew buildPlugin:

[ant:taskdef] Could not load definitions from resource org/sonar/ant/antlib.xml. It could not be found.

Any idea what might be wrong here?

@havardthom

So it's not stuck, just very slow: it took 2 hours to inject one URL.
It is currently at this stage:

./build/apache-nutch-2.3.1/runtime/local/bin/nutch inject ./seed/urls.txt
InjectorJob: starting at 2018-10-23 13:19:22
InjectorJob: Injecting urlDir: seed/urls.txt
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 1
Injector: finished at 2018-10-23 15:21:13, elapsed: 02:01:50
Generate urls: 
./build/apache-nutch-2.3.1/runtime/local/bin/nutch generate -topN 5
GeneratorJob: starting at 2018-10-23 15:21:14
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 5
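
To compare how long each stage takes across runs, a small wrapper like the one below could record wall-clock time per command. This is only a sketch: `time_stage` is a hypothetical helper name, not part of Nutch or this crawler, and it assumes a bash shell.

```shell
# Hypothetical helper (not part of Nutch): run a command, report its
# wall-clock duration on stderr, and pass through its exit status.
time_stage() {
  local start end rc
  start=$(date +%s)   # seconds since the epoch before the command runs
  "$@"                # run the command exactly as given
  rc=$?
  end=$(date +%s)
  echo "stage '$*' took $((end - start))s (exit $rc)" >&2
  return "$rc"
}
```

For example, `time_stage ./build/apache-nutch-2.3.1/runtime/local/bin/nutch inject ./seed/urls.txt` would time the inject step shown above. While a stage appears idle, a thread dump taken with the JDK's `jps -l` and `jstack <pid>` may show what the JVM is waiting on.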
