Tagger escapes at Stream index: 34160469 #10
@petulla it might be worth checking the Solr logs for any errors there?
This is the error. Any ideas? The same document ID throws the error every time.
Full readout:
Hmm, it looks like we are running into a hard-coded bound on the size of the index here, not sure if we can do much about it! We probably need to report that upstream to Solr. I haven't got much time to investigate this right now though. If you want a quick fix, try narrowing down the scope of the profile (by selecting smaller classes of Wikidata items to include), which should decrease the size of the index and hopefully avoid this bug. Sorry that I cannot give a more satisfactory fix!
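For readers wanting to try the narrowing workaround: the sketch below shows the general idea of class-based filtering of Wikidata entities, i.e. keeping only items that are an instance of (P31) an allowed class. Note this is an illustrative sketch using the standard Wikidata JSON dump layout, not OpenTapioca's actual profile format; the `matches_profile` helper and the choice of Q5 (human) are hypothetical.

```python
def matches_profile(entity, allowed_classes):
    """Return True if a Wikidata entity is an instance (P31) of one of
    the allowed classes. `entity` follows the Wikidata JSON dump layout:
    claims -> P31 -> mainsnak -> datavalue -> value -> id."""
    for claim in entity.get("claims", {}).get("P31", []):
        datavalue = claim.get("mainsnak", {}).get("datavalue")
        if datavalue and datavalue.get("value", {}).get("id") in allowed_classes:
            return True
    return False

# Hypothetical narrowed scope: keep humans (Q5) only, dropping
# organizations and places to shrink the resulting index.
ALLOWED = {"Q5"}

human = {"id": "Q42", "claims": {"P31": [
    {"mainsnak": {"datavalue": {"value": {"id": "Q5"}}}}]}}
print(matches_profile(human, ALLOWED))  # → True
```

Streaming `latest-all.json.bz2` through a filter like this (or, equivalently, restricting the classes listed in the profile JSON) reduces the number of indexed documents and may keep the index under the bound that triggers the error.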
Hm, I'm confused, because I assumed you had run this on the full Wikipedia dataset.
I have indeed run this on the full Wikidata dump, but that was a while ago now and Wikidata grows all the time, so it is totally possible that this error appeared in the meantime. Yes, I would change
Can you try running a recent dump and seeing if it works for you? I'm trying Facebook's recent NEL codebase now, but I may need to return to this, and I'm concerned a fix may take several hours at minimum.
I do intend to re-run this myself on a recent dump in the coming months, I will report back here once this is done. |
I raised another issue related to this. I can't get past indexing at stream index 34160469. The
tapioca index-dump wiki_collection latest-all.json.bz2 --profile profiles/human_organization_place.json
step fails at this point every time. Any idea what might be happening? The previous steps completed successfully. Is there a pre-trained model I can use in place of any of the steps for testing?
Solr 8.2
Python 3.7.4
Mac OS Mojave