Missing fulltext but flag positives #91

@lfoppiano

While running appendFulltextTei, I've encountered this exception:

[ERROR] fr.inria.anhalytics.commons.managers.MongoFileManager: No corresponding fulltext TEI was found for BiblioObject{anhalyticsId=5b2b35c82d0eac1ef1ff03ad,metadataURL=https://hal.archives-ouvertes.fr/hal-01789422/tei,metadata=,teiCorpus=,doi=10.2516/ogst/2018004,publicationType=ART_Journal articles,isWithFulltext=true,domains=[ "phys_Physics [physics]"]}
fr.inria.anhalytics.commons.exceptions.DataException: java.lang.NullPointerException
	at fr.inria.anhalytics.commons.managers.MongoFileManager.getTei(MongoFileManager.java:329)
	at fr.inria.anhalytics.commons.managers.MongoFileManager.getGrobidTei(MongoFileManager.java:311)
	at fr.inria.anhalytics.harvest.teibuild.TeiBuilderWorker.run(TeiBuilderWorker.java:105)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
	at fr.inria.anhalytics.commons.managers.MongoFileManager.getTei(MongoFileManager.java:325)
	... 5 more

However, the data in MongoDB says otherwise (and the fulltext is available in HAL):

> db.biblio_objects.findOne({anhalyticsId: '5b2b35c82d0eac1ef1ff03ad'})
{
	"_id" : ObjectId("5b2b35c82d0eac1ef1ff03ad"),
	"anhalyticsId" : "5b2b35c82d0eac1ef1ff03ad",
	"repositoryDocId" : "hal-01789422",
	"source" : "hal",
	"metadataURL" : "https://hal.archives-ouvertes.fr/hal-01789422/tei",
	"publicationType" : "ART_Journal articles",
	"repositoryDocVersion" : "v1",
	"doi" : "10.2516/ogst/2018004",
	"domains" : [
		"phys_Physics [physics]"
	],
	"isWithFulltext" : true,
	"isFulltextAppended" : true,
	"isProcessedPub2TEI" : true,
	"isMined" : false,
	"isIndexed" : false
}

Any idea where the issue might be?
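
One way to narrow this down might be a consistency check that compares the fulltext flag of each record with what is actually stored. The following is only a minimal sketch under stated assumptions: `Record`, `findInconsistent` and the in-memory `storedTei` map are hypothetical stand-ins and are not part of the anhalytics code base or of `MongoFileManager`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FulltextFlagCheck {

    /** Hypothetical, simplified stand-in for an entry of biblio_objects. */
    static final class Record {
        final String anhalyticsId;
        final boolean flaggedWithFulltext;

        Record(String anhalyticsId, boolean flaggedWithFulltext) {
            this.anhalyticsId = anhalyticsId;
            this.flaggedWithFulltext = flaggedWithFulltext;
        }
    }

    /**
     * Returns the ids of records whose flag claims a fulltext TEI
     * although no (or an empty) TEI is stored for them.
     * storedTei is an assumed in-memory view: anhalyticsId -> TEI content.
     */
    static List<String> findInconsistent(List<Record> records,
                                         Map<String, String> storedTei) {
        List<String> inconsistent = new ArrayList<>();
        for (Record r : records) {
            String tei = storedTei.get(r.anhalyticsId);
            boolean teiMissing = (tei == null || tei.isEmpty());
            if (r.flaggedWithFulltext && teiMissing) {
                inconsistent.add(r.anhalyticsId);
            }
        }
        return inconsistent;
    }

    public static void main(String[] args) {
        List<Record> records = new ArrayList<>();
        // Id taken from the report above: flagged, but no TEI stored.
        records.add(new Record("5b2b35c82d0eac1ef1ff03ad", true));
        // Made-up id for illustration: flagged and TEI stored.
        records.add(new Record("example-with-tei", true));

        Map<String, String> storedTei = new HashMap<>();
        storedTei.put("example-with-tei", "<TEI/>");

        System.out.println(findInconsistent(records, storedTei));
    }
}
```

Listing the affected ids this way, instead of dereferencing a missing file, would show whether the record above is an isolated case or whether the flags and the stored files have diverged more widely.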
