Set the environment variable `DIFFBOT_TOKEN` if you want to use entity linking. We provide cached results for the KnowledgeNet documents, but you will need a token if you want to run the system on other documents or change the NER system. Contact Filipe Mesquita (filipe[at]diffbot.com) for a free research token.
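
As a quick sanity check before running the system, the sketch below tests whether `DIFFBOT_TOKEN` is visible to a Python process. The helper name `get_diffbot_token` is hypothetical and is not part of this repository; how the project itself reads the variable is not shown here.

```python
import os


def get_diffbot_token(env=os.environ):
    """Return the DIFFBOT_TOKEN value, or None if it is unset or blank.

    Hypothetical helper for illustration; not part of the repository.
    """
    token = env.get("DIFFBOT_TOKEN", "").strip()
    return token or None


if __name__ == "__main__":
    if get_diffbot_token() is None:
        # Per the note above, cached results cover the KnowledgeNet documents.
        print("DIFFBOT_TOKEN is not set; entity linking on new documents will likely fail.")
    else:
        print("DIFFBOT_TOKEN is set.")
```

The `env` parameter defaults to the real environment but accepts any mapping, which makes the check easy to exercise without touching your shell configuration.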
## Using the pretrained model

Get the release from https://github.com/diffbot/knowledge-net/releases, which includes the pretrained baseline 5 model, the vocab, and the linking and Wikidata caches.

Run on a single document:

`echo "Butler W. Lampson (born December 23, 1943) is an American computer scientist contributing to the development and implementation of distributed, personal computing. He is a Technical Fellow at Microsoft and an Adjunct Professor at MIT." | python run.py`

Output:
```
Butler W. Lampson| DATE_OF_BIRTH(0.99) December 23, 1943|
Butler W. Lampson|http://www.wikidata.org/entity/Q92644 NATIONALITY(0.88) American|http://www.wikidata.org/entity/Q30
Butler W. Lampson|http://www.wikidata.org/entity/Q92644 EMPLOYEE_OR_MEMBER_OF(0.56) Microsoft|http://www.wikidata.org/entity/Q2283
Butler W. Lampson|http://www.wikidata.org/entity/Q92644 EMPLOYEE_OR_MEMBER_OF(0.69) MIT|http://www.wikidata.org/entity/Q49108
```
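
If you want to post-process these lines, the following sketch parses one output line into its parts. The layout (`subject|uri PREDICATE(confidence) object|uri`, with URIs possibly empty) is inferred from the sample output only and is not a documented format; `Fact` and `parse_fact_line` are hypothetical names that do not exist in this repository.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the sample output; treat it as an assumption.
FACT_LINE = re.compile(
    r"^(?P<subject_text>.*?)\|(?P<subject_uri>\S*) "
    r"(?P<predicate>[A-Z_]+)\((?P<confidence>[0-9.]+)\) "
    r"(?P<object_text>.*?)\|(?P<object_uri>\S*)$"
)


class Fact(NamedTuple):
    subject_text: str
    subject_uri: Optional[str]  # None when the entity was not linked
    predicate: str
    confidence: float
    object_text: str
    object_uri: Optional[str]  # None when the entity was not linked


def parse_fact_line(line: str) -> Fact:
    """Parse one line of run.py output into a Fact (hypothetical helper)."""
    match = FACT_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"Unrecognized output line: {line!r}")
    return Fact(
        subject_text=match["subject_text"],
        subject_uri=match["subject_uri"] or None,
        predicate=match["predicate"],
        confidence=float(match["confidence"]),
        object_text=match["object_text"],
        object_uri=match["object_uri"] or None,
    )


if __name__ == "__main__":
    sample = "Butler W. Lampson| DATE_OF_BIRTH(0.99) December 23, 1943|"
    print(parse_fact_line(sample))
```

Empty URIs are mapped to `None` so that unlinked mentions, such as the date in the first sample line, are easy to filter out.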
## Evaluating

`python evaluate.py [test or dev]`

This creates the analysis files in `tmp` and, when run on `dev`, prints the results. To preserve the integrity of the results, we have released the test set without annotations. See https://github.com/diffbot/knowledge-net#adding-a-system-to-the-leaderboard for more details.

With the pretrained baseline 5 model, you should get results similar to the following on the dev set:
```
Evaluation      Precision  Recall  F1
span_overlap    0.718      0.691   0.704
span_exact      0.620      0.599   0.609
uri             0.557      0.472   0.511
```
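
To check your own numbers against this table, the sketch below recomputes F1 from precision and recall. It assumes F1 is the usual harmonic mean of the two, which matches the reported values; it is not taken from `evaluate.py`, and `f1_score` is a hypothetical helper name.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the dev-set table above.
reported = {
    "span_overlap": (0.718, 0.691),
    "span_exact": (0.620, 0.599),
    "uri": (0.557, 0.472),
}

for name, (precision, recall) in reported.items():
    print(f"{name:<14}{precision:.3f}  {recall:.3f}  {f1_score(precision, recall):.3f}")
```

Rounded to three decimals, the recomputed F1 values agree with the F1 column of the table.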
## Training

Choose which model you would like to train in `config.py`.

Warning: baseline 5 requires ~300GB of disk space to train. The other models require much less.

`./train.sh`
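
Given the disk space warning, it may be worth checking free space before starting a long run. The sketch below uses the standard library's `shutil.disk_usage`. The 300 GB threshold comes from the warning above; the path to check is an assumption, since this README does not say where `train.sh` writes its data, and `has_enough_disk` is a hypothetical helper.

```python
import shutil

REQUIRED_GB = 300  # approximate figure from the warning above (baseline 5)


def has_enough_disk(path=".", required_gb=REQUIRED_GB):
    """Return (ok, free_gb) for the filesystem containing `path`.

    Hypothetical helper; point `path` at wherever training data is written.
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb, free_gb


if __name__ == "__main__":
    ok, free_gb = has_enough_disk(".")
    status = "enough" if ok else "NOT enough"
    print(f"{free_gb:.0f} GB free: {status} for ~{REQUIRED_GB} GB of training data")
```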
## Troubleshooting

`spacy.strings.StringStore size changed error`

If you see an error mentioning `spacy.strings.StringStore size changed, may indicate binary incompatibility` when loading NeuralCoref with `import neuralcoref`, you will have to install NeuralCoref from the distribution's sources instead of the wheels, so that NeuralCoref is built against the most recent version of spaCy for your system.

In this case, simply re-install neuralcoref as follows: