|
| 1 | +# getty-entity-lookup |
| 2 | + |
1 | 3 |  |
2 | 4 |
|
3 | 5 | [](https://travis-ci.org/cwrc/getty-entity-lookup) |
|
9 | 11 | [](http://commitizen.github.io/cz-cli/) |
10 | 12 | [](http://github.com/badges/stability-badges) |
11 | 13 |
|
12 | | -# getty-entity-lookup |
13 | | - |
14 | 14 | 1. [Overview](#overview) |
15 | 15 | 1. [Installation](#installation) |
16 | 16 | 1. [Use](#use) |
17 | 17 | 1. [API](#api) |
18 | 18 | 1. [Development](#development) |
19 | 19 |
|
20 | | -### Overview |
| 20 | +## Overview |
21 | 21 |
|
22 | | -Finds entities (people, places) in getty. Meant to be used with [cwrc-public-entity-dialogs](https://github.com/cwrc-public-entity-dialogs) where it runs in the browser. |
| 22 | +Finds entities (people, places) in Getty. It is meant to be used with [cwrc-public-entity-dialogs](https://github.com/cwrc/cwrc-public-entity-dialogs), where it runs in the browser.
23 | 23 |
|
24 | 24 | Although it will not work in Node.js as-is, it does use the [Fetch API](https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API) for HTTP requests, and so could likely be made to work with a browser/Node.js-compatible fetch implementation such as [isomorphic-fetch](https://www.npmjs.com/package/isomorphic-fetch).
25 | 25 |
|
26 | 26 | ### SPARQL |
27 | 27 |
|
28 | | -getty supports sparql, but SPARQL has limited support for full text search. The expectation with SPARQL mostly seems to be that you know exactly what you are matching on |
29 | | -So, a query that exactly details the label works fine: |
| 28 | +Getty supports SPARQL, but SPARQL has limited support for full-text search. SPARQL mostly seems to expect that you know exactly what you are matching on, so a query that specifies the exact label works fine:
30 | 29 |
|
| 30 | +```sparql
31 | 31 | SELECT DISTINCT ?s WHERE { |
32 | 32 | ?s ?label "The Rolling Stones"@en . |
33 | 33 | ?s ?p ?o |
34 | 34 | } |
| 35 | +``` |
35 | 36 |
|
36 | | -We'd like, however, to match with full text search, so we can match on partial strings, variant spellings, etc. |
37 | | -Just in the simple case above, for example, someone searching for The Rolling Stones would have to fully specify 'The Rolling Stones' and not just 'Rolling Stones'. If they left out 'The' then their query won't return the result. |
38 | | - |
39 | | -There is a SPARQL CONTAINS operator that can be used within a FILTER, and that matches substrings, which is better, and |
40 | | -CONTAINS seems to work with getty, e.g. |
| 37 | +We'd like, however, to use full-text search, so that we can match on partial strings, variant spellings, etc. Even in the simple case above, someone searching for The Rolling Stones would have to specify 'The Rolling Stones' exactly, not just 'Rolling Stones'. If they leave out 'The', their query won't return the result.
41 | 38 |
|
| 39 | +SPARQL does provide a CONTAINS function, usable within a FILTER, that matches substrings. That is better, and CONTAINS does seem to work with Getty, e.g.:
42 | 40 |
|
43 | | -``` |
| 41 | +```text |
44 | 42 | http://vocab.getty.edu/sparql.json?query=SELECT DISTINCT ?s ?label WHERE { |
45 | | - ?s rdfs:label ?label . |
46 | | - FILTER (CONTAINS (?label,"Rolling Stones")) |
| 43 | + ?s rdfs:label ?label . |
| 44 | + FILTER (CONTAINS (?label,"Rolling Stones")) }
47 | 45 | ``` |
48 | 46 |
|
49 | 47 | but again, CONTAINS only matches substrings. |
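The CONTAINS URL above is shown unencoded for readability; a real request has to percent-encode the query text. A minimal sketch, assuming the `http://vocab.getty.edu/sparql.json` endpoint from the example; `buildContainsQueryUrl` is a hypothetical helper, not part of this package:

```js
// Hypothetical helper (not part of getty-entity-lookup): builds the CONTAINS
// query from the example above and percent-encodes it for use in a URL.
const GETTY_SPARQL_ENDPOINT = 'http://vocab.getty.edu/sparql.json';

function buildContainsQueryUrl(term, endpoint = GETTY_SPARQL_ENDPOINT) {
  // Escape backslashes and double quotes so the term stays inside the SPARQL string literal.
  const literal = term.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  const query =
    'SELECT DISTINCT ?s ?label WHERE { ' +
    '?s rdfs:label ?label . ' +
    `FILTER (CONTAINS(?label, "${literal}")) }`;
  // encodeURIComponent escapes spaces, braces, quotes and '?' in the query text.
  return `${endpoint}?query=${encodeURIComponent(query)}`;
}

console.log(buildContainsQueryUrl('Rolling Stones'));
```

Note that CONTAINS compares strings exactly, so it is also case-sensitive: 'rolling stones' typed in lower case would still miss 'The Rolling Stones'.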
50 | 48 |
|
51 | | -There is at least one alternative to CONTAINS - REGEX - but as described |
52 | | -here: https://www.cray.com/blog/dont-use-hammer-screw-nail-alternatives-regex-sparql/ REGEX has even worse performance than CONTAINS. |
53 | | - |
54 | | -A further alternative, which we've adopted, is the |
55 | | -custom full text SPARQL search function through which Getty exposes it's underlying lucene index, as described here: |
56 | | - |
57 | | -http://vocab.getty.edu/doc/queries/#Full_Text_Search_Query |
58 | | - |
59 | | -and here: |
60 | | - |
61 | | -http://serials.infomotions.com/code4lib/archive/2014/201402/0596.html |
62 | | - |
63 | | -The endpoint does not, however, support HTTPS. And so, we proxy our calls to the lookup through own server: |
64 | | - |
65 | | -```https://lookup.services.cwrc.ca/getty``` |
66 | | - |
67 | | -to thereby allow the CWRC-Writer to make HTTPS calls to the lookup. |
68 | | -We can’t make plain HTTP calls from the CWRC-Writer because the CWRC-Writer may only be |
69 | | -loaded over HTTPS, and any page loaded with HTTPS is not allowed (by many browsers) to make HTTP AJAX calls. |
| 49 | +There is at least one alternative to CONTAINS, namely REGEX, but as described in [this Cray blog post](https://www.cray.com/blog/dont-use-hammer-screw-nail-alternatives-regex-sparql/), REGEX performs even worse than CONTAINS.
70 | 50 |
|
71 | | -We also proxy calls to retrieve the full page description of an entity, again to allow calls out from a page that was itself |
72 | | -loaded with https. The proxy: |
| 51 | +A further alternative, which we've adopted, is the custom full-text SPARQL search function through which Getty exposes its underlying Lucene index, as described in the [Getty full text search query documentation](http://vocab.getty.edu/doc/queries/#Full_Text_Search_Query) and in [this code4lib post](http://serials.infomotions.com/code4lib/archive/2014/201402/0596.html).
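A sketch of what such a full-text query could look like. It assumes the `luc:term` predicate and the `skos:inScheme`, `gvp:prefLabelGVP` and `xl:literalForm` patterns from Getty's published sample queries, and that those prefixes are predefined on the Getty endpoint. It is not necessarily the query this package sends, and `buildFullTextQuery` / `buildFullTextQueryUrl` are hypothetical names:

```js
// Hypothetical sketch of a Getty full-text (Lucene-backed) SPARQL query.
// ASSUMPTION: luc:term, skos:inScheme, gvp:prefLabelGVP and xl:literalForm follow
// Getty's sample queries; this is not necessarily what getty-entity-lookup sends.
function buildFullTextQuery(term, scheme = 'ulan', limit = 10) {
  // Escape backslashes and quotes so the term stays inside the string literal.
  const literal = term.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return [
    'SELECT ?s ?name WHERE {',
    `  ?s luc:term "${literal}" ;`,
    `     skos:inScheme ${scheme}: ;`,
    '     gvp:prefLabelGVP/xl:literalForm ?name .',
    `} LIMIT ${limit}`,
  ].join('\n');
}

// Percent-encode the query onto the sparql.json endpoint used in the examples above.
function buildFullTextQueryUrl(term, scheme, endpoint = 'http://vocab.getty.edu/sparql.json') {
  return `${endpoint}?query=${encodeURIComponent(buildFullTextQuery(term, scheme))}`;
}

console.log(buildFullTextQuery('rolling stones'));
```

Here `ulan` (Union List of Artist Names) would suit people and organizations, and `tgn` (Thesaurus of Geographic Names) places. Because the term is looked up in the Lucene index rather than compared literally, 'rolling stones' should be able to match 'The Rolling Stones' without the leading 'The'.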
73 | 52 |
|
74 | | -```https://getty.lookup.services.cwrc.ca``` |
| 53 | +The endpoint does not, however, support HTTPS, so we proxy our calls to the lookup through our own server, `https://lookup.services.cwrc.ca/getty`, which lets the CWRC-Writer make HTTPS calls to the lookup. We can't make plain HTTP calls from the CWRC-Writer because it may only be loaded over HTTPS, and many browsers do not allow a page loaded over HTTPS to make HTTP calls.
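The browser rule behind this proxy can be sketched as a small check. The helper names are hypothetical, and the choice between direct and proxied calls illustrates the reasoning only; it is not this package's logic:

```js
// Hypothetical helpers illustrating the mixed-content rule described above:
// a page loaded over HTTPS should not make plain-HTTP requests.
function isMixedContent(pageUrl, requestUrl) {
  return new URL(pageUrl).protocol === 'https:' &&
    new URL(requestUrl).protocol === 'http:';
}

// Use the HTTPS proxy whenever calling Getty directly would be mixed content.
function chooseLookupBase(
  pageUrl,
  direct = 'http://vocab.getty.edu',
  proxy = 'https://lookup.services.cwrc.ca/getty',
) {
  return isMixedContent(pageUrl, direct) ? proxy : direct;
}

console.log(chooseLookupBase('https://example.org/editor')); // https://lookup.services.cwrc.ca/getty
```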
75 | 54 |
|
76 | | -which in turn calls |
| 55 | +We also proxy calls that retrieve the full page description of an entity, again so that a page which was itself loaded over HTTPS can make them. That proxy, `https://getty.lookup.services.cwrc.ca`, in turn calls `http://vocab.getty.edu`.
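The `uri`/`uriForDisplay` pair in the example result further down shows the shape of that rewrite: `http://vocab.getty.edu/ulan/500311165` becomes `https://getty.lookup.services.cwrc.ca/ulan/500311165`. A minimal sketch of the mapping, with `toDisplayUri` as a hypothetical name:

```js
// Hypothetical helper: rewrite a vocab.getty.edu entity URI so its page is
// served through the HTTPS proxy, mirroring the uri/uriForDisplay example.
const GETTY_BASE = 'http://vocab.getty.edu';
const DISPLAY_PROXY = 'https://getty.lookup.services.cwrc.ca';

function toDisplayUri(uri) {
  if (!uri.startsWith(GETTY_BASE + '/')) {
    return uri; // leave non-Getty URIs untouched
  }
  // Keep the path (e.g. /ulan/500311165) and swap only the origin.
  return DISPLAY_PROXY + uri.slice(GETTY_BASE.length);
}

console.log(toDisplayUri('http://vocab.getty.edu/ulan/500311165'));
// → https://getty.lookup.services.cwrc.ca/ulan/500311165
```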
77 | 56 |
|
78 | | -```http://vocab.getty.edu``` |
| 57 | +## Installation |
79 | 58 |
|
| 59 | +`npm i getty-entity-lookup` |
80 | 60 |
|
81 | | -### Installation |
| 61 | +## Use |
82 | 62 |
|
83 | | -npm i getty-entity-lookup -S |
| 63 | +`import gettyLookup from 'getty-entity-lookup';` |
84 | 64 |
|
85 | | -### Use |
| 65 | +## API |
86 | 66 |
|
87 | | -const gettyLookup = require('getty-entity-lookup'); |
| 67 | +### findPerson(query) |
88 | 68 |
|
89 | | -### API |
| 69 | +### findPlace(query) |
90 | 70 |
|
91 | | -###### findPerson(query) |
| 71 | +where the `query` argument is an object: |
92 | 72 |
|
93 | | -###### findPlace(query) |
94 | | - |
95 | | - |
96 | | -<br><br> |
97 | | -where the 'query' argument is an object: |
98 | | -<br> |
99 | | - |
100 | | -``` |
| 73 | +```js |
101 | 74 | { |
102 | | - entity: The name of the thing the user wants to find. |
103 | | - options: TBD |
| 75 | + entity: 'The name of the thing the user wants to find.', |
| 76 | + options: 'TBD' |
104 | 77 | } |
105 | 78 | ``` |
106 | 79 |
|
107 | | -<br> |
108 | | -and all find* methods return promises that resolve to an object like the following: |
109 | | -<br><br> |
| 80 | +and all `find*` methods return promises that resolve to an object like the following:
110 | 81 |
|
111 | | -``` |
| 82 | +```json |
112 | 83 | { |
113 | | - id: "http://vocab.getty.edu/ulan/500311165" |
114 | | - |
115 | | - name: "University of Pennsylvania, Lloyd P. Jones Gallery" |
116 | | - |
117 | | - nameType: "Corporate" |
118 | | - |
119 | | - originalQueryString: "jones" |
120 | | - |
121 | | - repository: "getty" |
122 | | - |
123 | | - uri: "http://vocab.getty.edu/ulan/500311165" |
124 | | - |
125 | | - uriForDisplay: "https://getty.lookup.services.cwrc.ca/ulan/500311165" |
126 | | - |
| 84 | + "id": "http://vocab.getty.edu/ulan/500311165", |
| 85 | + "name": "University of Pennsylvania, Lloyd P. Jones Gallery", |
| 86 | + "nameType": "Corporate", |
| 87 | + "originalQueryString": "jones", |
| 88 | + "repository": "getty", |
| 89 | + "uri": "http://vocab.getty.edu/ulan/500311165", |
| 90 | + "uriForDisplay": "https://getty.lookup.services.cwrc.ca/ulan/500311165" |
127 | 91 | } |
128 | 92 | ``` |
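One way such objects could be produced from a SPARQL 1.1 JSON response (the `results.bindings` format, which we assume the `sparql.json` endpoint returns) is sketched below. The query variable names `s`, `name` and `nameType` are assumptions, `toLookupResults` is a hypothetical name, and the sample input only reuses the values from the example above:

```js
// Hypothetical sketch: map SPARQL 1.1 JSON results onto the result shape above.
// ASSUMPTION: the query binds ?s (entity URI), ?name (label) and optionally ?nameType.
function toLookupResults(sparqlJson, originalQueryString) {
  return sparqlJson.results.bindings.map((binding) => {
    const uri = binding.s.value;
    return {
      id: uri,
      name: binding.name.value,
      nameType: binding.nameType ? binding.nameType.value : undefined,
      originalQueryString,
      repository: 'getty',
      uri,
      // Serve the entity page through the HTTPS proxy, as in the example's uriForDisplay.
      uriForDisplay: uri.replace('http://vocab.getty.edu', 'https://getty.lookup.services.cwrc.ca'),
    };
  });
}

// Sample input built from the example values above.
const sample = {
  head: { vars: ['s', 'name', 'nameType'] },
  results: {
    bindings: [{
      s: { type: 'uri', value: 'http://vocab.getty.edu/ulan/500311165' },
      name: { type: 'literal', value: 'University of Pennsylvania, Lloyd P. Jones Gallery' },
      nameType: { type: 'literal', value: 'Corporate' },
    }],
  },
};

console.log(toLookupResults(sample, 'jones'));
```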
129 | | -<br><br> |
130 | | -There are a further four methods that are mainly made available to facilitate testing (to make it easier to mock calls to the getty service): |
131 | 93 |
|
132 | | -###### getPersonLookupURI(query) |
| 94 | +There are a further four methods, made available mainly to facilitate testing (they make it easier to mock calls to the Getty service):
133 | 95 |
|
134 | | -###### getPlaceLookupURI(query) |
| 96 | +### getPersonLookupURI(query) |
135 | 97 |
|
| 98 | +### getPlaceLookupURI(query) |
136 | 99 |
|
137 | | -<br><br> |
138 | | -where the 'query' argument is the entity name to find and the methods return the getty URL that in turn returns results for the query. |
| 100 | +where the `query` argument is the name of the entity to find, and the methods return the Getty URL that in turn returns results for that query.
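Since these methods return a plain URL, a test can also inspect the query it carries. A minimal sketch, assuming (as in the `sparql.json?query=` example earlier) that the SPARQL text travels in a `query` parameter; `decodeLookupQuery` is a hypothetical test helper:

```js
// Hypothetical test helper: recover the SPARQL text from a lookup URL,
// assuming it is carried in a `query` parameter.
function decodeLookupQuery(lookupUrl) {
  // URLSearchParams percent-decodes the value; returns null if `query` is absent.
  return new URL(lookupUrl).searchParams.get('query');
}

const url = 'http://vocab.getty.edu/sparql.json?query=' +
  encodeURIComponent('SELECT ?s WHERE { ?s luc:term "jones" }');
console.log(decodeLookupQuery(url)); // SELECT ?s WHERE { ?s luc:term "jones" }
```

In a test this might read `decodeLookupQuery(gettyLookup.getPersonLookupURI('jones'))`, followed by an assertion that the decoded query mentions the search term.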
139 | 101 |
|
140 | | -### Development |
| 102 | +## Development |
141 | 103 |
|
142 | | -[CWRC-Writer-Dev-Docs](https://github.com/jchartrand/CWRC-Writer-Dev-Docs) describes general development practices for CWRC-Writer GitHub repositories, including this one. |
| 104 | +[CWRC-Writer-Dev-Docs](https://github.com/cwrc/CWRC-Writer-Dev-Docs) describes general development practices for CWRC-Writer GitHub repositories, including this one. |
143 | 105 |
|
144 | | -#### Testing |
| 106 | +<!-- ### Testing |
145 | 107 |
|
146 | | -The code in this repository is intended to run in the browser, and so we use [browser-run](https://github.com/juliangruber/browser-run) to run [browserified](http://browserify.org) [tape](https://github.com/substack/tape) tests directly in the browser. |
| 108 | +The code in this repository is intended to run in the browser, and so we use [browser-run](https://github.com/juliangruber/browser-run) to run [browserified](http://browserify.org) [tape](https://github.com/substack/tape) tests directly in the browser. |
147 | 109 |
|
148 | | -We [decorate](https://en.wikipedia.org/wiki/Decorator_pattern) [tape](https://github.com/substack/tape) with [tape-promise](https://github.com/jprichardson/tape-promise) to allow testing with promises and async methods. |
| 110 | +We [decorate](https://en.wikipedia.org/wiki/Decorator_pattern) [tape](https://github.com/substack/tape) with [tape-promise](https://github.com/jprichardson/tape-promise) to allow testing with promises and async methods. --> |
149 | 111 |
|
150 | | -#### Mocking |
| 112 | +### Mocking |
151 | 113 |
|
152 | | -We use [fetch-mock](https://github.com/wheresrhys/fetch-mock) to mock http calls (which we make using the [Fetch API](https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API) rather than XMLHttpRequest). |
| 114 | +We use [fetch-mock](https://github.com/wheresrhys/fetch-mock) to mock http calls (which we make using the [Fetch API](https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API) rather than XMLHttpRequest). |
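fetch-mock automates a pattern like the one below: temporarily replace `fetch` with a function that returns canned responses, record the calls, and restore the original afterwards. This hand-rolled version is a stand-in for illustration only and does not reproduce fetch-mock's API; `mockGlobalFetch` is a hypothetical name.

```js
// Hypothetical stand-in for what fetch-mock does for us (not fetch-mock's API):
// swap the global fetch for a stub that answers from a table of canned bodies.
function mockGlobalFetch(routes) {
  const original = globalThis.fetch;
  const calls = [];
  globalThis.fetch = async (url) => {
    calls.push(String(url)); // record every requested URL for later assertions
    const body = routes[String(url)];
    if (body === undefined) {
      // Unknown URL: behave like a 404 instead of touching the network.
      return { ok: false, status: 404, json: async () => ({}) };
    }
    return { ok: true, status: 200, json: async () => body };
  };
  return {
    calls,
    restore() {
      globalThis.fetch = original; // put the real fetch back after the test
    },
  };
}
```

A test could register the URL returned by `getPersonLookupURI` as a route, call `findPerson`, check `calls`, and then call `restore()`.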
153 | 115 |
|
154 | | -We use [sinon](http://sinonjs.org) [fake timers](http://sinonjs.org/releases/v4.0.1/fake-timers/) to test our timeouts, without having to wait for the timeouts. |
| 116 | +<!-- We use [sinon](http://sinonjs.org) [fake timers](http://sinonjs.org/releases/v4.0.1/fake-timers/) to test our timeouts, without having to wait for the timeouts. |
155 | 117 |
|
156 | | -#### Code Coverage |
| 118 | +### Code Coverage |
157 | 119 |
|
158 | | -We generate code coverage by instrumenting our code with [istanbul](https://github.com/gotwarlost/istanbul) before [browser-run](https://github.com/juliangruber/browser-run) runs the tests, |
| 120 | +We generate code coverage by instrumenting our code with [istanbul](https://github.com/gotwarlost/istanbul) before [browser-run](https://github.com/juliangruber/browser-run) runs the tests, |
159 | 121 | then extract the coverage (which [istanbul](https://github.com/gotwarlost/istanbul) writes to the global object, i.e., the window in the browser), format it with [istanbul](https://github.com/gotwarlost/istanbul), and finally report it (Travis actually does this for us) to [codecov.io](https://codecov.io).
160 | 122 |
|
161 | | -#### Transpilation |
| 123 | +### Transpilation |
162 | 124 |
|
163 | | -We use [babelify](https://github.com/babel/babelify) and [babel-plugin-istanbul](https://github.com/istanbuljs/babel-plugin-istanbul) to compile our code, tests, and code coverage with [babel](https://github.com/babel/babel) |
| 125 | +We use [babelify](https://github.com/babel/babelify) and [babel-plugin-istanbul](https://github.com/istanbuljs/babel-plugin-istanbul) to compile our code, tests, and code coverage with [babel](https://github.com/babel/babel) --> |
164 | 126 |
|
165 | | -#### Continuous Integration |
| 127 | +### Continuous Integration |
166 | 128 |
|
167 | 129 | We use [Travis](https://travis-ci.org). |
168 | 130 |
|
169 | | -Note that to allow our tests to run in Electron on Travis, the following has been added to .travis.yml: |
170 | | - |
171 | | -``` |
172 | | -addons: |
173 | | - apt: |
174 | | - packages: |
175 | | - - xvfb |
176 | | -install: |
177 | | - - export DISPLAY=':99.0' |
178 | | - - Xvfb :99 -screen 0 1024x768x24 > /dev/null 2>&1 & |
179 | | - - npm install |
180 | | -``` |
181 | | - |
182 | | -#### Release |
| 131 | +### Release |
183 | 132 |
|
184 | 133 | We follow [SemVer](http://semver.org), which [Semantic Release](https://github.com/semantic-release/semantic-release) makes easy. |
185 | 134 | Semantic Release also sets the version number based on our commit messages, publishes to NPM, and finally generates a changelog and a release (including a git tag) on GitHub.
186 | | - |