Commit 53307cd
Support transformation of LibGuide sub-pages
Why these changes are being introduced:
It was determined that we were not crawling LibGuides sub-pages in browsertrix. Once they started
rolling into Transmogrifier for transformation to TIMDEX records, it became clear we'd need to do a
little work to handle them.
How this addresses that need:
* Update the LibGuides API URL to include `?expand=pages`
  * this adds a `.pages` node to the main/parent guides API data
* Interleave these sub-pages with the main guides in the API data, allowing the transform to find
  and utilize them as well
* Because of increased crawl scope, filter out additional directory guides that have `g=176063` in
  the URL
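
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function names, the `url` key on guide and page records, and the sample URLs are all assumptions made for the example. Only the `.pages` node and the `g=176063` filter come from the commit message.

```python
from urllib.parse import parse_qs, urlparse

# Guide id of the directory guides named in the commit message.
EXCLUDED_GUIDE_ID = "176063"


def interleave_pages(guides: list[dict]) -> list[dict]:
    """Return a flat list in which each guide is followed by its sub-pages.

    Assumes each guide dict may carry a "pages" list (as added by
    `?expand=pages`); the helper name itself is hypothetical.
    """
    flattened = []
    for guide in guides:
        # Keep the parent guide, minus its nested "pages" node.
        flattened.append({k: v for k, v in guide.items() if k != "pages"})
        # Place the guide's sub-pages directly after it.
        flattened.extend(guide.get("pages") or [])
    return flattened


def is_excluded(url: str) -> bool:
    """True when the URL query string contains g=176063 exactly."""
    query = parse_qs(urlparse(url).query)
    return EXCLUDED_GUIDE_ID in query.get("g", [])


# Illustrative data only; real API records have many more fields.
guides = [
    {
        "id": 1,
        "url": "https://libguides.mit.edu/c.php?g=1",
        "pages": [{"id": 11, "url": "https://libguides.mit.edu/c.php?g=1&p=11"}],
    },
    {
        "id": 2,
        "url": "https://libguides.mit.edu/c.php?g=176063",
        "pages": [{"id": 21, "url": "https://libguides.mit.edu/c.php?g=176063&p=21"}],
    },
]

records = [r for r in interleave_pages(guides) if not is_excluded(r["url"])]
```

Parsing the query string rather than matching the substring `g=176063` avoids accidentally excluding a guide whose id merely starts with those digits.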
Side effects of this change:
* Transmogrifier can transform sub-pages crawled from libguides.mit.edu, resulting in an increased
TIMDEX record count for the `libguides` source
Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-4491

parent 9cb7864 commit 53307cd
File tree
3 files changed, +95 -13 lines changed
- tests/sources/json
- transmogrifier
  - sources/json