|
👍 I can only go by the screenshot at the moment, but I wonder if the font size for the "Preview text..." toggle might be better if slightly smaller (0.75em)? It feels slightly too big of a jump from the metadata field labels to me, maybe giving it more importance than it deserves. (I'd also be fine with shipping this now and revisiting the styling details later.) |
|
Yeah, the sketch in projectblacklight/spotlight#1175 didn't give me much to go on, so I just threw the bootstrap Before we go much further, though, I think it'd be useful to discuss whether this feature is useful for Feigenbaum (or other exhibits) and/or how much more work we should do. |
|
@ndushay: do you know anything about the Solr Boundary Scanners? I'm pretty sure it won't help us here, given the quality of the OCR, but might it be adaptable to find page-level boundaries? |
|
Looks like the available Solr Boundary Scanners are break iterator (WORD, LINE, SENTENCE, and CHARACTER) and simple, which seems to look for specific characters. Given the OCR junk, relying on a particular character in our full text to indicate page breaks seems dicey. :-( |
|
So far, we have concluded that the existing collections with decent-quality English full text are:
And that, for the future, we should determine how to accession full text created by ABBYY digitization to facilitate indexing (or how to adjust indexing to take the full text from page-level and document-level PDFs). |
|
See projectblacklight/spotlight#1227 for more info on files ABBYY creates and where the full text is, etc. |
A quick attempt at projectblacklight/spotlight#1175 for discussion.
See also: projectblacklight/spotlight#1227 and projectblacklight/spotlight#1228 for more background info.
(connects to projectblacklight/spotlight#1175)