Description
For KEYWORD searches, we are getting some odd results with AvantSearch. Including certain stop words in the search string seems to make the whole search return no results, and there also seems to be an issue with case sensitivity.
For example, take the exact title of this item and run it as a keyword search: https://cpw.cvlcollections.org/items/show/210
The full title results in no hits:
https://cpw.cvlcollections.org/find?query=The+future+of+wildlife+conservation+funding%3A+What+options+do+U.S.+college+students+support%3F
A shortened version of the title results in no hits:
https://cpw.cvlcollections.org/find?query=The+future+of+wildlife+conservation+funding
A shortened version with a lower-case 'the' does work:
https://cpw.cvlcollections.org/find?query=the+future+of+wildlife+conservation+funding
A longer version that includes the word 'do' does not work (and 'do' is not in the list of InnoDB stop words):
https://cpw.cvlcollections.org/find?query=the+future+of+wildlife+conservation+funding%3A+what+options+do+U.S.+college+students+support%3F
A longer version without the word 'do' does work:
https://cpw.cvlcollections.org/find?query=the+future+of+wildlife+conservation+funding%3A+what+options+U.S.+college+students+support%3F
In summary: if the stop words 'The', 'What', or 'Of' are capitalized in the search string, the search returns no results. Likewise, if the word 'do' appears anywhere in the search string, capitalized or not, no results are returned.
The database table search_texts uses the InnoDB engine with the utf8_unicode_ci collation, so comparisons against it should be case-insensitive.
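Since the collation should already be case-insensitive, one hypothesis worth checking is that stop words are filtered out in application code with an exact (case-sensitive) comparison before the remaining words are sent to MySQL as required (`+word`) terms in a boolean-mode full-text query. The sketch below illustrates that hypothesis only: the function names and the small stop-word subset are assumptions for illustration, not AvantSearch's actual code.

```python
import re

# Hypothetical stop-word list: a small subset of MySQL InnoDB's default
# full-text stop words, stored in lower case (assumption for illustration).
STOPWORDS = {"a", "about", "an", "are", "as", "at", "for", "of", "the", "to", "what"}


def tokenize(text: str) -> list[str]:
    """Split a search string into words, dropping punctuation such as ':' and '?'."""
    return re.findall(r"[\w.]+", text)


def build_query_case_sensitive(text: str) -> str:
    """Hypothetical buggy filter: 'The' != 'the', so a capitalized stop word survives."""
    kept = [w for w in tokenize(text) if w not in STOPWORDS]
    return " ".join("+" + w for w in kept)


def build_query_case_insensitive(text: str) -> str:
    """Compare in lower case, matching the intent of a *_ci collation."""
    kept = [w for w in tokenize(text) if w.lower() not in STOPWORDS]
    return " ".join("+" + w for w in kept)


print(build_query_case_sensitive("The future of wildlife conservation funding"))
# +The +future +wildlife +conservation +funding
print(build_query_case_sensitive("the future of wildlife conservation funding"))
# +future +wildlife +conservation +funding
print(build_query_case_insensitive("The future of wildlife conservation funding"))
# +future +wildlife +conservation +funding
```

Under this hypothesis, a capitalized 'The' slips past the filter and reaches MySQL as `+The`; if InnoDB never matches that term (for example, because it treats it as an unindexed stop word), no row can satisfy the query. Comparing with `.lower()`, as in the second function, would keep the filter consistent with the case-insensitive collation. The sketch does not explain the 'do' case; one setting that may be worth checking there is `innodb_ft_min_token_size` (default 3), since 'do' is only two characters long.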
The good news is that if you search for the full title, precisely as it appears in the record, as a Title search (in Advanced Search), the article comes up:
https://cpw.cvlcollections.org/find?advanced%5B0%5D%5Bjoiner%5D=and&advanced%5B0%5D%5Belement_id%5D=50&advanced%5B0%5D%5Btype%5D=contains&advanced%5B0%5D%5Bterms%5D=The+future+of+wildlife+conservation+funding%3A+What+options+do+U.S.+college+students+support%3F&layout=1
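Decoding the URLs with Python's standard `urllib.parse` makes it easy to confirm that this Advanced Search request and the failing full-title keyword search (the first search URL above) carry exactly the same search string. The parameter names come straight from the URLs; reading `element_id` 50 as the Title element is an inference from this being a Title search, not something confirmed from AvantSearch documentation.

```python
from urllib.parse import parse_qs, urlsplit

# Advanced Search (Title, "contains") URL from this issue: returns the article.
ADVANCED_URL = (
    "https://cpw.cvlcollections.org/find?advanced%5B0%5D%5Bjoiner%5D=and"
    "&advanced%5B0%5D%5Belement_id%5D=50&advanced%5B0%5D%5Btype%5D=contains"
    "&advanced%5B0%5D%5Bterms%5D=The+future+of+wildlife+conservation+funding"
    "%3A+What+options+do+U.S.+college+students+support%3F&layout=1"
)

# Keyword search URL from this issue: returns no hits.
KEYWORD_URL = (
    "https://cpw.cvlcollections.org/find?query=The+future+of+wildlife+conservation"
    "+funding%3A+What+options+do+U.S.+college+students+support%3F"
)


def decode_params(url: str) -> dict[str, str]:
    """Return the URL's query parameters with names and values percent-decoded."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


advanced = decode_params(ADVANCED_URL)
keyword = decode_params(KEYWORD_URL)

for key, value in advanced.items():
    print(f"{key} = {value}")

# Both requests search for exactly the same text.
print(advanced["advanced[0][terms]"] == keyword["query"])  # True
```

Because the decoded terms are identical, the difference in results comes from how the two search modes process the string, not from the string itself.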