Stop words or 'do' in search result in no matches #5

@DigExpCon

For KEYWORD searches, we are getting some odd results with AvantSearch. Including stop words in the search string seems to invalidate the rest of the search, and there also seems to be an issue with case sensitivity.

For example, take the exact title of this item and run it as a keyword search: https://cpw.cvlcollections.org/items/show/210

The full title results in no hits:
https://cpw.cvlcollections.org/find?query=The+future+of+wildlife+conservation+funding%3A+What+options+do+U.S.+college+students+support%3F

A shortened version of the title results in no hits:
https://cpw.cvlcollections.org/find?query=The+future+of+wildlife+conservation+funding

A shortened version with a lower-case 'the' does work:
https://cpw.cvlcollections.org/find?query=the+future+of+wildlife+conservation+funding

A longer version that includes the word 'do' does not work (and 'do' is not in the list of InnoDB stop words):
https://cpw.cvlcollections.org/find?query=the+future+of+wildlife+conservation+funding%3A+what+options+do+U.S.+college+students+support%3F

A longer version without the word 'do' does work:
https://cpw.cvlcollections.org/find?query=the+future+of+wildlife+conservation+funding%3A+what+options+U.S.+college+students+support%3F

So, if the stop words 'The', 'What', or 'Of' are capitalized in the search string, the search finds no results. Likewise, if the word 'do' appears in the search string (capitalized or not), no results are found.
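To make the comparison between the URLs above easier to repeat, here is a minimal Python sketch that pulls the terms out of a `find?query=...` URL and reports three things about each one: whether its lower-cased form is in MySQL's default InnoDB stop word list, whether it is shorter than the default `innodb_ft_min_token_size` of 3, and whether it starts with a capital letter. The function names are hypothetical, the whitespace-based tokenizing is an assumption (MySQL and AvantSearch may split terms differently), and the server may be configured with a custom stop word table or token size. It only describes the query; it does not show how AvantSearch builds its SQL or confirm the cause of the missing results.

```python
import string
from urllib.parse import urlparse, parse_qs

# Default InnoDB full-text stop words as listed in
# INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD (assumes no custom list is set).
INNODB_DEFAULT_STOPWORDS = frozenset("""
a about an are as at be by com de en for from how i in is it la of on or
that the this to was what when where who will with und www
""".split())

# MySQL default for innodb_ft_min_token_size (assumed unchanged on the server).
INNODB_FT_MIN_TOKEN_SIZE = 3


def query_terms(url):
    """Return the whitespace-separated terms of the 'query' parameter,
    with leading and trailing punctuation stripped (an assumed tokenizer)."""
    query = parse_qs(urlparse(url).query).get("query", [""])[0]
    terms = (word.strip(string.punctuation) for word in query.split())
    return [term for term in terms if term]


def describe_terms(url):
    """Report stop word, length, and capitalization facts for each term."""
    return [
        {
            "term": term,
            "stopword": term.lower() in INNODB_DEFAULT_STOPWORDS,
            "short": len(term) < INNODB_FT_MIN_TOKEN_SIZE,
            "capitalized": term[0].isupper(),
        }
        for term in query_terms(url)
    ]


if __name__ == "__main__":
    url = ("https://cpw.cvlcollections.org/find?query=the+future+of+wildlife"
           "+conservation+funding%3A+what+options+do+U.S.+college+students"
           "+support%3F")
    for row in describe_terms(url):
        print(row)
```

Run against the failing URL with 'do', this flags 'do' as not a default stop word but shorter than the default minimum token size, which may be worth checking against the server's actual full-text settings.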

The DB table search_texts is InnoDB, and its collation is utf8_unicode_ci, so matching should be case-insensitive.

The good news is that if you search for the full title, precisely as it appears in the record, as a Title search (in Advanced Search), the article comes up:
https://cpw.cvlcollections.org/find?advanced%5B0%5D%5Bjoiner%5D=and&advanced%5B0%5D%5Belement_id%5D=50&advanced%5B0%5D%5Btype%5D=contains&advanced%5B0%5D%5Bterms%5D=The+future+of+wildlife+conservation+funding%3A+What+options+do+U.S.+college+students+support%3F&layout=1
