Default metadata field names in PagePdfDocumentReader can't be parsed in a filter expression #696

@markpollack

The field name `file_name` is not compatible with the filter expression parser.

		SearchRequest searchRequest = SearchRequest.defaults()
				.withTopK(4)
				.withFilterExpression(PagePdfDocumentReader.METADATA_FILE_NAME + " == 'medicaid-wa-faqs.pdf'");

where `public static final String METADATA_FILE_NAME = "file_name";`

throws the following exception:

Caused by: org.antlr.v4.runtime.NoViableAltException: null
	at org.antlr.v4.runtime.atn.ParserATNSimulator.noViableAlt(ParserATNSimulator.java:2014) ~[antlr4-runtime-4.13.1.jar:4.13.1]
	at org.antlr.v4.runtime.atn.ParserATNSimulator.execATN(ParserATNSimulator.java:445) ~[antlr4-runtime-4.13.1.jar:4.13.1]
	at org.antlr.v4.runtime.atn.ParserATNSimulator.adaptivePredict(ParserATNSimulator.java:371) ~[antlr4-runtime-4.13.1.jar:4.13.1]
	at org.springframework.ai.vectorstore.filter.antlr4.FiltersParser.booleanExpression(FiltersParser.java:556) ~[spring-ai-core-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
	at org.springframework.ai.vectorstore.filter.antlr4.FiltersParser.where(FiltersParser.java:199) ~[spring-ai-core-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
	at org.springframework.ai.vectorstore.filter.FilterExpressionTextParser.parse(FilterExpressionTextParser.java:147) ~[spring-ai-core-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
	... 46 common frames omitted

The underscore seems to be the issue. I suggest we switch to camel case for the metadata fields that document readers add.
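To make the suggested rename concrete, here is a minimal, self-contained sketch of converting snake_case metadata keys to camel case. The class `MetadataKeys` and its methods are hypothetical helpers written for illustration, not part of Spring AI, and the `page_number` key is only an example entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Spring AI.
public class MetadataKeys {

    // Converts a snake_case key such as "file_name" to camelCase ("fileName").
    static String toCamelCase(String key) {
        StringBuilder out = new StringBuilder(key.length());
        boolean upperNext = false;
        for (char c : key.toCharArray()) {
            if (c == '_') {
                // Capitalize the next character, except after leading underscores.
                upperNext = out.length() > 0;
            } else if (upperNext) {
                out.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Returns a copy of the metadata map with every key converted to camelCase.
    static Map<String, Object> renameKeys(Map<String, Object> metadata) {
        Map<String, Object> renamed = new LinkedHashMap<>();
        metadata.forEach((k, v) -> renamed.put(toCamelCase(k), v));
        return renamed;
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("file_name", "medicaid-wa-faqs.pdf");
        metadata.put("page_number", 3); // example key, assumed for illustration
        System.out.println(renameKeys(metadata));
    }
}
```

With keys renamed this way, the filter expression would read `fileName == 'medicaid-wa-faqs.pdf'`. Whether that parses depends on the underscore actually being the cause, which this issue has not yet confirmed.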
