Local search rework proposal #725
Replies: 6 comments
---
Option 3 provides a critical architectural benefit: it decouples the query interface from the underlying data model. As the schema evolves (new fields, relationships, or database migrations), the business logic remains unchanged. The LLM acts as an adaptive translation layer between free-text requests and the schema.
This is fundamentally different from Options 1 & 2, which require code changes whenever new searchable fields are added or the data model evolves.

Reduced Technical Debt

Option 1 leads to flag proliferation: as the schema grows, so does the set of flags you'd need to maintain.
Option 2 requires maintaining a query parser with its own grammar, lexer, and error handling, essentially building a mini query language, with all the maintenance burden that implies.
Option 3 externalizes this complexity to the LLM, which already handles natural language parsing at scale.

Addressing Non-Determinism

Acceptable Variance in Practice

The non-determinism concern is valid, but its practical impact is overestimated.
MCP Use Case Alignment

The MCP (Model Context Protocol) use case is critical context: MCP is designed for AI agents to interact with tools, and Option 3 is native to that paradigm.
Implementation Risk Mitigation

Security
Option 3 is the architecturally superior choice for a system targeting AI agent interactions, and the non-determinism trade-off is acceptable.

Key Benefits Summary
Recommendation: Implement Option 3 with query validation, caching, and an optional structured fallback for deterministic requirements.
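The caching part of that mitigation could be sketched in Go as below; `QueryCache` and `translateFn` are illustrative names, and the actual LLM call is left as a stub:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
	"sync"
)

// translateFn stands in for the LLM call that turns a natural-language
// query into a structured one.
type translateFn func(naturalQuery string) (string, error)

// QueryCache memoizes translations so the same natural-language input
// always yields the same structured query within a deployment.
type QueryCache struct {
	mu    sync.Mutex
	cache map[string]string
	llm   translateFn
}

func NewQueryCache(llm translateFn) *QueryCache {
	return &QueryCache{cache: make(map[string]string), llm: llm}
}

// key normalizes the input so trivially different phrasings of the same
// query (case, surrounding whitespace) hit the same cache entry.
func (c *QueryCache) key(q string) string {
	sum := sha256.Sum256([]byte(strings.ToLower(strings.TrimSpace(q))))
	return hex.EncodeToString(sum[:])
}

// Translate returns a cached translation when available and only calls
// the LLM on a cache miss, making repeated queries deterministic.
func (c *QueryCache) Translate(q string) (string, error) {
	k := c.key(q)
	c.mu.Lock()
	defer c.mu.Unlock()
	if s, ok := c.cache[k]; ok {
		return s, nil
	}
	s, err := c.llm(q)
	if err != nil {
		return "", err
	}
	c.cache[k] = s
	return s, nil
}
```

The structured fallback would bypass `Translate` entirely for callers that cannot tolerate any variance.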
---
Option 2 looks better IMO than Option 1, as more complex queries might not work with flags (it depends on how Cobra parses them).
---
Here is my weighted preference, starting with the option I believe is strongest: I'd do both Options 2 and 3.
---
I agree with @lgecse on the order of preference. I think supporting Option 2 alone is enough: once the query syntax is supported and documented, it is automatically available to the MCP server, which enables Option 3 (meaning we don't have to do anything, or much, to support Option 3). Regarding @tkircsi's comment on the maintenance cost of a query parser, it is only partially true. We can use libraries such as https://github.com/alecthomas/participle to define only the query grammar, which can then be mapped natively to whatever query builder the backend database uses. This simplifies the implementation greatly and reduces maintenance to the grammar itself. I would suggest Option 2, with the caveat that we integrate with existing tools (query builders, etc.) and avoid implementing a parser on our end. This should be fairly straightforward iff we select the proper tooling to support it.
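As a stdlib-only illustration of that mapping (participle would supply the parsed conditions from a declarative struct-tag grammar; `Condition` and `toWhere` are hypothetical names), the parsed conditions can be rendered onto a parameterized backend query like this:

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is one parsed leaf of the query grammar, i.e. what a
// grammar library such as participle would produce.
type Condition struct {
	Field, Op, Value string
}

// Only known fields and operators are ever interpolated into SQL;
// values always go through placeholders, guarding against injection.
var allowedFields = map[string]bool{"name": true, "version": true, "created_at": true}
var allowedOps = map[string]bool{"=": true, ">": true, ">=": true, "<": true, "<=": true}

// toWhere maps parsed conditions onto a parameterized WHERE clause,
// illustrating the "map the grammar natively to the backend query
// builder" point; a real query builder would replace the string join.
func toWhere(conds []Condition) (string, []any, error) {
	var parts []string
	var args []any
	for _, c := range conds {
		if !allowedFields[c.Field] || !allowedOps[c.Op] {
			return "", nil, fmt.Errorf("disallowed condition: %s %s", c.Field, c.Op)
		}
		parts = append(parts, fmt.Sprintf("%s %s ?", c.Field, c.Op))
		args = append(args, c.Value)
	}
	return strings.Join(parts, " AND "), args, nil
}
```

The allow-list keeps maintenance where the comment suggests it should live: in the grammar and the field list, not in a hand-written parser.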
---
I can go with Option 2, but my personal opinion is that Option 3 would be the better and more modern choice.
---
We decided not to rework the current search functionality, only to add specific fields to it: created_at, authors, schema_version, and module_id. Besides that, I will implement greater-than and less-than operators on some fields: version, schema_version, and created_at.
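For the version-typed fields, greater-than and less-than need numeric, per-segment ordering rather than string ordering. A minimal sketch, with illustrative function names (created_at would compare as a timestamp instead):

```go
package main

import (
	"strconv"
	"strings"
)

// compareVersions compares two dotted numeric versions segment by
// segment, so "0.10.0" sorts above "0.9.0". It assumes plain
// MAJOR.MINOR.PATCH values; full semver (pre-release tags, build
// metadata) would need a dedicated library.
func compareVersions(a, b string) int {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) || i < len(bs); i++ {
		ai, bi := 0, 0 // missing segments compare as zero, so "0.1" == "0.1.0"
		if i < len(as) {
			ai, _ = strconv.Atoi(as[i])
		}
		if i < len(bs) {
			bi, _ = strconv.Atoi(bs[i])
		}
		switch {
		case ai < bi:
			return -1
		case ai > bi:
			return 1
		}
	}
	return 0
}

// matchVersion applies a comparison operator to a field value, the way
// the planned filters might for version and schema_version.
func matchVersion(v, op, want string) bool {
	c := compareVersions(v, want)
	switch op {
	case ">":
		return c > 0
	case ">=":
		return c >= 0
	case "<":
		return c < 0
	case "<=":
		return c <= 0
	default:
		return c == 0
	}
}
```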
---
Search Functionality Enhancement Proposal
Overview
This document proposes enhancements to the local search functionality in the Directory, addressing limitations in the current implementation and enabling more powerful, flexible queries for AI agents and CLI users.
Current State
The search functionality is currently accessible from two interfaces:
- cli/cmd/search/search.go
- mcp/prompts/search_records.go

Current Capabilities
- Wildcard matching with *, ?, and [abc] patterns

Current Limitations
- No comparison operators for versions (e.g. >0.1.0)
- No search on fields such as description, created_at, ...

Example Current Usage
Problem Statement
AI agents need to perform more sophisticated searches to accurately discover relevant records, for example combining conditions on multiple fields or constraining version ranges.
The current system cannot express these queries without returning false positives or requiring post-processing.
Proposed Solutions
Option 1: Query Expression Language with CLI Flags
Concept
Introduce boolean operators as CLI flags and add new field types with comparison operators. Use structured flags to build an expression tree on the backend.
I opened a PR to test how we can parse an expression tree on the backend: #710
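The flag-to-expression-tree idea might look like the following; the types are illustrative, not the actual ones from PR #710:

```go
package main

import "fmt"

// Expr is a node in the query expression tree that structured CLI
// flags would be compiled into.
type Expr interface{ String() string }

// Cond is a leaf: one field compared against one value.
type Cond struct{ Field, Op, Value string }

func (c Cond) String() string { return fmt.Sprintf("%s %s %s", c.Field, c.Op, c.Value) }

// Bool joins two subtrees with a boolean operator.
type Bool struct {
	Op          string // "AND" or "OR"
	Left, Right Expr
}

func (b Bool) String() string {
	return fmt.Sprintf("(%s %s %s)", b.Left, b.Op, b.Right)
}

// fromFlags folds a flag-ordered list of conditions and boolean
// operators (e.g. --name x --and --version-gt 0.1.0) into a
// left-associative tree; grouping flags would push subtrees instead.
func fromFlags(conds []Cond, ops []string) Expr {
	if len(conds) == 0 {
		return nil
	}
	var root Expr = conds[0]
	for i, op := range ops {
		if i+1 >= len(conds) {
			break
		}
		root = Bool{Op: op, Left: root, Right: conds[i+1]}
	}
	return root
}
```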
CLI Interface
CLI Implementation (cli/cmd/search/):
- Boolean operator flags: --and, --or, --not, --group-and, --group-or, --end-group, ...
- Comparison flags: --version-gt, --version-gte, --version-lt, --version-lte, ...
- Field flags: --description, ...
- The flags are compiled into a QueryExpression tree on the backend

Pros
Cons
- --group-and ... --end-group grouping can be confusing

Option 2: String-Based Query Language (Parsing Approach)
Concept
Define a simple query language syntax that can be expressed as a single string. Parse the string on the backend into an expression tree.
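A minimal sketch of the parsing step, assuming a grammar of `field<op>value` conditions joined by `AND` (the real syntax would be richer, and a grammar library could replace the hand-rolled scan):

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is one parsed leaf of the query string.
type Condition struct {
	Field, Op, Value string
}

// parseQuery splits a query string like "name=x AND version>=0.1.0"
// into conditions. Operator candidates are checked longest-first so
// ">=" is not misread as ">".
func parseQuery(q string) ([]Condition, error) {
	var conds []Condition
	for _, part := range strings.Split(q, " AND ") {
		part = strings.TrimSpace(part)
		var op string
		for _, candidate := range []string{">=", "<=", "!=", ">", "<", "="} {
			if strings.Contains(part, candidate) {
				op = candidate
				break
			}
		}
		if op == "" {
			return nil, fmt.Errorf("no operator in %q", part)
		}
		fv := strings.SplitN(part, op, 2)
		conds = append(conds, Condition{
			Field: strings.TrimSpace(fv[0]),
			Op:    op,
			Value: strings.TrimSpace(fv[1]),
		})
	}
	return conds, nil
}
```

The resulting conditions would then be assembled into the same expression tree the backend builds for Option 1.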
CLI Interface
Pros
Cons
Option 3: LLM-Based Natural Language Query
Concept
Accept free-text queries and use an LLM on the backend to translate them into structured database queries. The LLM has access to the database schema and generates the appropriate query.
CLI Interface
Pseudocode
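A sketch of the flow, with the LLM call stubbed out and all names illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// llmTranslate stands in for the real LLM call; in the actual flow the
// prompt would contain the DB schema plus the user's free-text query,
// and the model would return a structured query.
func llmTranslate(schema, naturalQuery string) (string, error) {
	// Stubbed response for the sketch.
	return "SELECT * FROM records WHERE version > '0.1.0'", nil
}

// validate rejects anything that is not a read-only query before it
// reaches the database, the main safety valve for LLM-generated SQL.
func validate(query string) error {
	q := strings.ToUpper(strings.TrimSpace(query))
	if !strings.HasPrefix(q, "SELECT") {
		return fmt.Errorf("only SELECT queries are allowed, got: %s", query)
	}
	return nil
}

// search wires the steps together: translate, validate, then hand the
// query to the backend for execution.
func search(schema, naturalQuery string) (string, error) {
	query, err := llmTranslate(schema, naturalQuery)
	if err != nil {
		return "", err
	}
	if err := validate(query); err != nil {
		return "", err
	}
	return query, nil // the backend would now execute this
}
```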
Pros
Cons
Recommendation
I think that, as a first step, we should keep Option 1 so that the feature set stays the same, and implement Option 3. With Option 3 we will have a solution that can extract the DB schema and create arbitrary queries based on what seems appropriate, and we won't have to change the business logic as the DB changes.