[Feature]: Fuzzy Keyword Search (RapidFuzz) to Enhance BM25 in HelixDB #781

@Pilsertech

Description

What do you want?

Add fuzzy keyword search to HelixDB, using a library such as RapidFuzz, to complement the existing BM25-based keyword search.

This would allow HelixDB to handle:

Typos and misspellings

Partial matches

Slightly different word forms and variations

The fuzzy matching layer could be applied as a pre-filter or post-ranking step alongside BM25, improving recall while preserving the relevance scoring and performance benefits of the current search system.
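To make the pre-filter idea concrete, here is a minimal sketch, not a proposal for the actual implementation. It expands a possibly misspelled query term into the indexed terms that lie within a small edit distance, so that BM25 can then score the expanded terms as usual. The function names (`levenshtein`, `expand_term`) and the flat `vocabulary` slice are assumptions made for illustration; they are not HelixDB APIs, and a real implementation would more likely call an optimized library than this hand-written dynamic program.

```rust
/// Levenshtein edit distance, counted in Unicode scalar values.
/// Classic two-row dynamic program: O(|a| * |b|) time, O(|b|) space.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0usize; b.len() + 1];
    for (i, &ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr[j + 1] = substitution.min(deletion).min(insertion);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Hypothetical pre-filter: returns the vocabulary terms within `max_distance`
/// edits of `query_term` (case-insensitive), closest first, ties broken alphabetically.
/// The returned terms would then be passed to the existing BM25 scorer.
fn expand_term<'a>(
    query_term: &str,
    vocabulary: &[&'a str],
    max_distance: usize,
) -> Vec<(&'a str, usize)> {
    let query = query_term.to_lowercase();
    let mut matches: Vec<(&'a str, usize)> = vocabulary
        .iter()
        .map(|&term| (term, levenshtein(&query, &term.to_lowercase())))
        .filter(|&(_, distance)| distance <= max_distance)
        .collect();
    matches.sort_by(|x, y| x.1.cmp(&y.1).then(x.0.cmp(y.0)));
    matches
}
```

For example, `expand_term("databse", &["database", "dataset", "graph", "vector"], 2)` keeps `database` (distance 1) and `dataset` (distance 2) and drops the other two terms. A linear scan over the vocabulary is only reasonable for small term sets; larger indexes would need a smarter candidate generator.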

Fuzzy search can be added without compromising HelixDB’s goal of “Ultra-Low Latency.”
Highly optimized string-matching algorithms, such as those provided by RapidFuzz, should improve search quality with little additional overhead, especially for short queries and user-generated input.

Feature Area

Other

Additional context

Implementation suggestion

Use RapidFuzz (Rust bindings or native implementation) for fast Levenshtein / token-based similarity scoring.

Combine fuzzy scores with BM25 scores for hybrid ranking.

Make fuzzy matching configurable (thresholds, enabled/disabled per query).
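The last two suggestions could look roughly like the sketch below, which assumes each candidate already carries a raw BM25 score and a fuzzy similarity in the range 0.0 to 1.0. Everything here is hypothetical and only meant to illustrate the idea: `FuzzyConfig`, `Candidate`, `hybrid_rank`, the max-based BM25 normalization, and the linear weighting are illustrative choices, not existing HelixDB types or settings.

```rust
/// Hypothetical per-query settings; field names are illustrative only.
#[derive(Debug, Clone, Copy)]
struct FuzzyConfig {
    /// When false, candidates are ranked by normalized BM25 alone.
    enabled: bool,
    /// Candidates with a similarity below this value are dropped (0.0..=1.0).
    min_similarity: f32,
    /// Share of the final score contributed by the fuzzy similarity (0.0..=1.0).
    fuzzy_weight: f32,
}

#[derive(Debug, Clone, Copy)]
struct Candidate {
    doc_id: u64,
    /// Raw, unbounded BM25 score.
    bm25: f32,
    /// Fuzzy similarity between query and matched term, in 0.0..=1.0.
    similarity: f32,
}

/// Ranks candidates by a weighted sum of normalized BM25 and fuzzy similarity.
/// BM25 is divided by the largest score in the candidate set so that both
/// components share the 0.0..=1.0 range before they are combined.
fn hybrid_rank(candidates: &[Candidate], config: FuzzyConfig) -> Vec<(u64, f32)> {
    let max_bm25 = candidates.iter().map(|c| c.bm25).fold(0.0_f32, f32::max);
    let mut ranked: Vec<(u64, f32)> = candidates
        .iter()
        // The similarity threshold only applies when fuzzy matching is enabled.
        .filter(|c| !config.enabled || c.similarity >= config.min_similarity)
        .map(|c| {
            let bm25_norm = if max_bm25 > 0.0 { c.bm25 / max_bm25 } else { 0.0 };
            let score = if config.enabled {
                (1.0 - config.fuzzy_weight) * bm25_norm + config.fuzzy_weight * c.similarity
            } else {
                bm25_norm
            };
            (c.doc_id, score)
        })
        .collect();
    // Highest score first.
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    ranked
}
```

With `fuzzy_weight` at 0.5, a document with a lower BM25 score but an exact term match can outrank a higher-BM25 document that only matched approximately. Setting `enabled` to false falls back to plain BM25 ordering, which keeps the feature opt-in per query.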

Labels

enhancement (New feature or request)
