
[Enhancement] Fulltext index BloomFilter pushdown pre-filter#23833

Open
ck89119 wants to merge 5 commits into matrixorigin:main from ck89119:fulltext-pushdown-23832

ck89119 commented Mar 10, 2026

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #23832

What this PR does / why we need it:

Implement BloomFilter pushdown for fulltext index tables, similar to the existing IVF index pushdown.

When a fulltext search query has additional filter predicates (e.g. WHERE match(...) against(...) AND category = 'tech'), the filtered PKs are collected into a BloomFilter and pushed down to the fulltext index table reader to skip irrelevant rows early.
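A minimal sketch of the BloomFilter idea described above, in Go: the PKs that survive the extra predicate are added to a bit array, and each doc_id read from the fulltext index table is probed before any further work. This is a textbook double-hashing Bloom filter (FNV-1a/FNV-1 from the standard library), not MatrixOne's runtime-filter implementation; the type names, the sizes (1024 bits, 3 probes) and the sample PKs are illustrative assumptions.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// BloomFilter is a minimal bit-array Bloom filter (illustrative only).
// MayContain answers "definitely absent" (false) or "maybe present" (true).
type BloomFilter struct {
	bits []uint64
	m    uint64 // number of bits
	k    uint64 // number of hash probes per key
}

func NewBloomFilter(m, k uint64) *BloomFilter {
	return &BloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes derives two base hashes for double hashing; the second is forced odd.
func hashes(key []byte) (uint64, uint64) {
	h1 := fnv.New64a()
	h1.Write(key)
	h2 := fnv.New64()
	h2.Write(key)
	return h1.Sum64(), h2.Sum64() | 1
}

func (bf *BloomFilter) Add(key []byte) {
	a, b := hashes(key)
	for i := uint64(0); i < bf.k; i++ {
		pos := (a + i*b) % bf.m
		bf.bits[pos/64] |= 1 << (pos % 64)
	}
}

func (bf *BloomFilter) MayContain(key []byte) bool {
	a, b := hashes(key)
	for i := uint64(0); i < bf.k; i++ {
		pos := (a + i*b) % bf.m
		if bf.bits[pos/64]&(1<<(pos%64)) == 0 {
			return false // some probed bit is unset: key was never added
		}
	}
	return true
}

// key encodes an int64 PK / doc_id as 8 little-endian bytes.
func key(pk int64) []byte {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], uint64(pk))
	return buf[:]
}

func main() {
	// Build side: PKs of source rows that passed e.g. category = 'tech'.
	bf := NewBloomFilter(1024, 3)
	for _, pk := range []int64{3, 17, 42} {
		bf.Add(key(pk))
	}
	// Probe side: doc_ids read from the fulltext index table.
	var kept []int64
	for _, docID := range []int64{1, 3, 8, 17, 42, 99} {
		if bf.MayContain(key(docID)) {
			kept = append(kept, docID)
		}
	}
	fmt.Println(kept) // always contains 3, 17, 42; may contain false positives
}
```

A Bloom filter never yields false negatives, so no matching doc_id is lost; it can yield false positives, so an exact join is still needed afterwards.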

Current query plan produced by applyJoinFullTextIndices:

Project
  -> Sort (score DESC)
    -> Join (src.pk = ft_alias.doc_id)
      -> Table Scan on src (with filter)
      -> Table Function fulltext_index_scan

When a filter is present, many of the doc_ids returned by fulltext_index_scan may be discarded by the join against the filtered source table, so reading them wastes significant I/O.

Optimized query plan (with filter):

Project
  -> Sort (score DESC, LIMIT)
    -> Join INNER (src.pk = ft_alias.doc_id)  [runtime filter: IN-list -> scanNode]
      -> Table Scan on src (with filter)
      -> Join INNER (ft_alias.doc_id = second.pk)  [runtime filter: BloomFilter -> ft_func]
        -> Table Function fulltext_index_scan
        -> Project (pk only)
          -> Table Scan on src (with filter, copy)
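The optimized plan still keeps the exact inner join on ft_alias.doc_id = second.pk even though a BloomFilter already screens the doc_ids, because a Bloom filter can admit false positives. The sketch below makes that visible with a deliberately tiny one-hash, 8-bit filter. The types and function names are hypothetical, and this toy filter is not how MatrixOne builds or sizes its runtime filters.

```go
package main

import "fmt"

// bucketFilter is a deliberately coarse one-hash Bloom filter (8 bits),
// chosen only so that false positives are easy to observe.
type bucketFilter uint8

func (f *bucketFilter) add(pk int64)            { *f |= 1 << uint(pk&7) }
func (f bucketFilter) mayContain(pk int64) bool { return f&(1<<uint(pk&7)) != 0 }

// hit is one row coming out of the fulltext table function (hypothetical).
type hit struct {
	docID int64
	score float64
}

// prefilterThenJoin mimics "Join INNER (ft_alias.doc_id = second.pk)" with a
// BloomFilter runtime filter: the filter cheaply drops most non-matching
// doc_ids, and the exact hash lookup removes the remaining false positives.
func prefilterThenJoin(hits []hit, filteredPKs []int64) (afterBF, joined []hit) {
	var bf bucketFilter
	exact := make(map[int64]struct{}, len(filteredPKs))
	for _, pk := range filteredPKs {
		bf.add(pk)
		exact[pk] = struct{}{}
	}
	for _, h := range hits {
		if !bf.mayContain(h.docID) {
			continue // definitely not among the filtered source rows
		}
		afterBF = append(afterBF, h)
		if _, ok := exact[h.docID]; ok {
			joined = append(joined, h)
		}
	}
	return afterBF, joined
}

func main() {
	hits := []hit{{2, 0.9}, {4, 0.8}, {5, 0.7}, {10, 0.6}, {13, 0.5}}
	afterBF, joined := prefilterThenJoin(hits, []int64{2, 5})
	// doc_id 4 is dropped by the filter; 10 and 13 are false positives
	// (same buckets as 2 and 5) that only the exact join removes.
	fmt.Println(len(afterBF), len(joined)) // 4 2
}
```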

Key changes:

  1. plan (apply_indices_fulltext.go): applyJoinFullTextIndices builds RuntimeFilterBuildList with BloomFilter for the inner join between fulltext results and the source table.

  2. compile (scope.go, remoterun.go): Pass BloomFilter from context to FilterHint when the target table is a fulltext index table (TableType == "fulltext").

  3. reader (reader.go): tryUpdateColumns uses doc_id column for BF filtering on fulltext tables, independent of pkPos since internal SQL does not include __mo_fake_pk_col.

  4. pk_filter_mem (pk_filter_mem.go): NewMemPKFilter sets BFSeqNum using doc_id column for fulltext tables as fallback when __mo_index_pri_col is absent.

  5. build_ddl (build_ddl.go): Fix fulltext index table relkind from SystemIndexRel ("i") to FullTextIndex_TblType ("fulltext") so the TableType check matches at runtime.
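Changes 2 and 5 interact: the compile step only forwards the BloomFilter when TableType equals "fulltext", so a table whose relkind was written as "i" never receives it. The sketch below models that gate; the structs, field names and helper are hypothetical stand-ins, and only the two string values ("i" and "fulltext") come from the PR text.

```go
package main

import "fmt"

// String values as stated in the PR; the real constants (SystemIndexRel,
// FullTextIndex_TblType) live in MatrixOne and are only mirrored here.
const (
	systemIndexRel       = "i"
	fullTextIndexTblType = "fulltext"
)

// tableDef is a hypothetical stand-in for the compiled table definition.
type tableDef struct {
	Name      string
	TableType string // read from mo_tables.relkind
}

// filterHint is a hypothetical stand-in for the reader's FilterHint.
type filterHint struct {
	BloomFilter []byte
}

// buildFilterHint attaches the BloomFilter carried in the context only when
// the scan targets a fulltext index table.
func buildFilterHint(def tableDef, bfFromCtx []byte) filterHint {
	var hint filterHint
	if def.TableType == fullTextIndexTblType && len(bfFromCtx) > 0 {
		hint.BloomFilter = bfFromCtx
	}
	return hint
}

func main() {
	bf := []byte{0x01, 0x02}
	before := tableDef{Name: "ft_idx", TableType: systemIndexRel} // relkind before the fix
	after := tableDef{Name: "ft_idx", TableType: fullTextIndexTblType}
	fmt.Println(len(buildFilterHint(before, bf).BloomFilter)) // 0: BF silently skipped
	fmt.Println(len(buildFilterHint(after, bf).BloomFilter))  // 2: BF pushed down
}
```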

Bug fixes included:

  • relkind mismatch: buildFullTextIndexTable wrote "i" to mo_tables.relkind instead of "fulltext", causing BF context propagation to be skipped.
  • pkPos gate: BF filtering logic was inside the if pkPos != -1 block, but fulltext internal SQL never includes the PK column, so the block was never entered.
  • BFSeqNum not set: NewMemPKFilter only looked for __mo_index_pri_col to set BFSeqNum; fulltext tables use doc_id instead.
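The reader-side fixes (pkPos gate and BFSeqNum fallback) can be summarized as "pick the probe column by name, and do not require the PK column to be projected". The sketch below is a hypothetical condensation of that logic, not the actual tryUpdateColumns / NewMemPKFilter code; the helper names, the non-fulltext branch of the gate, and the sample column list are assumptions, while the column names __mo_index_pri_col and doc_id come from the PR.

```go
package main

import "fmt"

// Column names taken from the PR description.
const (
	indexPriCol = "__mo_index_pri_col"
	docIDCol    = "doc_id"
)

// pickBFColumn (hypothetical) returns the position of the column the
// BloomFilter should probe: __mo_index_pri_col if present, otherwise doc_id
// for fulltext index tables.
func pickBFColumn(cols []string, isFulltext bool) (int, bool) {
	for i, c := range cols {
		if c == indexPriCol {
			return i, true
		}
	}
	if isFulltext { // fallback: fulltext index rows are keyed by doc_id
		for i, c := range cols {
			if c == docIDCol {
				return i, true
			}
		}
	}
	return -1, false
}

// shouldApplyBF (hypothetical) shows the gate fix: on fulltext tables BF
// filtering no longer requires pkPos != -1, because the internal SQL does
// not read the PK column. The non-fulltext branch is an assumption.
func shouldApplyBF(pkPos int, isFulltext, hasBF bool) bool {
	if !hasBF {
		return false
	}
	return isFulltext || pkPos != -1
}

func main() {
	cols := []string{"doc_id", "word"} // assumed projection of the internal SQL
	pos, ok := pickBFColumn(cols, true)
	fmt.Println(pos, ok)                       // 0 true
	fmt.Println(shouldApplyBF(-1, true, true)) // true even though pkPos == -1
}
```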

Tests:

  • Unit tests for NewMemPKFilter with fulltext table BFSeqNum
  • Unit tests for tryUpdateColumns with fulltext BF filtering
  • BVT: test/distributed/cases/fulltext/fulltext_pushdown.sql with 7 scenarios

…rigin#23832)

Implement BloomFilter pushdown for fulltext index tables, similar to
the existing IVF index pushdown. When a fulltext search query has
additional filter predicates (e.g. WHERE match(...) against(...) AND
category = 'tech'), the filtered PKs are collected into a BloomFilter
and pushed down to the fulltext index table reader to skip irrelevant
rows early.

Key changes:
- plan: applyJoinFullTextIndices builds RuntimeFilterBuildList with BF
- compile: scope.go/remoterun.go pass BF from context to FilterHint
- reader: tryUpdateColumns uses doc_id column for BF filtering on
  fulltext tables (independent of pkPos since internal SQL doesn't
  include __mo_fake_pk_col)
- pk_filter_mem: NewMemPKFilter sets BFSeqNum using doc_id column for
  fulltext tables (fallback when __mo_index_pri_col is absent)
- build_ddl: fix fulltext index table relkind from SystemIndexRel to
  FullTextIndex_TblType so TableType check matches at runtime

Approved by: @xxx
Ref matrixorigin#23832
cpegeric commented:

Feng's design is to use only post-filtering, with no BloomFilter: if the SQL has filters, keep calling Call() to get more rows until the LIMIT is reached.


cpegeric left a comment:


Use only post-filtering: if fewer than LIMIT N rows remain after filtering, keep calling Call() to get more rows, and stop once enough rows have been returned after filtering.
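A minimal sketch of the post-filtering alternative proposed in review, assuming the table function hands back doc_ids in descending score order, one batch per Call(). The scanner type, its batching, and the helper name are hypothetical; only the "keep calling Call() until LIMIT rows survive the filter" behaviour comes from the comments.

```go
package main

import "fmt"

// scanner is a hypothetical stand-in for the fulltext_index_scan table
// function: each Call returns the next batch of doc_ids (assumed to be in
// descending score order) and an empty batch once the index is exhausted.
type scanner struct {
	docs  []int64
	batch int
	pos   int
}

func (s *scanner) Call() []int64 {
	if s.pos >= len(s.docs) {
		return nil
	}
	end := s.pos + s.batch
	if end > len(s.docs) {
		end = len(s.docs)
	}
	out := s.docs[s.pos:end]
	s.pos = end
	return out
}

// topNWithPostFilter keeps calling Call() and applies the SQL filter to each
// batch, stopping once limit rows survive or the scanner runs dry.
func topNWithPostFilter(s *scanner, limit int, keep func(int64) bool) (rows []int64, calls int) {
	for len(rows) < limit {
		batch := s.Call()
		calls++
		if len(batch) == 0 {
			break // index exhausted: fewer than limit rows are returned
		}
		for _, id := range batch {
			if keep(id) {
				rows = append(rows, id)
				if len(rows) == limit {
					break
				}
			}
		}
	}
	return rows, calls
}

func main() {
	s := &scanner{docs: []int64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, batch: 3}
	isEven := func(id int64) bool { return id%2 == 0 } // stands in for e.g. category = 'tech'
	rows, calls := topNWithPostFilter(s, 3, isEven)
	fmt.Println(rows, calls) // [2 4 6] 2
}
```

Compared with the BloomFilter pushdown, this avoids building and shipping a filter, but it may need many Call() rounds when the filter is highly selective.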


Labels

kind/enhancement, kind/feature, kind/test-ci, size/L (denotes a PR that changes [500,999] lines)


5 participants