feat: Optional taxonomy directory, Umbraco compat, and critical bug fixes #425

Open
Shazwazza wants to merge 14 commits into release/4.0 from feature/optional-taxonomy-directory

@Shazwazza Shazwazza commented Feb 25, 2026

Summary

This PR makes the taxonomy directory optional in Examine v4, ensures backward compatibility with Umbraco CMS and Umbraco.Cms.Search, and fixes three critical bugs discovered during compatibility testing.

Features

Optional Taxonomy Directory

Makes the taxonomy index an opt-in/opt-out feature, enabling lighter-weight index configurations when faceted taxonomy search is not needed.

  • Add UseTaxonomyIndex property to LuceneIndexOptions (default: true for backward compat)
  • Add IsTaxonomyEnabled property to LuceneIndex for runtime checks
  • Introduce ITaxonomyDirectoryFactory interface separated from IDirectoryFactory for cleaner abstraction
  • IDirectoryFactory.CreateTaxonomyDirectory now returns Directory? instead of Directory
  • Mark DirectoryFactoryBase as [Obsolete] — it exists only for compatibility and adds no value
  • FileSystemDirectoryFactory implements ITaxonomyDirectoryFactory with type-check at usage sites
  • SyncedFileSystemDirectoryFactory updated to handle optional taxonomy dir
  • Add LuceneNonTaxonomySearcher for efficient searching when taxonomy is disabled
  • NRT (near-real-time) reopen thread management updated for non-taxonomy path

Umbraco API Compatibility

  • Ensures compatibility with latest Umbraco CMS API surface area for Examine v4 consumers

Bug Fixes

1. FacetsConfig.Build not called for non-taxonomy indexes

Commit: 384550ee

When taxonomy was disabled, FacetsConfig.Build(doc) was not being called. This is required even for non-taxonomy indexes to process SortedSetDocValuesFacetField entries into proper SortedSetDocValuesField entries. Without it, documents with facet fields threw ArgumentException during indexing, causing silent failures where no items were indexed and IndexCommitted events never fired.

2. SearchableFields caching empty results from initially empty indexes (fixes #426)

Commit: 7e672a87

When a SearchContext was created during application startup before any documents were indexed, SearchableFields read from the empty index reader and cached an empty array. After documents were indexed and the NRT reader was refreshed, IsSearcherCurrent() returned true (the refreshed reader IS current), so the SearchContext was reused with its stale empty _searchableFields cache. This caused ManagedQuery/Search(string) to generate queries with no fields, returning zero results even though documents existed in the index.

Fix: Only cache SearchableFields when the result is non-empty. Applied to both SearchContext and TaxonomySearchContext.
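
The caching rule can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `FieldCacheSketch` and its `readFieldsFromReader` delegate are hypothetical stand-ins for a search context and its index reader, used only to show the "cache only when non-empty" pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a search context; not Examine's actual class.
public sealed class FieldCacheSketch
{
    // Assumed delegate that reads the current field names from the index reader.
    private readonly Func<IReadOnlyCollection<string>> _readFieldsFromReader;
    private string[]? _searchableFields;

    public FieldCacheSketch(Func<IReadOnlyCollection<string>> readFieldsFromReader)
        => _readFieldsFromReader = readFieldsFromReader;

    public string[] SearchableFields
    {
        get
        {
            if (_searchableFields is not null)
            {
                return _searchableFields;
            }

            var fields = _readFieldsFromReader().ToArray();

            // Cache only a non-empty result, so an index that was empty at
            // startup is re-read on the next call instead of staying stale.
            if (fields.Length > 0)
            {
                _searchableFields = fields;
            }

            return fields;
        }
    }
}
```

With this shape, a context created against an empty index returns an empty array without remembering it, and the first call after documents exist populates the cache.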

3. NRT reader not refreshed before Committed event fires (fixes #427)

Commits: d606a41d, 6f040af3

In the async commit path (timer-based), the Committed event fired before WaitForChanges() completed, creating a race condition where consumers reacting to the Committed/IndexCommitted event could search with a stale NRT reader. Consumers would get zero or incomplete results even though the commit had completed.

Fix: Move WaitForChanges() into CommitNow() before the Committed event fires. Also removed the now-redundant WaitForChanges() calls in the synchronous !RunAsync paths.
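
The ordering that the fix establishes can be sketched as follows. This is an illustrative model only: `CommitterSketch` and its two delegates are hypothetical, standing in for the writer commit and the NRT reader refresh that `WaitForChanges()` performs.

```csharp
using System;

// Hypothetical model of the commit path; not the PR's IndexCommitter.
public sealed class CommitterSketch
{
    private readonly Action _commit;          // assumed: flushes the index writer
    private readonly Action _waitForChanges;  // assumed: blocks until the NRT reader is refreshed

    public CommitterSketch(Action commit, Action waitForChanges)
    {
        _commit = commit;
        _waitForChanges = waitForChanges;
    }

    public event EventHandler? Committed;

    public void CommitNow()
    {
        _commit();

        // Refresh the reader before notifying, so handlers that search
        // inside the Committed event see the just-committed documents.
        _waitForChanges();

        Committed?.Invoke(this, EventArgs.Empty);
    }
}
```

Because the refresh happens inside `CommitNow()`, both the timer-based and the synchronous callers get the same ordering without calling the refresh themselves.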

Breaking Changes

| Change | Impact | Migration |
| --- | --- | --- |
| IDirectoryFactory.CreateTaxonomyDirectory returns Directory? | Low: only affects custom IDirectoryFactory implementations | Return null to disable taxonomy, or keep returning a directory |
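
As a rough illustration of the migration, the sketch below shows a factory that opts out of taxonomy by returning null, and a caller that checks for it. All types here (`DirectoryStub`, `IDirectoryFactorySketch`, `NoTaxonomyFactory`, `IndexSetupSketch`) are hypothetical simplifications; the real Examine interface and the Lucene.NET `Directory` type have different members and parameters.

```csharp
using System;

// Hypothetical stand-ins; the real Examine and Lucene.NET types differ.
public sealed class DirectoryStub
{
    public DirectoryStub(string name) => Name = name;
    public string Name { get; }
}

public interface IDirectoryFactorySketch
{
    DirectoryStub CreateDirectory();

    // Nullable return: null means "no taxonomy directory for this index".
    DirectoryStub? CreateTaxonomyDirectory();
}

public sealed class NoTaxonomyFactory : IDirectoryFactorySketch
{
    public DirectoryStub CreateDirectory() => new DirectoryStub("main");

    // Opting out of taxonomy is expressed by returning null.
    public DirectoryStub? CreateTaxonomyDirectory() => null;
}

public static class IndexSetupSketch
{
    // Callers branch on null instead of assuming a directory exists.
    public static bool IsTaxonomyEnabled(IDirectoryFactorySketch factory)
        => factory.CreateTaxonomyDirectory() is not null;
}
```

A custom factory that keeps returning a directory needs no behavioral change; only code that consumed the return value as non-null has to add the check.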

Tests

  • Examine v4: 888 tests pass (296 x 3 TFMs: net8.0, net9.0, net10.0), 0 failed
  • Umbraco.Cms.Search: 628 passed, 0 failed, 7 skipped
  • 4 new tests for SyncedFileSystemDirectoryFactory without taxonomy
  • Existing tests for optional taxonomy searcher behavior

Files Changed (22 files, +1741/-372)

Core Changes

  • src/Examine.Lucene/LuceneIndexOptions.cs — UseTaxonomyIndex option
  • src/Examine.Lucene/Providers/LuceneIndex.cs — Null taxonomy handling, non-taxonomy NRT, FacetsConfig.Build fix, redundant WaitForChanges cleanup
  • src/Examine.Lucene/Providers/IndexCommitter.cs — Race condition fix, null taxonomy writer
  • src/Examine.Lucene/Providers/LuceneNonTaxonomySearcher.cs — New non-taxonomy searcher
  • src/Examine.Lucene/Search/SearchContext.cs — Empty cache fix
  • src/Examine.Lucene/Search/TaxonomySearchContext.cs — Empty cache fix

Directory Infrastructure

  • src/Examine.Lucene/Directories/IDirectoryFactory.cs — Nullable return type
  • src/Examine.Lucene/Directories/ITaxonomyDirectoryFactory.cs — New interface
  • src/Examine.Lucene/Directories/DirectoryFactory.cs — Updated impl
  • src/Examine.Lucene/Directories/DirectoryFactoryBase.cs — Marked obsolete
  • src/Examine.Lucene/Directories/FileSystemDirectoryFactory.cs — ITaxonomyDirectoryFactory
  • src/Examine.Lucene/Directories/SyncedFileSystemDirectoryFactory.cs — Optional taxonomy support

Public API

  • src/Examine.Lucene/PublicAPI.Unshipped.txt — New API surface entries

Related Issues

Backport

The fixes for #426 and #427 have been backported to support/3.x in PR #428.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add UseTaxonomyIndex property to LuceneIndexOptions (default: true for backwards compatibility)
- Update IDirectoryFactory.CreateTaxonomyDirectory to return nullable Directory
- Update DirectoryFactoryBase and FileSystemDirectoryFactory to check UseTaxonomyIndex option
- Update LuceneIndex to handle null TaxonomyWriter when taxonomy is disabled
- Add IsTaxonomyEnabled property to LuceneIndex for runtime checks
- Update IndexCommitter to handle null TaxonomyWriter
- SyncedFileSystemDirectoryFactory still requires taxonomy (throws if disabled)
- Update PublicAPI.Unshipped.txt with new API surface

BREAKING CHANGE: IDirectoryFactory.CreateTaxonomyDirectory now returns Directory? instead of Directory
- Document new UseTaxonomyIndex and IsTaxonomyEnabled properties
- Mark CreateTaxonomyDirectory methods as returning nullable Directory?
- Note that SyncedFileSystemDirectoryFactory requires taxonomy enabled
…y searcher

- Add LuceneNonTaxonomySearcher class to handle searches when taxonomy is disabled
- Update LuceneIndex.CreateSearcher() to return appropriate searcher based on UseTaxonomyIndex option
- Add _nrtReopenThreadNoTaxonomy for NRT support without taxonomy
- Update WaitForChanges() to use correct NRT thread
- Update Dispose() to clean up non-taxonomy NRT thread
- Add 4 new tests for SyncedFileSystemDirectoryFactory without taxonomy:
  - Given_NoTaxonomyDirectory_When_CreatingDirectory_Then_IndexCreatedSuccessfully
  - Given_NoTaxonomyDirectory_When_IndexingData_Then_SearchSucceeds
  - Given_CorruptMainIndex_And_HealthyLocalIndex_NoTaxonomy_When_CreatingDirectory_Then_LocalIndexSyncedToMain
  - Given_CorruptMainIndex_And_CorruptLocalIndex_NoTaxonomy_When_CreatingDirectory_Then_NewIndexesCreatedAndUsable

Shazwazza and others added 6 commits February 25, 2026 09:05
When taxonomy is disabled, the non-taxonomy overload of FacetsConfig.Build(doc)
must still be called to process SortedSetDocValuesFacetField entries into proper
SortedSetDocValuesField entries. Without this, documents containing facet fields
throw ArgumentException during indexing, causing silent failures where no items
are indexed and IndexCommitted events never fire.

This restores the behavior from v4.0.0-beta.1 where FacetsConfig.Build(doc) was
always called regardless of taxonomy configuration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When a SearchContext is created during application startup before any
documents have been indexed, SearchableFields reads from the empty index
reader and caches an empty array. After documents are indexed and the
NRT reader is refreshed, IsSearcherCurrent() returns true (the refreshed
reader IS current), so the SearchContext is reused with its stale empty
_searchableFields cache. This causes ManagedQuery/Search(string) to
generate queries with no fields, returning zero results even though
documents exist in the index.

Fix: only cache SearchableFields when the result is non-empty. An empty
index has nothing to search anyway, and re-reading on each call has
negligible cost. Once documents are indexed and fields exist, the
non-empty result is cached normally.

Applied to both SearchContext (non-taxonomy) and TaxonomySearchContext.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move WaitForChanges() into CommitNow() before the Committed event fires,
and remove the redundant call from TimerRelease(). Previously, in the
async commit path (timer-based), Committed fired before WaitForChanges()
completed, creating a race condition where consumers reacting to the
Committed/IndexCommitted event could search with a stale NRT reader that
hadn't yet been refreshed to include the just-committed changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CommitNow() now calls WaitForChanges() internally, so the explicit
calls after CommitNow() in the !RunAsync paths of
PerformIndexItemsInternal and PerformDeleteFromIndexInternal are
redundant no-ops. Remove them and update comments for clarity.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Shazwazza Shazwazza changed the title from "Feature/optional taxonomy directory" to "feat: Optional taxonomy directory, Umbraco compat, and critical bug fixes" Feb 26, 2026