GroupedOr/GroupedAnd Ignore Field-Specific Analyzers and Lowercase Raw Field Terms

It took me some time to figure out where the problem came from and how to fix it. I let AI generate the detailed explanation below for more context, but in a nutshell:

I have an Umbraco 16 project with Examine 3.7.1.
I have an index with analysis records. These records are displayed in a grid in the Umbraco backoffice, and the data comes from a Lucene index. Displaying those records in the grid always works without issues.

But the dashboard also has filters for those records, and those filters don't work reliably. We traced the problem to the generation of the Lucene query:

When the database is new (no TEMP folder on boot) or when you have just rebuilt the index, the filters work perfectly fine. We can see in the query that the status has the correct casing:
`Generated Lucene Query: { Category: , LuceneQuery: *:* +(AnalysisStatus:*UpToDate* ) }`

However, when you shut down Umbraco and start it again, the filters no longer work. We see in the query that the analysis status is now suddenly lowercase:
`Generated Lucene Query: { Category: , LuceneQuery: *:* +(AnalysisStatus:*uptodate*) }`

The filters keep failing until you rebuild the index in the backoffice again. Now I let my AI agent explain what the issue is:

---------------------------------
AI explanation
---------------------------------
## Summary
`GroupedOr()` and `GroupedAnd()` methods ignore field-specific analyzer configurations (e.g., `FieldDefinitionTypes.Raw` with `KeywordAnalyzer`) and instead use the default analyzer with `LowercaseExpandedTerms=true`, causing case-sensitive fields to be queried with lowercased terms.

## Environment
- **Examine Version**: 3.7.1 (support/3.x branch)
- **Lucene.NET Version**: 4.8.0-beta00016
- **Target Framework**: .NET 6 / .NET 8
- **Context**: Umbraco CMS project with custom Examine indexes for content search

## Steps to Reproduce

### 1. Configure an Umbraco index with a Raw field
- Create an `IConfigureNamedOptions<LuceneDirectoryIndexOptions>` implementation
- Configure the index with `CultureInvariantWhitespaceAnalyzer` as the default analyzer
- Add a field definition: `new FieldDefinition("AnalysisStatus", FieldDefinitionTypes.Raw)`
- Register the configuration in the DI container
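
The configuration in this step could look roughly like the sketch below. The index name `AnalysisIndex` and the class name `ConfigureAnalysisIndexOptions` are hypothetical placeholders, and the code assumes the Examine 3.x options types (`LuceneDirectoryIndexOptions`, `FieldDefinition`, `CultureInvariantWhitespaceAnalyzer`); it is an illustration, not the project's actual code.

```csharp
using Examine;
using Examine.Lucene;
using Examine.Lucene.Analyzers;
using Microsoft.Extensions.Options;

// Hypothetical options class; only the "AnalysisStatus" field comes from the issue.
public sealed class ConfigureAnalysisIndexOptions
    : IConfigureNamedOptions<LuceneDirectoryIndexOptions>
{
    // Assumed index name, replace with the name of the real index.
    private const string IndexName = "AnalysisIndex";

    public void Configure(string? name, LuceneDirectoryIndexOptions options)
    {
        // Named options are resolved per index, so skip every other index.
        if (name != IndexName)
        {
            return;
        }

        // Default analyzer for the index, as described in the steps above.
        options.Analyzer = new CultureInvariantWhitespaceAnalyzer();

        // Raw field: the value should be indexed as-is, with its casing intact.
        options.FieldDefinitions.AddOrUpdate(
            new FieldDefinition("AnalysisStatus", FieldDefinitionTypes.Raw));
    }

    // Unnamed configuration simply delegates to the named overload.
    public void Configure(LuceneDirectoryIndexOptions options)
        => Configure(Options.DefaultName, options);
}
```

Registration would then typically be a single call such as `services.ConfigureOptions<ConfigureAnalysisIndexOptions>();` in the application's composition code.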

### 2. Index Umbraco content with case-sensitive values
- Create content items with an "AnalysisStatus" property
- Set values with specific casing: "UpToDate" and "OutOfDate"
- Trigger indexing (either through Umbraco backoffice rebuild or programmatically)

### 3. Query using GroupedOr
- Create a query using the Examine searcher: `searcher.CreateQuery("content")`
- Use `GroupedOr()` with the field and expected values: `GroupedOr(new[] { "AnalysisStatus" }, new[] { "UpToDate", "OutOfDate" })`
- Execute the query and observe results
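
As a sketch, the query from this step might be written as follows. The index name `AnalysisIndex` and the wrapper class are assumptions for illustration; the `GroupedOr` call itself is the one quoted above.

```csharp
using System;
using Examine;
using Examine.Search;

// Hypothetical wrapper class used only to make the snippet self-contained.
public sealed class AnalysisStatusSearch
{
    private readonly IExamineManager _examineManager;

    public AnalysisStatusSearch(IExamineManager examineManager)
        => _examineManager = examineManager;

    public ISearchResults FindByStatus()
    {
        // "AnalysisIndex" is an assumed index name.
        if (!_examineManager.TryGetIndex("AnalysisIndex", out var index))
        {
            throw new InvalidOperationException("Index 'AnalysisIndex' was not found.");
        }

        // One field, two accepted values, combined with OR.
        return index.Searcher
            .CreateQuery("content")
            .GroupedOr(new[] { "AnalysisStatus" }, new[] { "UpToDate", "OutOfDate" })
            .Execute();
    }
}
```

Logging `TotalItemCount` of the returned results before and after an application restart is one way to observe the difference described in the next step.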

### 4. Observe the inconsistent behavior
- **First scenario**: Delete the entire Umbraco temp folder (including index files) and restart the application
- **Second scenario**: Restart the application without deleting the temp folder

## Expected Behavior
- Query should preserve casing: `+(AnalysisStatus:UpToDate AnalysisStatus:OutOfDate)`
- Should return all matching documents (2 results in the example)
- Should work consistently regardless of whether index is newly created or reopened from disk

## Actual Behavior
- Query terms are lowercased: `+(AnalysisStatus:uptodate AnalysisStatus:outofdate)`
- Returns 0 results because indexed values are "UpToDate" and "OutOfDate" (case-sensitive)
- **Non-deterministic behavior**:
  - Works after deleting the temp folder and creating a fresh index, or right after an index rebuild
  - Consistently fails after subsequent application restarts, when the existing index is reopened from disk

## Root Cause Analysis

### Problem 1: Query Parser Uses Default Analyzer
- **File**: `Examine.Lucene\Search\LuceneSearchQuery.cs` (~Line 30, `CreateQueryParser` method)
- The query parser is initialized with the default analyzer (e.g., `CultureInvariantWhitespaceAnalyzer`)
- It does NOT use the `PerFieldAnalyzerWrapper` that contains field-specific analyzers (like `KeywordAnalyzer` for Raw fields)
- The `LowercaseExpandedTerms` property only gets set if `LuceneSearchOptions` is explicitly provided
- Without explicit options, it defaults to Lucene.NET's default: `true`

### Problem 2: Grouped Methods Force Query Parser Usage
- **File**: `Examine.Lucene\Search\LuceneSearchQueryBase.cs` (~Line 485, `GetMultiFieldQuery` method)
- `GroupedOr()` and `GroupedAnd()` call `GetMultiFieldQuery()` internally
- This method hardcodes `useQueryParser: true` when calling `GetFieldInternalQuery()`
- This forces all grouped operations through the query parser path

### Problem 3: Query Parser Applies Default Analyzer Lowercasing
- **File**: `Examine.Lucene\Search\LuceneSearchQueryBase.cs` (~Line 330, `GetFieldInternalQuery` method)
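- When `useQueryParser` is `true` and the value's `Examineness` is `Explicit`, the code calls `_queryParser.GetFieldQueryInternal()`
- This uses the default analyzer (not the field-specific `KeywordAnalyzer`)
- With `LowercaseExpandedTerms=true`, Lucene's query parser also lowercases expanded terms (wildcard, prefix, fuzzy, and range), which matches the `*uptodate*` wildcard in the logged query above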
- Result: Raw field terms are lowercased despite field configuration
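
The lowercasing of expanded terms can be reproduced with Lucene.NET alone, outside Examine. The following is a minimal sketch, assuming the `Lucene.Net.QueryParser` and `Lucene.Net.Analysis.Common` packages for 4.8; it does not reproduce Examine's internal parser setup, only the effect of the `LowercaseExpandedTerms` flag on a wildcard term.

```csharp
using System;
using Lucene.Net.Analysis.Core;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Util;

public static class LowercaseExpandedTermsDemo
{
    public static void Main()
    {
        // KeywordAnalyzer keeps terms intact, yet wildcard terms are not analyzed:
        // their casing is governed by LowercaseExpandedTerms instead.
        var parser = new QueryParser(
            LuceneVersion.LUCENE_48, "AnalysisStatus", new KeywordAnalyzer())
        {
            AllowLeadingWildcard = true // needed for the leading '*'
        };

        // LowercaseExpandedTerms defaults to true, so the term is lowercased.
        // Expected: AnalysisStatus:*uptodate*
        Console.WriteLine(parser.Parse("AnalysisStatus:*UpToDate*"));

        // With the flag disabled, the original casing is preserved.
        // Expected: AnalysisStatus:*UpToDate*
        parser.LowercaseExpandedTerms = false;
        Console.WriteLine(parser.Parse("AnalysisStatus:*UpToDate*"));
    }
}
```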

## Current Workarounds

### Option 1: Explicitly disable lowercasing via LuceneSearchOptions
- Create `LuceneSearchOptions` with `LowercaseExpandedTerms = false`
- Pass it when creating the query
- Apply `GroupedOr()` as normal
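
A sketch of this workaround is shown below. It assumes the searcher is a `BaseLuceneSearcher` and that the `CreateQuery` overload taking an analyzer and `LuceneSearchOptions` is available in the installed Examine 3.x version; the helper class and method names are hypothetical.

```csharp
using Examine;
using Examine.Lucene.Providers;
using Examine.Lucene.Search;
using Examine.Search;

// Hypothetical helper; not part of Examine.
public static class CaseSensitiveQueries
{
    public static IQuery CreateCaseSensitiveQuery(ISearcher searcher)
    {
        // Only Lucene-based searchers accept LuceneSearchOptions.
        if (searcher is not BaseLuceneSearcher luceneSearcher)
        {
            return searcher.CreateQuery("content");
        }

        // Explicitly disable lowercasing of expanded terms.
        var options = new LuceneSearchOptions { LowercaseExpandedTerms = false };

        // Reuse the searcher's own analyzer so only the lowercasing flag changes.
        return luceneSearcher.CreateQuery(
            "content",
            BooleanOperation.And,
            luceneSearcher.LuceneAnalyzer,
            options);
    }
}
```

The returned query can then be used with `GroupedOr(new[] { "AnalysisStatus" }, new[] { "UpToDate", "OutOfDate" })` exactly as before.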

### Option 2: Use Escape() extension method
- Apply `.Escape()` to each search term value
- This bypasses the query parser and creates a `PhraseQuery` directly
- Example: `new[] { "UpToDate".Escape(), "OutOfDate".Escape() }`
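
Put together, the escaped variant might look like this sketch. The wrapper class is hypothetical, and both `using` directives are included because the namespace that hosts the `Escape()` extension is an assumption here.

```csharp
using Examine;
using Examine.Search;

// Hypothetical wrapper used only to make the snippet self-contained.
public static class EscapedGroupedQuery
{
    public static ISearchResults Run(ISearcher searcher)
    {
        return searcher
            .CreateQuery("content")
            // Escape() turns each string into an IExamineValue that is treated
            // as an exact phrase instead of being parsed as query syntax.
            .GroupedOr(
                new[] { "AnalysisStatus" },
                new[] { "UpToDate".Escape(), "OutOfDate".Escape() })
            .Execute();
    }
}
```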

### Option 3: Chain individual Field() calls instead of GroupedOr
- Use individual `Field()` calls with `.Escape()` 
- Connect them with `.Or()` operator
- More verbose but reliable for case-sensitive fields
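
One possible shape for this option is sketched below. It wraps the chained `Field()` calls in `Group()`, which is a choice made for this sketch rather than something the issue prescribes: depending on the default operation and the category, a bare `.Or()` chain can end up mixing required and optional clauses, and grouping keeps the two status values together as alternatives. The wrapper class is hypothetical.

```csharp
using Examine;
using Examine.Search;

// Hypothetical wrapper used only to make the snippet self-contained.
public static class ChainedFieldQuery
{
    public static ISearchResults Run(ISearcher searcher)
    {
        return searcher
            .CreateQuery("content")
            // Both escaped values live inside one group, so a document
            // matching either status satisfies the group.
            .Group(q => q
                .Field("AnalysisStatus", "UpToDate".Escape())
                .Or()
                .Field("AnalysisStatus", "OutOfDate".Escape()))
            .Execute();
    }
}
```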

## Additional Context

### Non-Deterministic Behavior
The inconsistency between fresh index creation and reopening suggests:
- Potential state initialization issues in query parser or analyzer caching
- Thread safety concerns (code comments indicate "Query parsers are not thread safe")
- Different initialization paths when creating vs. opening existing Lucene directory

### Impact on Umbraco Projects
- Affects any Umbraco site using Examine with case-sensitive fields
- Common scenarios: status enums, category codes, custom identifiers
- Particularly problematic for production environments where index rebuilds are infrequent
- Developers may not notice the issue during development (fresh index) but encounter it in production (persistent index)

GroupedOr/GroupedAnd Ignore Field-Specific Analyzers and Lowercase Raw Field Terms #419
