Add withStructuredMatching for linear-time array-consistent matching #245

Merged
fym-rgb merged 1 commit into aws:main from fym-rgb:structured-finder-fix on Apr 6, 2026

fym-rgb (Collaborator) commented Apr 6, 2026

Summary

Add a configuration flag withStructuredMatching(true) on Machine.Builder
that enables linear-time array-consistent matching via rulesForJSONEvent().

The default behavior is unchanged: rulesForJSONEvent() uses the
original ACFinder with zero modifications. This enables safe shadow-mode
validation before switching over.

Problem

ACFinder exhibits O(N^2) step queue growth when events contain large arrays
at paths matching multi-field rules. Each array element match fans out to
all remaining fields via moveFrom(), producing N*(N+1)/2 steps for an
N-element array.
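
To make the growth concrete, the arithmetic can be sketched as below. This is illustrative only and not code from the PR: the class and method names are hypothetical, the quadratic count simply evaluates the N*(N+1)/2 formula stated above, and the linear count assumes a constant factor of 1 on the O(N*K) bound.

```java
// Illustrative arithmetic only; StepGrowth is a hypothetical helper, not part of Ruler.
final class StepGrowth {

    /** Evaluates the N*(N+1)/2 step count quoted in the Problem section. */
    static long quadraticSteps(long n) {
        return n * (n + 1) / 2;
    }

    /** Evaluates N*K, i.e. an O(N*K) bound with an assumed constant factor of 1. */
    static long linearLookups(long n, long k) {
        return n * k;
    }
}
```

For N = 45,000, `quadraticSteps` gives 1,012,522,500, while `linearLookups` with K = 2 gives 90,000.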

Solution

StructuredFinder indexes the event by field path (HashMap), then walks the
compiled state machine trie with direct lookups instead of scanning all
fields. Early-exit stops the walk once all rules are matched.

  • O(N*K) for N array elements with K rule conditions (linear in N)
  • No changes to ACFinder.java or Event.java
  • Opt-in via Machine.builder().withStructuredMatching(true).build()
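
The idea of indexing by path and then doing direct lookups can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual StructuredFinder: the class name, the flattened (path, value) input, and the rule shape (rule name mapped to path mapped to allowed values) are all hypothetical, and the sketch deliberately omits the state machine trie, array-consistency checks, and the stop-once-all-rules-match early exit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of "index by field path, then look up directly".
final class IndexedLookupSketch {

    /** One pass over the flattened event: group every value under its field path. */
    static Map<String, Set<String>> indexByPath(List<Map.Entry<String, String>> fields) {
        Map<String, Set<String>> index = new HashMap<>();
        for (Map.Entry<String, String> field : fields) {
            index.computeIfAbsent(field.getKey(), k -> new HashSet<>()).add(field.getValue());
        }
        return index;
    }

    /**
     * rules maps rule name -> (path -> allowed values). A rule matches when every
     * path in the rule has at least one indexed value in its allowed set.
     * NOTE: this ignores array consistency, which the real matcher enforces.
     */
    static List<String> match(Map<String, Set<String>> index,
                              Map<String, Map<String, Set<String>>> rules) {
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, Map<String, Set<String>>> rule : rules.entrySet()) {
            boolean allConditionsHold = true;
            for (Map.Entry<String, Set<String>> condition : rule.getValue().entrySet()) {
                // Direct hash lookup by path instead of scanning every field.
                Set<String> seen = index.get(condition.getKey());
                if (seen == null || Collections.disjoint(seen, condition.getValue())) {
                    allConditionsHold = false;
                    break; // first failed condition rules this rule out
                }
            }
            if (allConditionsHold) {
                matched.add(rule.getKey());
            }
        }
        return matched;
    }
}
```

In this simplified model the event is traversed once to build the index, and each rule condition costs one hash lookup plus a set comparison, rather than a scan over all remaining fields.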

Usage

// Default: original ACFinder (unchanged behavior)
Machine defaultMachine = new Machine();

// Opt-in: linear-time StructuredFinder
Machine machine = Machine.builder()
    .withStructuredMatching(true)
    .build();

// Same API, same results, better performance on large arrays
List<String> matches = machine.rulesForJSONEvent(eventJson);
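
One possible shape for the shadow-mode validation mentioned in the Summary is sketched below. It is an assumption-laden illustration, not tooling from this PR: `ShadowCompare`, `matchWithShadow`, and the `onMismatch` callback are hypothetical names, the two `Function` arguments stand in for wrappers around `rulesForJSONEvent()` on a default machine and a structured-matching machine, and results are compared as sets on the assumption that rule-name order is not significant.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical shadow-mode helper; not part of Ruler.
final class ShadowCompare {

    /**
     * Runs both matchers on the same event and always returns the primary result.
     * Disagreements and shadow failures are reported through onMismatch only,
     * so the shadow path can never change what the caller sees.
     */
    static List<String> matchWithShadow(String eventJson,
                                        Function<String, List<String>> primary,
                                        Function<String, List<String>> shadow,
                                        Consumer<String> onMismatch) {
        List<String> primaryResult = primary.apply(eventJson);
        try {
            List<String> shadowResult = shadow.apply(eventJson);
            // Compare as sets: assumes the order of rule names does not matter.
            if (!new HashSet<>(primaryResult).equals(new HashSet<>(shadowResult))) {
                onMismatch.accept("primary=" + new ArrayList<>(primaryResult)
                        + " shadow=" + new ArrayList<>(shadowResult));
            }
        } catch (RuntimeException e) {
            onMismatch.accept("shadow matcher failed: " + e);
        }
        return primaryResult;
    }
}
```

Because `rulesForJSONEvent()` may throw a checked exception, each wrapper lambda would need to catch and rethrow it as an unchecked one before being passed in.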

Files Changed

| File | Change |
|---|---|
| StructuredFinder.java | New: indexed trie walker with early-exit |
| GenericMachine.java | Config flag dispatch in rulesForJSONEvent |
| GenericMachineConfiguration.java | useStructuredMatching flag |
| NameState.java | getValueTransitionKeys() |
| SubRuleContext.java | getRuleCount() for early-exit |
| ArrayMembership.java | Package-visible size() and entries() |
| ACMachineTest.java | createMachine() factory for subclass override |
| ACMachineStructuredTest.java | Runs all ACMachineTest cases with structured matching |
| StructuredFinderTest.java | 51 correctness cases + perf scenarios |
| README.md | Document withStructuredMatching and performance tips |
README.md Document withStructuredMatching and performance tips

Performance

Tested with payloads up to 1.2 MB. All times in ms, "OOM" = OutOfMemoryError:

| Scenario | N | Payload (KB) | Original (ms) | Structured (ms) |
|---|---|---|---|---|
| Object array, 2-field rule | 45,000 | 1,121 | OOM | 74 |
| 3-level nested arrays | 380 | 1,125 | OOM | 49 |
| 10-condition rule | 8,000 | 1,177 | OOM | 57 |
| Customer-like 6-field | 130,000 | 1,161 | OOM | 29 |
| Wildcard bomb attack | 45,000 | 1,121 | OOM | 93 |
| 50-rule fan-out attack | 45,000 | 1,121 | OOM | 74 |

Correctness

  • 748 in-repo tests pass (all existing + new structured tests)
  • 2,799 external test cases validated across all code paths
  • 14 attack payloads designed to stress implementation internals
  • Correctness comparison tool verifies all approaches produce identical results

timbray (Collaborator) commented Apr 6, 2026

Is the long-term plan to continue exposing this as an option or make it the default behavior?

fym-rgb force-pushed the structured-finder-fix branch from f22b78a to 88f5141 on April 6, 2026 19:15

fym-rgb (Collaborator, Author) commented Apr 6, 2026

> Is the long-term plan to continue exposing this as an option or make it the default behavior?

As an option for now, so users don't get unintended behavioural changes and are also able to test in shadow mode. Will change the default flag to true once we are confident enough.

fym-rgb enabled auto-merge on April 6, 2026 19:26
…ching

Add a configuration flag withStructuredMatching(true) on Machine.Builder that
enables StructuredFinder, a linear-time matching algorithm for rulesForJSONEvent().

The default behavior is UNCHANGED — rulesForJSONEvent() uses the original
ACFinder with no modifications. Only when withStructuredMatching(true) is set
does it use the new StructuredFinder.

StructuredFinder indexes the event by field path and walks the compiled state
machine trie with direct HashMap lookups instead of scanning all remaining
fields per step. It also exits early once all rules are matched.

No changes to ACFinder.java or Event.java. The original matching path is
identical to origin/main, enabling safe shadow-mode validation.

Validated against 748 in-repo tests + 2799 external correctness cases.
Tested with 28 perf scenarios including 14 attack payloads up to 1.2MB.

fym-rgb force-pushed the structured-finder-fix branch from 88f5141 to 7677eaf on April 6, 2026 20:34
fym-rgb self-assigned this on Apr 6, 2026
fym-rgb merged commit 7677eaf into aws:main on Apr 6, 2026
4 checks passed

baldawarishi commented Apr 8, 2026

@fym-rgb in case you need it, there's probably a guide from me or @jonessha circa 2024 on how to run shadow modes or experimental rulers (along with other things).

timbray (Collaborator) commented Apr 8, 2026

Haha, I remember when they built that. At the time we were introducing Ruler 2.0, and I was pestering Rishi or Shawn every day: "Did you find anything?" After what seemed a long time they found a couple of things, all of them number values matched numerically instead of literally.
