feat: Add rulesForJSONEventNonBlocking with linear performance on large arrays #243

Closed
fym-rgb wants to merge 2 commits into aws:main from fym-rgb:structured-finder-fix

fym-rgb commented Apr 1, 2026

Issue #, if available:

Performance improvement for matching events with large arrays.

Description of changes:

Adds a new method rulesForJSONEventNonBlocking() on GenericMachine that provides array-consistent matching with guaranteed linear performance regardless of array size.

The existing rulesForJSONEvent() can exhibit O(N^2) performance when events contain large arrays at paths matching multi-field rules. For example, an event with a 10,000-element array matching a 2-field rule can take several seconds, while the new method completes in ~40ms.

What changed:

  • ACFinder.moveFrom(): Instead of iterating all remaining fields, uses a field name index to jump directly to relevant fields, and pre-checks array membership consistency before enqueuing steps.
  • Event.java: Builds a field name to index range map during construction (fields are already sorted by name, so same-name fields are contiguous).
  • NameState.java: Adds getValueTransitionKeys() to expose which field names a state transitions on.
  • StructuredFinder.java (new): For events with object arrays, walks the JSON tree structurally and matches per-element. Array consistency is correct by construction since each match call only sees one element's fields.
  • GenericMachine.java: Adds rulesForJSONEventNonBlocking() and deprecates rulesForJSONEvent().
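
The Event.java change depends on same-name fields being contiguous once the fields are sorted. The following is a minimal sketch of that idea, not the code from this PR: `FieldIndexSketch` and `buildIndex` are hypothetical names, and fields are reduced to a plain sorted list of names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not the Event class from the PR.
class FieldIndexSketch {

    // Maps each field name to {firstIndex, endIndexExclusive} in the sorted list.
    // Assumes the input is already sorted, so equal names form one contiguous run.
    static Map<String, int[]> buildIndex(List<String> sortedNames) {
        Map<String, int[]> index = new HashMap<>();
        int start = 0;
        for (int i = 1; i <= sortedNames.size(); i++) {
            // A run ends at the end of the list or where the name changes.
            boolean runEnds = i == sortedNames.size()
                    || !sortedNames.get(i).equals(sortedNames.get(start));
            if (runEnds) {
                index.put(sortedNames.get(start), new int[] {start, i});
                start = i;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> names =
                List.of("items.id", "items.id", "items.id", "items.price", "source");
        int[] range = buildIndex(names).get("items.id");
        System.out.println(range[0] + ".." + range[1]); // prints 0..3
    }
}
```

With such a map, a lookup for one field name costs a hash access plus the length of its run, rather than a scan over every remaining field.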

Semantics: The new method is equivalent to rulesForJSONEvent(): the same rules match the same events. Validated against all 679 existing tests plus 51 new correctness cases.

Benchmark / Performance (for source code changes):

| Scenario | N | Size (KB) | Old (ms) | New (ms) |
|---|---:|---:|---:|---:|
| prim+scalar | 1,000 | 6.8 | 5 | 5 |
| prim+scalar | 100,000 | 868 | 95 | 115 |
| obj-array both fields | 5,000 | 115 | 664 | 22 |
| obj-array both fields | 20,000 | 486 | TIMEOUT | 77 |
| nested 2-level | 500 | 129 | 551 | 12 |
| nested 2-level | 2,000 | 541 | TIMEOUT | 50 |
| customer-like 6-field | 50,000 | 429 | 43 | 46 |
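
One way to read these numbers is to estimate an empirical scaling exponent k = log(t2/t1) / log(N2/N1) from two rows of the same scenario; a value near 1 is consistent with linear growth. The sketch below uses only the New (ms) figures from the table (the Old column cannot be used where it reports TIMEOUT). `ScalingSketch` and `exponent` are hypothetical names, and two data points per scenario give only a rough estimate.

```java
// Hypothetical helper for reading the benchmark table; not part of the PR.
class ScalingSketch {

    // Exponent k such that t2/t1 == (n2/n1)^k for the two measurements.
    static double exponent(double n1, double t1, double n2, double t2) {
        return Math.log(t2 / t1) / Math.log(n2 / n1);
    }

    public static void main(String[] args) {
        // New (ms) values copied from the table above.
        System.out.println("obj-array both fields: " + exponent(5_000, 22, 20_000, 77));
        System.out.println("nested 2-level:        " + exponent(500, 12, 2_000, 50));
        System.out.println("prim+scalar:           " + exponent(1_000, 5, 100_000, 115));
    }
}
```

For the object-array and nested scenarios the exponent comes out at roughly 0.9 and 1.0. The prim+scalar pair gives a value below 1, which may simply reflect fixed overhead dominating the 5 ms measurement at N = 1,000.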

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

muktiranjan and others added 2 commits March 30, 2026 14:13
In moveFrom(), check nameState.getTransitionOn() before enqueuing an
ACStep. Fields with no value transition from the current NameState
would be immediately discarded by tryStep() anyway.
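
The pre-check described in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the ACFinder code: `PreCheckSketch` and `enqueueSteps` are hypothetical, and the `transitionNames` set stands in for "the current NameState has a value transition on this field name", whatever the actual signature of getTransitionOn() is.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical illustration; the real code works on ACStep and NameState objects.
class PreCheckSketch {

    // Returns the indices of the fields that would be enqueued as steps.
    static Queue<Integer> enqueueSteps(
            List<String> fieldNames, int from, Set<String> transitionNames) {
        Queue<Integer> steps = new ArrayDeque<>();
        for (int i = from; i < fieldNames.size(); i++) {
            // Skip fields the current state cannot transition on; a step for
            // them would be thrown away as soon as it was dequeued.
            if (transitionNames.contains(fieldNames.get(i))) {
                steps.add(i);
            }
        }
        return steps;
    }
}
```

Note that this version still visits every remaining field; it only avoids allocating and queuing steps that cannot succeed.
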
…ge arrays

Add a new matching method that provides array-consistent matching with
guaranteed linear performance regardless of array size.

The existing rulesForJSONEvent can exhibit O(N^2) performance when events
contain large arrays at paths matching multi-field rules. The new method
avoids this by:

1. Using a field name index in Event to jump directly to relevant fields
   in ACFinder.moveFrom(), instead of iterating all remaining fields
2. Pre-checking ArrayMembership consistency before enqueuing steps
3. For events with object arrays, using StructuredFinder which walks the
   JSON tree and matches per-element (AC correct by construction)
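
The "correct by construction" point in item 3 can be illustrated with a much simpler model than the real StructuredFinder. In this sketch (all names hypothetical), an object array is a list of string maps, a rule is a set of required field/value pairs with exact matching only, and each element is tested on its own, so fields from different elements can never combine into a match.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; the PR's StructuredFinder walks the JSON tree
// and reuses the compiled machine rather than comparing maps.
class PerElementSketch {

    // True if at least one single element satisfies every field of the rule.
    static boolean matchesAnyElement(
            List<Map<String, String>> elements, Map<String, String> rule) {
        for (Map<String, String> element : elements) {
            boolean allFieldsMatch = true;
            for (Map.Entry<String, String> want : rule.entrySet()) {
                // A missing field yields null, which never equals the wanted value.
                if (!want.getValue().equals(element.get(want.getKey()))) {
                    allFieldsMatch = false;
                    break;
                }
            }
            if (allFieldsMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> rule = Map.of("color", "red", "size", "L");
        // "red" and "L" both occur, but in different elements: no match.
        System.out.println(matchesAnyElement(
                List.of(Map.of("color", "red", "size", "S"),
                        Map.of("color", "blue", "size", "L")),
                rule)); // prints false
    }
}
```

Each element is visited once and compared against a fixed-size rule, so the work grows linearly with the number of elements.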

The new method is semantically equivalent to rulesForJSONEvent — same
rules match same events. Validated against 679 existing tests + 51 new
correctness cases covering flat, object-array, primitive-array, nested,
mixed, and edge cases.

Performance on events up to ~900KB:
- Primitive arrays: ~100ms (linear)
- Object arrays 20k elements: 77ms vs TIMEOUT on old method
- Nested 2-level 2k outer x 10 inner: 50ms vs TIMEOUT on old method
- Customer-like 6-field rule: ~46ms (linear)

fym-rgb closed this Apr 1, 2026

2 participants