
fix: Guard backwards-array walker against non-InputByte sentinels #248

Merged
fym-rgb merged 1 commit into aws:main from fym-rgb:release/v2.0.0 on Apr 28, 2026

fym-rgb (Collaborator) commented on Apr 27, 2026

The extractNextJavaCharacterFromInputCharactersForBackwardsArrays walker and isContinuationByte assumed that every element of the InputCharacter[] array is an InputByte. For wildcard patterns, however, the array also contains InputWildcard sentinels (the leading/trailing '*'). When the walker iterated past the last continuation byte of a multi-byte UTF-8 character and into a trailing InputWildcard, InputByte.cast() threw a ClassCastException.
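To make the failure mode concrete, here is a minimal, self-contained sketch. The nested `InputCharacter`, `InputByte`, and `InputWildcard` classes are simplified stand-ins written for this illustration, not the library's real classes, and `wildcardPattern` and `walkThrows` are hypothetical helpers. It only shows that an unchecked downcast over a mixed array throws once the walk reaches a sentinel.

```java
import java.nio.charset.StandardCharsets;

final class UnguardedWalkDemo {
    // Hypothetical stand-ins for the library's types; the real classes differ.
    static abstract class InputCharacter {}

    static final class InputByte extends InputCharacter {
        final byte value;
        InputByte(byte value) { this.value = value; }

        // Unchecked downcast, modeled loosely on the InputByte.cast() named above.
        static InputByte cast(InputCharacter c) { return (InputByte) c; }
    }

    static final class InputWildcard extends InputCharacter {}

    // Builds [*, b0, b1, ..., *] from the UTF-8 bytes of a value.
    static InputCharacter[] wildcardPattern(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        InputCharacter[] out = new InputCharacter[utf8.length + 2];
        out[0] = new InputWildcard();
        for (int i = 0; i < utf8.length; i++) {
            out[i + 1] = new InputByte(utf8[i]);
        }
        out[out.length - 1] = new InputWildcard();
        return out;
    }

    // Walks from `start` to the end, casting every element without a type check.
    // Returns true if the walk hit a ClassCastException.
    static boolean walkThrows(InputCharacter[] chars, int start) {
        try {
            for (int i = start; i < chars.length; i++) {
                InputByte.cast(chars[i]);
            }
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

Calling `UnguardedWalkDemo.walkThrows(UnguardedWalkDemo.wildcardPattern("\u00e9"), 1)` starts on the first byte of the two-byte character and runs into the trailing sentinel, which is the shape of the crash described above.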

Trigger: a suffix rule (hasSuffix > 0) plus two wildcard rules with the same multi-byte value on the same JSON path. Compiling the second wildcard rule walks the existing state with hasIndeterminatePrefix=true and reaches the backwards walker via canReuseNextByteState -> doMultipleTransitionsConvergeForInputByte.

Fix: return false from isContinuationByte for non-InputByte elements, and break from the backwards walker loop on non-InputByte elements. Both changes treat non-byte sentinels as multi-byte sequence boundaries.
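The two guards can be sketched as follows. This is an illustration under stated assumptions, not the code from this commit: the nested types are simplified stand-ins, `nextCharacter` is a hypothetical reduction of the real walker, and the array is assumed to hold the pattern in reverse byte order, so a character's continuation bytes come before its lead byte.

```java
import java.nio.charset.StandardCharsets;

final class GuardedWalkerSketch {
    // Hypothetical stand-ins for the library's types; the real classes differ.
    static abstract class InputCharacter {}

    static final class InputByte extends InputCharacter {
        final byte value;
        InputByte(byte value) { this.value = value; }
    }

    static final class InputWildcard extends InputCharacter {}

    // Guard 1: a non-byte sentinel is never a continuation byte.
    // UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    static boolean isContinuationByte(InputCharacter[] chars, int i) {
        if (!(chars[i] instanceof InputByte)) {
            return false;
        }
        return (((InputByte) chars[i]).value & 0xC0) == 0x80;
    }

    // Reads one character from a reversed array, starting at `start`.
    // Guard 2: the loop breaks on any non-byte element, treating it as
    // the boundary of the multi-byte sequence.
    static String nextCharacter(InputCharacter[] reversed, int start) {
        byte[] buf = new byte[4]; // a UTF-8 character is at most 4 bytes
        int n = 0;
        for (int i = start; i < reversed.length && n < buf.length; i++) {
            if (!(reversed[i] instanceof InputByte)) {
                break;
            }
            buf[n++] = ((InputByte) reversed[i]).value;
            if (!isContinuationByte(reversed, i)) {
                break; // lead byte or ASCII byte completes the character
            }
        }
        // Bytes were collected in reverse; restore UTF-8 order before decoding.
        byte[] ordered = new byte[n];
        for (int j = 0; j < n; j++) {
            ordered[j] = buf[n - 1 - j];
        }
        return new String(ordered, StandardCharsets.UTF_8);
    }
}
```

With these checks, a truncated sequence followed by a sentinel ends the walk at the sentinel instead of throwing. What the real walker does with such a partial sequence is not covered by this sketch.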

Includes 8 regression tests covering: minimal repro, various multi-byte code points, many-rule interleaving, all add-order permutations, add-delete-add, stress (50 duplicate wildcards), and matching correctness for every scenario.
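For the add-order permutation coverage, a test needs every ordering of the rules. The helper below is a generic sketch of how such orderings could be enumerated. It is not taken from the PR's test code, and `AddOrderPermutations` and any rule names passed to it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class AddOrderPermutations {
    // Returns every ordering of `items` (n! lists for n items).
    static <T> List<List<T>> permutations(List<T> items) {
        List<List<T>> out = new ArrayList<>();
        permute(new ArrayList<>(items), 0, out);
        return out;
    }

    // Swap-based recursion: fix position k, then permute the remainder.
    private static <T> void permute(List<T> work, int k, List<List<T>> out) {
        if (k == work.size()) {
            out.add(new ArrayList<>(work));
            return;
        }
        for (int i = k; i < work.size(); i++) {
            Collections.swap(work, k, i);
            permute(work, k + 1, out);
            Collections.swap(work, k, i); // undo the swap before the next choice
        }
    }
}
```

A test could loop over `AddOrderPermutations.permutations(...)` for one suffix rule and two wildcard rules, build a fresh matcher for each of the six orderings, add the rules in that order, and assert the same match results every time.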


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


sim: https://taskei.amazon.dev/tasks/P421369483
fym-rgb self-assigned this on Apr 27, 2026
fym-rgb merged commit f2886f8 into aws:main on Apr 28, 2026
4 checks passed
fym-rgb deleted the release/v2.0.0 branch on April 28, 2026 18:27