-
Notifications
You must be signed in to change notification settings - Fork 838
Change range in repeat to track counters per iteration #3241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 3.8-dev
Are you sure you want to change the base?
Conversation
This changeset aligns the semantics of traversals with range() and limit() used in repeat() with the behaviour of the equivalent 'unrolled' traversal without repeat(). This is done by tracking per-iteration counters instead of a global counter. The semantics of range() and limit() without repeat() and with local() were not modified. Single and nested loop Traverser implementations were also refactored to reduce much code duplication by consolidating common loop logic into a new NL_SL_Traverser base interface for loop-aware traversers.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## 3.8-dev #3241 +/- ##
==========================================
Coverage ? 75.71%
==========================================
Files ? 28
Lines ? 6390
Branches ? 0
==========================================
Hits ? 4838
Misses ? 1331
Partials ? 221 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
gremlin-test/src/main/resources/org/apache/tinkerpop/gremlin/test/features/filter/Range.feature
Outdated
Show resolved
Hide resolved
gremlin-test/src/main/resources/org/apache/tinkerpop/gremlin/test/features/filter/Range.feature
Show resolved
Hide resolved
See: link:https://issues.apache.org/jira/browse/TINKERPOP-3192[TINKERPOP-3192] | ||
==== Modified limit() and range() Semantics in repeat() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we also want to highlight this in the Provider section, but we should add an entry into the gremlin-semantics
doc on this.
gremlin-test/src/main/resources/org/apache/tinkerpop/gremlin/test/features/filter/Range.feature
Show resolved
Hide resolved
...rc/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/RangeGlobalStep.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/tinkerpop/gremlin/process/traversal/traverser/NL_SL_Traverser.java
Show resolved
Hide resolved
final AtomicLong counter = counters.computeIfAbsent(counterKey, k -> new AtomicLong(0L)); | ||
|
||
if (this.high != -1 && counter.get() >= this.high) { | ||
if (this.getTraversal().getParent() instanceof RepeatStep) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check should probably apply recursively to grandparents etc. Otherwise nesting a RangeGlobalStep inside a global parent leads to failure to "reset" on each iteration:
gremlin> g.V().repeat(limit(1)).times(2)
==>v[1]
gremlin> g.V().repeat(union(limit(1))).times(2)
gremlin>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious if we can get away with dropping this if statement altogether and always returning false instead of throwing a FastNoSuchElementException regardless of if this step is nested in a repeat. A quick test shows that with the exception of g_V_playlist_paths
in Paths.feature (which seems to hang indefinitely), all of the remaining feature tests are unaffected by such a change. I'm not quite sure why that one scenario is failing, but that may be a path to a preferable solution if it can be resolved.
It's notable that most filter steps (especially TailGlobalStep) don't throw NoSuchElementExceptions and instead solely rely on returning false from filter()
to indicate a result should be filtered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the check to move upwards until the root and added feature test for limit in union in repeat. I prefer to leave the check in to reduce potential backwards compatibility breakage since we know something is going wrong with the Paths.feature test when it is removed.
...rc/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/RangeGlobalStep.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/RangeGlobalStep.java
Outdated
Show resolved
Hide resolved
import static org.apache.tinkerpop.gremlin.process.traversal.traverser.TraverserRequirement.SINGLE_LOOP; | ||
|
||
/** | ||
* Intermediate helper interface for {@link Traverser}s that support single OR nested loops (but not both). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by this sentence (may be due to my limited understanding of loops tracking in Traversers). I'm not understanding when a Traverser would support nested loops but not single loops. Just looking at the names of the traversers, all of the NL traversers also contain SL. Additionally, the way RepeatStep adds requirements, any Traversal with NL requirements will always contain SL requirements as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's b/c even though NL traversers contain SL, implementation wise they extend but overrode the SL logic, so essentially it is one or the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's b/c even though NL traversers contain SL, implementation wise they extend but overrode the SL logic, so essentially it is one or the other?
This is exactly why. The NL traversers all extend SL traversers but override all loops
related methods to reference the nested loop stack and the single loop counter is essentially unused.
Yes all NL traversers do support SL but do so using the NL stack data structure, ignoring the SL loop counter.
I've modified the javadoc to be hopefully less confusing.
.../test/java/org/apache/tinkerpop/gremlin/process/traversal/util/TraversalExplanationTest.java
Outdated
Show resolved
Hide resolved
gremlin-test/src/main/resources/org/apache/tinkerpop/gremlin/test/features/filter/Range.feature
Outdated
Show resolved
Hide resolved
assertEquals(repeatTraversal.size(), withComputerTraversal.size()); | ||
} | ||
|
||
private static void load(GraphTraversalSource g) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think there would be value in giving this graph a name and adding it to TinkerFactory such that it is reusable elsewhere, or is it too specific for the exact behaviour you are trying to test here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is specific to this test as I wanted a very predictable structured data to traverse through. For feature tests and local testing the grateful dead graph can be used instead.
/** | ||
* If this range step is used inside a loop there can be multiple counters, otherwise there should only be one | ||
*/ | ||
private Map<String, AtomicLong> counters = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could there potentially be performance impacts with deeply nested and complex repeat queries? I don't have any example to offer, but just wondering for awareness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general yes nested repeats can be less performant than the unrolled equivalent but that was the case even prior to this change. I believe that is the reason for creating the RepeatUnrollStrategy which will optimize by unrolling simple repeat traversals if possible.
…lStrategy) as range is no longer applicable for that strategy. Added a few more feature tests to cover different scenarios.
…ugging flaky test.
https://issues.apache.org/jira/browse/TINKERPOP-3202
This changeset aligns the semantics of traversals with
range()
andlimit()
used inrepeat()
with the behaviour of the equivalent 'unrolled' traversal withoutrepeat()
. This is done by tracking per-iteration counters instead of a global counter. The semantics ofrange()
andlimit()
withoutrepeat()
and withlocal()
were not modified.Single and nested loop
Traverser
implementations were also refactored to reduce much code duplication by consolidating common loop logic into a newNL_SL_Traverser
base interface for loop-aware traversers.VOTE +1