Merged
2 changes: 2 additions & 0 deletions CHANGELOG.asciidoc
@@ -28,6 +28,8 @@ This release also includes changes from <<release-3-7-XXX, 3.7.XXX>>.
* Added a Gremlin MCP server.
* Added the Air Routes dataset to the set of available samples packaged with distributions.
* Added a minimal distribution for `tinkergraph-gremlin` using the `min` classifier that doesn't include the sample datasets.
* Removed `AggregateLocalStep` and `aggregate(Scope, String)`, and renamed `AggregateGlobalStep` to `AggregateStep`.
* Removed `store()` in favor of using `local(aggregate())`.
* Removed Vertex/ReferenceVertex from grammar. Use vertex id in traversals now instead.
* Removed `has(key, traversal)` option for `has()` step.
* Fixed bug where `InlineFilterStrategy` could add an empty `has()`.
7 changes: 2 additions & 5 deletions docs/src/dev/provider/gremlin-semantics.asciidoc
@@ -616,7 +616,7 @@ link:https://tinkerpop.apache.org/docs/x.y.z/reference/#all-step[reference]

*Description:* Collects all objects in the traversal into a collection.

*Syntax:* `aggregate(String sideEffectKey)` | `aggregate(Scope scope, String sideEffectKey)`
*Syntax:* `aggregate(String sideEffectKey)`

[width="100%",options="header"]
|=========================================================
@@ -626,8 +626,6 @@ link:https://tinkerpop.apache.org/docs/x.y.z/reference/#all-step[reference]

*Arguments:*

* `scope` - Determines the scope in which `aggregate` is applied. The `global` scope will collect all objects across the
traversal. The `local` scope will collect objects within the current object (if it's a collection).
* `sideEffectKey` - The name of the side-effect key that will hold the aggregated objects.

*Modulation:*
@@ -643,8 +641,7 @@ The aggregated objects can be accessed later using the `cap()` step.

None

See: link:https://github.com/apache/tinkerpop/tree/x.y.z/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/sideEffect/AggregateGlobalStep.java[source],
link:https://github.com/apache/tinkerpop/tree/x.y.z/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/sideEffect/AggregateLocalStep.java[source (local)],
See: link:https://github.com/apache/tinkerpop/tree/x.y.z/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/sideEffect/AggregateStep.java[source],
link:https://tinkerpop.apache.org/docs/x.y.z/reference/#aggregate-step[reference]

[[and-step]]
2 changes: 1 addition & 1 deletion docs/src/recipes/appendix.asciidoc
@@ -142,7 +142,7 @@ The following example assumes that the edges point in the `OUT` direction. Assum
----
g.V().where(without("x")).as("a").
outE().as("e").inV().as("b").
filter(bothE().where(neq("e")).otherV().where(eq("a"))).aggregate(local, "x").
filter(bothE().where(neq("e")).otherV().where(eq("a"))).local(aggregate("x")).
Contributor

i think i've asked this elsewhere, but again, have we tested these examples manually to be sure our results are the same just by adding the local wrapping?

select("a","b").dedup()
----
4 changes: 2 additions & 2 deletions docs/src/recipes/centrality.asciidoc
@@ -85,7 +85,7 @@ g.V().as("v").
select("triples").unfold().as("t").
select("x","y").where(eq("a")).
select("t"),
aggregate(local,"triples")). <5>
local(aggregate("triples"))). <5>
select("z").as("length").
select("triple").select("z").where(eq("length"))). <6>
select(all, "v").unfold(). <7>
@@ -124,7 +124,7 @@ g.withSack(1f).V().as("v").
select("triples").unfold().as("t").
select("x","y").where(eq("a")).
select("t"),
aggregate(local,"triples")). <5>
local(aggregate("triples"))). <5>
select("z").as("length").
select("triple").select("z").where(eq("length"))). <6>
group().by(select(first, "v")). <7>
32 changes: 16 additions & 16 deletions docs/src/recipes/collections.asciidoc
@@ -39,7 +39,7 @@ appear by way of some side-effect steps like `aggregate()`:
[gremlin-groovy,modern]
----
g.V().fold()
g.V().aggregate(local, 'a').cap('a')
g.V().local(aggregate('a')).cap('a')
----
It is worth noting that while a `Path` is not technically a `List` it does present like one and can be manipulated in
@@ -61,7 +61,7 @@ It may seem simple, but the most obvious choice to modifying what is in a list i
[gremlin-groovy,modern]
----
g.V().fold().unfold().values('name')
g.V().aggregate(local,'a').cap('a').unfold().values('name')
g.V().local(aggregate('a')).cap('a').unfold().values('name')
----
The above examples show that `unfold()` works quite well when you don't want to preserve the `List` structure of the
@@ -164,19 +164,19 @@ the use of `aggregate()` to aid in construction of this `List`:
[gremlin-groovy,modern]
----
g.V().has('name','marko').as('v'). <1>
aggregate(local,'a'). <2>
by('age').
local(aggregate('a'). <2>
by('age')).
repeat(outE().as('e').inV().as('v')). <3>
until(has('lang','java')).
aggregate('b'). <4>
by(select(all,'v').unfold().values('name').fold()).
aggregate('c'). <5>
by(select(all,'e').unfold().values('weight').mean()).
fold(). <6>
aggregate(local,'a'). <7>
by(cap('b')).
aggregate(local,'a'). <8>
by(cap('c')).
local(aggregate('a'). <7>
by(cap('b'))).
local(aggregate('a'). <8>
by(cap('c'))).
cap('a')
----
@@ -190,9 +190,9 @@ of "java"). Note however that the `by()` modulator overrides that traverser comp
the list of vertices in "v". Those vertices are unfolded to retrieve the name property from each and then are reduced
with `fold()` back into a list to be stored in the side-effected named "b".
<5> A similar use of `aggregate()` as the previous step, though this one turns "e" into a stream of edges to calculate
the `mean()` to store in a `List` called "c". Note that `aggregate()` (short form for `aggregate(global)`) was used
here instead of `aggregate(local)`, as the former is an eager collection of the elements in the stream
(`aggregate(local)` is lazy) and will force the traversal to be iterated up to that point before moving forward.
the `mean()` to store in a `List` called "c". Note that `aggregate()` was used
Contributor

i think some of this content is in the wrong place.

we deprecated in 3.7.x, so we want folks to start using the new form there. if we only deprecate and don't present the new pattern along with it, how will they understand the new form as they face deprecation? i think the 3.7.x docs should reflect the changes in the reference docs, recipes, etc. in that version, UNLESS, we've changed something in the semantics that only allow these changes in 3.8.0 (in which case there really isn't a deprecation path and maybe the #3234 doesn't make sense?) thoughts?

here instead of `local(aggregate())`, as the former is an eager collection of the elements in the stream
(`local(aggregate())` is lazy) and will force the traversal to be iterated up to that point before moving forward.
Without that eager collection, "v" and "e" would not contain the complete information required for the production of
"b" and "c".
<6> Adding `fold()`-step here is a bit of a trick. To see the trick, copy and paste all lines of Gremlin up to but
@@ -203,11 +203,11 @@ when traversing away from "marko"). The `aggregate()`-steps are side-effects and
through them unchanged. The `fold()` obviously converts those three traversers to a single `List` to make one
traverser with a `List` inside. That means that the remaining steps following the `fold()` will only be executed one
time each instead of three, which, as will be shown, is critical to the proper result.
<7> The single traverser with the `List` of three vertices in it passes to `aggregate(local)`. The `by()` modulator
<7> The single traverser with the `List` of three vertices in it passes to `local(aggregate())`. The `by()` modulator
presents an override telling Gremlin to ignore the `List` of three vertices and simply grab the "b" side effect created
earlier and stick that into "a" as part of the result. The `List` with three vertices passes out unchanged as
`aggregate(local)` is a side-effect step.
<8> Again, the single traverser with the `List` of three vertices passes to `aggregate(local)` and again, the `by()`
`local(aggregate())` is a side-effect step.
<8> Again, the single traverser with the `List` of three vertices passes to `local(aggregate())` and again, the `by()`
modulator presents an override to include "c" into the result.
All of the above code and explanation show that `aggregate()` can be used to construct `List` objects as side-effects
@@ -236,11 +236,11 @@ g.V().
bothE().count()).
fold())
g.V().
aggregate(local, 'x').
local(aggregate('x').
by(union(select('x').count(local), <2>
identity(),
bothE().count()).
fold()).
fold())).
cap('x')
----
2 changes: 1 addition & 1 deletion docs/src/recipes/connected-components.asciidoc
@@ -77,7 +77,7 @@ A straightforward way to detect the various subgraphs with an OLTP traversal is
[gremlin-groovy,existing]
----
g.V().emit(cyclicPath().or().not(both())). <1>
repeat(__.where(without('a')).aggregate(local,'a').both()).until(cyclicPath()). <2>
repeat(__.where(without('a')).local(aggregate('a')).both()).until(cyclicPath()). <2>
group().by(path().unfold().limit(1)). <3>
by(path().unfold().dedup().fold()). <4>
select(values).unfold() <5>
2 changes: 1 addition & 1 deletion docs/src/recipes/shortest-path.asciidoc
@@ -186,7 +186,7 @@ g.withSideEffect('v', []). <1>
inject(result.toArray()).as('kv').select(values).
unfold().
map(unfold().as('v_or_e').
coalesce(V().where(eq('v_or_e')).aggregate(local,'v'),
coalesce(V().where(eq('v_or_e')).local(aggregate('v')),
select('v').tail(local, 1).unfold().bothE().where(eq('v_or_e'))).
values('name','weight').
fold()).
38 changes: 13 additions & 25 deletions docs/src/reference/the-traversal.asciidoc
@@ -454,8 +454,8 @@ image:side-effect-lambda.png[width=175,float=right]
[gremlin-groovy,modern]
----
g.V().hasLabel('person').sideEffect(System.out.&println) <1>
g.V().sideEffect(outE().count().aggregate(local,"o")).
sideEffect(inE().count().aggregate(local,"i")).cap("o","i") <2>
g.V().sideEffect(outE().count().local(aggregate("o"))).
Contributor

Nit: It might be nice to pull all of these docs changes (aside from the semantics docs I suppose) back into the 3.7 deprecation PR. The existing examples in 3.7-dev will still run, however it is preferable to point users towards the preferred local(aggregate()) usage instead of the deprecated aggregate(local)

sideEffect(inE().count().local(aggregate("i"))).cap("o","i") <2>
----

<1> Whatever enters `sideEffect()` is passed to the next step, but some intervening process can occur.
@@ -596,26 +596,22 @@ link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gre
image::aggregate-step.png[width=800]

The `aggregate()`-step (*sideEffect*) is used to aggregate all the objects at a particular point of traversal into a
`Collection`. The step is uses `Scope` to help determine the aggregating behavior. For `global` scope this means that
the step will use link:http://en.wikipedia.org/wiki/Eager_evaluation[eager evaluation] in that no objects continue on
until all previous objects have been fully aggregated. The eager evaluation model is crucial in situations
where everything at a particular point is required for future computation. By default, when the overload of
`aggregate()` is called without a `Scope`, the default is `global`. An example is provided below.
`Collection`. By default, the step will use link:http://en.wikipedia.org/wiki/Eager_evaluation[eager evaluation] in that
no objects continue on until all previous objects have been fully aggregated. The eager evaluation model is crucial in situations
where everything at a particular point is required for future computation.

[gremlin-groovy,modern]
----
g.V(1).out('created') <1>
g.V(1).out('created').aggregate('x') <2>
g.V(1).out('created').aggregate(global, 'x') <3>
g.V(1).out('created').aggregate('x').in('created') <4>
g.V(1).out('created').aggregate('x').in('created').out('created') <5>
g.V(1).out('created').aggregate('x').in('created') <3>
g.V(1).out('created').aggregate('x').in('created').out('created') <4>
g.V(1).out('created').aggregate('x').in('created').out('created').
where(without('x')).values('name') <6>
where(without('x')).values('name') <5>
----

<1> What has marko created?
<2> Aggregate all his creations.
<3> Identical to the previous line.
<3> Who are marko's collaborators?
<4> What have marko's collaborators created?
<5> What have marko's collaborators created that he hasn't created?
@@ -635,31 +631,23 @@ g.V().out('knows').aggregate('x').by('age').cap('x') <1>

<1> The "age" property is not <<by-step,productive>> for all vertices and therefore those values are not included in the aggregation.

For `local` scope the aggregation will occur in a link:http://en.wikipedia.org/wiki/Lazy_evaluation[lazy] fashion.

NOTE: Prior to 3.4.3, `local` aggregation (i.e. lazy) evaluation was handled by `store()`-step.
Aggregation can be controlled to occur in a link:http://en.wikipedia.org/wiki/Lazy_evaluation[lazy] fashion by using
the step inside `local()`.

[gremlin-groovy,modern]
----
g.V().aggregate(global, 'x').limit(1).cap('x')
g.V().aggregate(local, 'x').limit(1).cap('x')
g.withoutStrategies(EarlyLimitStrategy).V().aggregate(local,'x').limit(1).cap('x')
g.V().aggregate('x').limit(1).cap('x')
g.V().local(aggregate('x')).limit(1).cap('x')
----

It is important to note that `EarlyLimitStrategy` introduced in 3.3.5 alters the behavior of `aggregate(local)`.
Contributor

hmm, that's a nice side-effect of removal. something to call out in Upgrade Docs to help flesh those out more?

Just a check but have you confirmed that all these examples do in fact return the same results as the ones with aggregate(local)?

Contributor Author

@xiazcy Oct 10, 2025

I've manually checked that the examples in the docs run, but I can do a doc build to make sure.

Without that strategy (which is installed by default), there are two results in the `aggregate()` side-effect even
though the interval selection is for 1 object. Realize that when the second object is on its way to the `range()`
filter (i.e. `[0..1]`), it passes through `aggregate()` and thus, stored before filtered.

[gremlin-groovy,modern]
----
g.E().aggregate(local,'x').by('weight').cap('x')
g.E().local(aggregate('x')).by('weight').cap('x')
----

*Additional References*

link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#aggregate(java.lang.String)++[`aggregate(String)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#aggregate(org.apache.tinkerpop.gremlin.process.traversal.Scope,java.lang.String)++[`aggregate(Scope,String)`]

[[all-step]]
=== All Step
90 changes: 90 additions & 0 deletions docs/src/upgrade/release-3.8.x.asciidoc
@@ -423,6 +423,96 @@ The properties file in the above example can either point to a remote configurat

See: link:https://issues.apache.org/jira/browse/TINKERPOP-3017[TINKERPOP-3017]

==== `aggregate()` with `Scope` Removed

The meaning of the `Scope` parameter in `aggregate()` has always been unique compared to all other "scopable" steps.
`aggregate(global)` is a `Barrier`, which blocks the traversal until all traversers have been aggregated into the side
effect, whereas `aggregate(local)` is non-blocking and allows traversers to pass before the side effect has been
fully aggregated. This is inconsistent with the semantics of `Scope` in all other steps. For example, `dedup(global)`
filters duplicates across the entire traversal stream, while `dedup(local)` filters duplicates within individual `List`
traversers.

The `Scope` parameter is being removed from `aggregate()` to fix the inconsistency between these two different use cases:
flow control vs. per-element application. This change aligns all side effect steps (none of the others have scope
arguments) and reserves the `Scope` parameter exclusively for "traverser-local" application patterns, eliminating
confusion about its contextual meanings.

This makes `AggregateStep` globally scoped by default, with eager aggregation. Lazy evaluation with `aggregate()` is
achieved by wrapping the step in `local()`.

[source,text]
----
// 3.7.x - scope is still supported
gremlin> g.V().aggregate(local, "x").by("age").select("x")
==>[29]
==>[29,27]
==>[29,27]
==>[29,27,32]
==>[29,27,32]
==>[29,27,32,35]

// 3.8.0 - must use aggregate() within local() to achieve lazy aggregation
gremlin> g.V().local(aggregate("x").by("age")).select("x")
==>[29]
==>[29,27]
==>[29,27]
==>[29,27,32]
==>[29,27,32]
==>[29,27,32,35]
----
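
The eager and lazy behaviors contrasted above can also be modeled without TinkerPop. The following is a minimal plain-Java sketch, not TinkerPop code: it assumes a traversal can be approximated by an `Iterator`, and the names `AggregateSketch`, `eagerAggregate`, and `lazyAggregate` are hypothetical and exist only for this illustration. The eager variant fills the side effect completely before releasing anything downstream, while the lazy variant adds each element only as it is pulled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of eager vs. lazy aggregation; none of this is TinkerPop API.
class AggregateSketch {

    // Eager (barrier-like): drain the entire upstream into the side effect
    // before any element is handed to the downstream consumer.
    static <T> Iterator<T> eagerAggregate(Iterator<T> upstream, List<T> sideEffect) {
        List<T> buffered = new ArrayList<>();
        while (upstream.hasNext()) {
            T next = upstream.next();
            sideEffect.add(next);
            buffered.add(next);
        }
        return buffered.iterator();
    }

    // Lazy: an element reaches the side effect only when downstream pulls it.
    static <T> Iterator<T> lazyAggregate(Iterator<T> upstream, List<T> sideEffect) {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return upstream.hasNext();
            }

            @Override
            public T next() {
                T next = upstream.next();
                sideEffect.add(next);
                return next;
            }
        };
    }

    public static void main(String[] args) {
        // The "age" values from the example above, used here as plain integers.
        List<Integer> ages = Arrays.asList(29, 27, 32, 35);

        List<Integer> eager = new ArrayList<>();
        Iterator<Integer> e = eagerAggregate(ages.iterator(), eager);
        e.next(); // pull a single element downstream
        System.out.println(eager); // [29, 27, 32, 35]

        List<Integer> lazy = new ArrayList<>();
        Iterator<Integer> l = lazyAggregate(ages.iterator(), lazy);
        l.next(); // pull a single element downstream
        System.out.println(lazy); // [29]
    }
}
```

After one element is pulled, the eager side effect already holds every value, whereas the lazy one holds only the first. This mirrors the incremental `select("x")` results shown for `local(aggregate("x"))`.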

A slight behavioral difference exists between the removed `aggregate(local)` and its replacement `local(aggregate())`
with respect to the handling of bulked traversers. In 3.8.0, `local()` changed from traverser-local to object-local
processing, always debulking incoming traversers into individual objects. This causes `local(aggregate())` to show true
lazy, one-object-at-a-time aggregation, differing from the original `aggregate(local)`, which always consumed bulked
traversers atomically. There is no workaround to preserve the old "traverser-local" semantics.

[source,text]
----
// 3.7.x - both local() and local scope will preserve bulked traversers
gremlin> g.V().out().barrier().aggregate(local, "x").select("x")
==>[v[3],v[3],v[3]]
==>[v[3],v[3],v[3]]
==>[v[3],v[3],v[3]]
==>[v[3],v[3],v[3],v[2]]
==>[v[3],v[3],v[3],v[2],v[4]]
==>[v[3],v[3],v[3],v[2],v[4],v[5]]
gremlin> g.V().out().barrier().local(aggregate("x")).select("x")
==>[v[3],v[3],v[3]]
==>[v[3],v[3],v[3]]
==>[v[3],v[3],v[3]]
==>[v[3],v[3],v[3],v[2]]
==>[v[3],v[3],v[3],v[2],v[4]]
==>[v[3],v[3],v[3],v[2],v[4],v[5]]

// 3.8.0 - bulked traversers are now split and processed per object, which affects local aggregation
gremlin> g.V().out().barrier().local(aggregate("x")).select("x")
==>[v[3]]
==>[v[3],v[3]]
==>[v[3],v[3],v[3]]
==>[v[3],v[3],v[3],v[2]]
==>[v[3],v[3],v[3],v[2],v[4]]
==>[v[3],v[3],v[3],v[2],v[4],v[5]]
----
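
The difference in bulk handling can be sketched in plain Java as well. This is a simplified model under stated assumptions, not TinkerPop's implementation: `BulkSketch`, `Bulked`, `traverserLocal`, and `objectLocal` are hypothetical names, and a bulked traverser is reduced to a value plus a count. Each method returns the sequence of side-effect snapshots that a downstream `select("x")` would observe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of bulk handling; none of this is TinkerPop API.
class BulkSketch {

    // Stand-in for a bulked traverser: one value that represents `bulk` objects.
    static final class Bulked {
        final String value;
        final int bulk;

        Bulked(String value, int bulk) {
            this.value = value;
            this.bulk = bulk;
        }
    }

    // Models the removed aggregate(local): the whole bulk is added atomically,
    // then one snapshot is observed per object in the bulk.
    static List<List<String>> traverserLocal(List<Bulked> in) {
        List<String> x = new ArrayList<>();
        List<List<String>> seen = new ArrayList<>();
        for (Bulked t : in) {
            for (int i = 0; i < t.bulk; i++) {
                x.add(t.value);
            }
            for (int i = 0; i < t.bulk; i++) {
                seen.add(new ArrayList<>(x));
            }
        }
        return seen;
    }

    // Models 3.8.0 local(aggregate()): the bulk is split first, so each object
    // is added and observed one at a time.
    static List<List<String>> objectLocal(List<Bulked> in) {
        List<String> x = new ArrayList<>();
        List<List<String>> seen = new ArrayList<>();
        for (Bulked t : in) {
            for (int i = 0; i < t.bulk; i++) {
                x.add(t.value);
                seen.add(new ArrayList<>(x));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // v[3] arrives with a bulk of 3, as in the barrier() example above.
        List<Bulked> in = Arrays.asList(
                new Bulked("v[3]", 3), new Bulked("v[2]", 1),
                new Bulked("v[4]", 1), new Bulked("v[5]", 1));

        System.out.println(traverserLocal(in).get(0)); // [v[3], v[3], v[3]]
        System.out.println(objectLocal(in).get(0));    // [v[3]]
    }
}
```

Both methods end with the same final side effect. Only the intermediate snapshots for the bulked `v[3]` differ, which is the behavioral change described above.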

See: link:https://github.com/apache/tinkerpop/blob/master/docs/src/dev/future/proposal-scoping-5.asciidoc[Lazy vs. Eager Evaluation]

==== Removal of `store()` Step

The `store()` step was a legacy name for `aggregate(local)` that has been deprecated since 3.4.3, and is now removed along
with `aggregate(local)`. To achieve lazy aggregation, use `aggregate()` within `local()`.

[source,text]
----
// 3.7.x - store() is still allowed
gremlin> g.V().store("x").by("age").cap("x")
==>[29,27,32,35]

// 3.8.0 - store() removed, use local(aggregate()) to achieve lazy aggregation
gremlin> g.V().local(aggregate("x").by("age")).cap("x")
==>[29,27,32,35]
----

==== split() on Empty String

The `split()` step will now split a string into a list of its characters if the given separator is an empty string.
@@ -28,7 +28,7 @@ import org.apache.tinkerpop.gremlin.process.traversal.step.map.EdgeOtherVertexSt
import org.apache.tinkerpop.gremlin.process.traversal.step.map.EdgeVertexStep
import org.apache.tinkerpop.gremlin.process.traversal.step.map.GraphStep
import org.apache.tinkerpop.gremlin.process.traversal.step.map.VertexStep
import org.apache.tinkerpop.gremlin.process.traversal.step.sideEffect.AggregateGlobalStep
import org.apache.tinkerpop.gremlin.process.traversal.step.sideEffect.AggregateStep
import org.apache.tinkerpop.gremlin.process.traversal.step.sideEffect.LambdaSideEffectStep
import org.apache.tinkerpop.gremlin.process.traversal.step.util.BulkSet
import org.apache.tinkerpop.gremlin.process.traversal.strategy.AbstractTraversalStrategy
@@ -91,7 +91,7 @@ class GephiTraversalVisualizationStrategy extends AbstractTraversalStrategy<Trav
Thread.sleep(acceptor.vizStepDelay)
}
}), s, traversal)
TraversalHelper.insertAfterStep(new AggregateGlobalStep(traversal, sideEffectKey), s, traversal)
TraversalHelper.insertAfterStep(new AggregateStep(traversal, sideEffectKey), s, traversal)
}

// decay all vertices except those that made it through the filter - "this way you can watch
@@ -109,7 +109,7 @@ class GephiTraversalVisualizationStrategy extends AbstractTraversalStrategy<Trav
Thread.sleep(acceptor.vizStepDelay)
}
}), s, traversal)
TraversalHelper.insertAfterStep(new AggregateGlobalStep(traversal, sideEffectKey), s, traversal)
TraversalHelper.insertAfterStep(new AggregateStep(traversal, sideEffectKey), s, traversal)
}
}
}