Skip to content

Conversation

not-napoleon
Copy link
Member

Relates to #128621

This PR adds more building blocks for working with dimensions in ESQL. Specifically, it makes the TSDB metadata available on attributes.

I did not add new tests for this yet. That's on purpose, as we have more components we need before we can really wire this up the way we want to.

 Conflicts:
	x-pack/plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/expression/Attribute.java
@@ -83,13 +123,8 @@ public static Reader<? extends EsField> getReader(String name) {
private final Map<String, EsField> properties;
private final String name;
private final boolean isAlias;
// Because the subclasses all reimplement serialization, this needs to be writeable from subclass constructors
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Comment was outdated - I ended up not reimplementing the serialization in the sub classes, but forgot to delete this in the previous PR)

timeSeriesFieldType = timeSeriesFieldType.merge(EsField.TimeSeriesFieldType.fromIndexFieldCapabilities(fc));
} catch (IllegalArgumentException e) {
return new InvalidMappedField(name, e.getMessage());
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the core thing this PR does. As we are loading the fields, we collect their time series metadata, and merge it across indices. This is then made available to the field attribute.

@@ -165,7 +165,6 @@ public void testFallbackIndicesOptions() throws Exception {
assertThat(cloned.clusterAlias(), equalTo(request.clusterAlias()));
assertThat(cloned.sessionId(), equalTo(request.sessionId()));
RemoteClusterPlan plan = cloned.remoteClusterPlan();
assertThat(plan.plan(), equalTo(request.remoteClusterPlan().plan()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test was failing on this line because plans serialized with older transport versions are not equal to plans serialized with the new transport version. This is expected, since when translating from the old version, we default the ts metadata to unknown, while the current version serializes and deserializes an actual value.

I discussed this failure with @dnhatn , and we concluded this assert wasn't adding any value in this test.

@not-napoleon not-napoleon marked this pull request as ready for review July 22, 2025 17:17
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 22, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@not-napoleon not-napoleon requested a review from dnhatn July 23, 2025 18:45
Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one question, but the changes look good. I look forward to seeing the next part. I think we might use this information only on the coordinator; hence I'm not sure if we need to serialize it, but I'm okay with this approach as well.

if (other != METRIC) {
return DIMENSION;
}
throw new IllegalStateException("Time Series Metadata conflict. Cannot merge [" + other + "] with [DIMENSION].");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm, we do not allow mixing dimensions and metrics, but we do allow NONE with dimensions. I think the GAUGE type is optional - should we treat it similarly to NONE?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern here is allowing for a migration from a label to a dimension. Labels would have neither a metric nor a dimension type, and would show up as NONE here, but if a later mapping marks them as dimensions, we should pick that up.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heya, small drive-by review because I saw that we're making changes to the Attribute hierarchy. This part of the change looks good to me, but I left a small number of thoughts. Didn't review the rest.

@@ -107,6 +107,11 @@ protected String label() {
return UNRESOLVED_PREFIX;
}

@Override
public boolean isDimension() {
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: maybe this should throw UOE, because we can't determine this on an unresolved attr.


@Override
public boolean isDimension() {
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: it might get tricky when dealing with renames; if you have a FROM idx | RENAME dimension_field AS foo, then the reference attribute for foo will claim not to be a dimension - so downstream plan nodes may need to resolve aliases to correctly determine if they're looking at a dimension or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding from @kkrik-es is that, at least for the initial version, that's an acceptable restriction. I think ReferenceAttributes are used for the output of EVAL and similar commands, and at least for the initial version of this we do not want to treat those as dimensions.

/**
* @return true if the attribute represents a TSDB dimension type
*/
public abstract boolean isDimension();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should actually be ternary, resp. return Boolean (capital B) so that attributes can return null to say "I don't know if I refer to a dimension" rather than claiming not to.

Otherwise 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the behavior associated with the null version? The intention here is that we want to only allow filtering by dimensions, and enable automatic grouping by all dimensions. If a field returns null here, we would then need to decide if "null" behaved as "this is a dimension" or "this is not a dimension" for these cases.

Unless you're proposing that we fail (or null-with-warning) any query where there is ambiguity about the dimensions? I'm not sure I agree with that.

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's get this in. Thanks Mark!

@not-napoleon not-napoleon enabled auto-merge (squash) August 15, 2025 17:05
@not-napoleon not-napoleon merged commit b53d5f7 into elastic:main Aug 18, 2025
34 checks passed
rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Aug 18, 2025
Relates to elastic#128621

This PR adds more building blocks for working with dimensions in ESQL. Specifically, it makes the TSDB metadata available on attributes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >non-issue :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants