feat(optimizer): annotate types for ORDER BY alias references by doripo · Pull Request #7281 · tobymao/sqlglot

doripo · 2026-03-12T18:11:37Z

When ORDER BY references a projection alias, annotate_types leaves the column typed as UNKNOWN:

```python
import sqlglot
from sqlglot import exp
from sqlglot.optimizer import qualify, annotate_types

schema = {"t": {"x": "INT"}}
query = qualify.qualify(sqlglot.parse_one("SELECT x + 1 AS y FROM t ORDER BY y"), schema=schema)
annotated = annotate_types.annotate_types(query, schema=schema)
order = annotated.find(exp.Order)
print(order.expressions[0].this.type)  # UNKNOWN, but should be INT
```

This happens because qualify_columns intentionally preserves ORDER BY alias refs (they're valid SQL in all dialects), so the annotator has no table-qualified column to resolve against. Other clauses (GROUP BY, HAVING) don't have this issue because _expand_alias_refs expands their alias references before annotation.

This PR adds a post-pass (_fixup_order_by_aliases) in annotate_scope that runs after _annotate_expression, when projections are fully typed. It:

- Builds an alias-to-type map from query.selects
- Walks ORDER BY columns, copying types for bare-column alias matches
- Clears _visited entries and non-leaf types on the ORDER BY subtree so _annotate_expression can re-derive compound expression types (e.g., ORDER BY y + 1) from the updated leaves
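The three steps above can be sketched over a hypothetical toy AST. `Node`, `walk`, `annotate` and `fixup_order_by_aliases` are illustrative names, not sqlglot's classes or the PR's code; `None` stands in for UNKNOWN, and `annotate` mimics an annotator that skips anything already recorded in `visited`.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical toy AST node; not sqlglot's exp classes.
@dataclass(eq=False)
class Node:
    kind: str                                  # "column", "literal", "add" or "ordered"
    name: str = ""                             # column name (only used when kind == "column")
    children: list = field(default_factory=list)
    type: Optional[str] = None                 # None plays the role of UNKNOWN


def walk(node):
    """Pre-order traversal of a toy subtree."""
    yield node
    for child in node.children:
        yield from walk(child)


def annotate(node, visited):
    """Bottom-up annotation that skips nodes already in `visited`."""
    if id(node) in visited:
        return
    for child in node.children:
        annotate(child, visited)
    if node.kind == "add":
        left, right = (child.type for child in node.children)
        node.type = left if left == right else None   # toy rule: only same-typed operands
    elif node.kind == "ordered":
        node.type = node.children[0].type
    visited.add(id(node))


def fixup_order_by_aliases(selects, order_by, visited):
    """Toy post-pass: selects is [(alias, projection)], order_by is [ordered, ...]."""
    alias_types = {alias: projection.type for alias, projection in selects}
    for ordered in order_by:
        # 1) Copy projection types onto bare columns that name an alias.
        for node in walk(ordered):
            if node.kind == "column" and node.name in alias_types:
                node.type = alias_types[node.name]
        # 2) Forget derived (non-leaf) types and visited marks so they can be recomputed.
        for node in walk(ordered):
            if node.kind not in ("column", "literal"):
                visited.discard(id(node))
                node.type = None
        # 3) Re-derive compound types (e.g. y + 1) from the updated leaves.
        annotate(ordered, visited)


# SELECT x + 1 AS y FROM t ORDER BY y + 1, with x typed INT
proj = Node("add", children=[Node("column", "x", type="INT"), Node("literal", type="INT")])
y_ref = Node("column", "y")
order_expr = Node("add", children=[y_ref, Node("literal", type="INT")])
ordered = Node("ordered", children=[order_expr])

visited = set()
annotate(proj, visited)       # main pass: the projection is fully typed
annotate(ordered, visited)    # main pass: y is unresolved, so y + 1 stays untyped
fixup_order_by_aliases([("y", proj)], [ordered], visited)
print(y_ref.type, order_expr.type, ordered.type)  # INT INT INT
```

Only non-leaf nodes are cleared, so Column and Literal types act as ground truth during the re-derivation.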

The _visited clearing is necessary because _annotate_expression skips nodes already in _visited, regardless of _overwrite_types. Subquery subtrees are pruned during the clearing walk because they belong to inner scopes already annotated independently.
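The clearing walk with subquery pruning could look roughly like this, again over a hypothetical toy node type (`clear_for_reannotation` is an illustrative name, not the PR's helper). The subquery node keeps both its type and its visited mark, so re-annotating the parent reuses the inner scope's result instead of recomputing it.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical toy node; kind "subquery" marks the boundary of an inner scope.
@dataclass(eq=False)
class Node:
    kind: str
    children: list = field(default_factory=list)
    type: Optional[str] = None


LEAF_KINDS = {"column", "literal"}


def clear_for_reannotation(root, visited):
    """Drop visited marks and derived types under root, without entering subqueries."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.kind == "subquery":
            continue  # inner scope: already annotated on its own, leave it untouched
        if node.kind not in LEAF_KINDS:
            visited.discard(id(node))
            node.type = None  # to be re-derived by the next annotation call
        stack.extend(node.children)


# ORDER BY y + (SELECT a + 1 FROM u), after y's alias type has been copied in.
inner_sum = Node("add", [Node("column", type="INT"), Node("literal", type="INT")], type="INT")
subquery = Node("subquery", [inner_sum], type="INT")
outer_sum = Node("add", [Node("column", type="INT"), subquery], type="UNKNOWN")
visited = {id(inner_sum), id(subquery), id(outer_sum)}

clear_for_reannotation(outer_sum, visited)
print(outer_sum.type, id(outer_sum) in visited)  # None False
print(inner_sum.type, id(inner_sum) in visited)  # INT True
```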

Test coverage includes basic alias resolution, shadowing/collisions, sort modifiers, compound expressions, set operations, window functions, subquery-as-projection, type coercion, and regression guards.

When ORDER BY references a projection alias (e.g., SELECT x+1 AS y ... ORDER BY y), the column's type was left as UNKNOWN. qualify_columns intentionally preserves these alias refs (they're valid SQL in all dialects), so the single-pass annotator has no table-qualified column to resolve against. Add a post-pass (_fixup_order_by_aliases) in annotate_scope that runs after projections are fully typed. It builds an alias-to-type map, fixes matching bare columns in ORDER BY, and re-derives parent types on compound expressions (e.g., ORDER BY y + 1) via _reannotate_subtree. This approach avoids modifying _annotate_expression, the core annotation loop. _reannotate_subtree clears non-leaf types (preserving Column/Literal ground truth), prunes at Subquery boundaries, and re-invokes _annotate_expression sequentially.

doripo · 2026-03-12T18:17:01Z

Happy to adjust the approach if you'd prefer this handled differently. A few notes on design choices:

- Post-pass vs. modifying _annotate_expression: I went with a post-pass to avoid touching the core annotation loop. The tradeoff is two walks over the ORDER BY subtree, but it keeps the change self-contained.
- _reannotate_subtree as a separate method: Seemed cleaner and potentially reusable for traversal-order gaps, but can be inlined into _fixup_order_by_aliases if preferred.

sqlglot/optimizer/annotate_types.py

Replace the post-pass (_fixup_order_by_aliases + _reannotate_subtree) with _resolve_order_by_alias, called from the column annotation path in _annotate_expression. When a bare column in ORDER BY matches a projection alias, it forces the projection's annotation via a recursive call if needed, then copies the type. This resolves alias types during the existing annotation pass instead of walking the ORDER BY subtree twice after the fact.

Signed-off-by: Dori Polotsky <doripo@riverpool.ai>

doripo · 2026-03-13T08:53:39Z

Note: _resolve_order_by_alias has extra logic to collect the last matching alias rather than returning on the first match. This replicates behavior I observed empirically on a local DuckDB instance, which is pinned as a test. On the other hand, duplicate-alias resolution appears to be unspecified across dialects, so this could be simplified to an early return and the duplicate-alias test dropped if preferred.
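The difference between the two options is small in code. Here is a sketch with plain `(alias, type)` pairs; the helper names are hypothetical, not the PR's functions. Note that a dict built from the projections, as a post-pass alias map would be, gives last-match semantics for free, because later keys overwrite earlier ones.

```python
# Hypothetical helpers for illustration; not the PR's _resolve_order_by_alias.
def alias_type_last_match(selects, name):
    """Scan all projections and keep the last alias that matches (the pinned behavior)."""
    found = None
    for alias, type_ in selects:
        if alias == name:
            found = type_
    return found


def alias_type_first_match(selects, name):
    """Simpler variant: return on the first matching alias."""
    for alias, type_ in selects:
        if alias == name:
            return type_
    return None


# SELECT a AS y, b AS y FROM t ORDER BY y, with hypothetical types a: INT, b: TEXT
selects = [("y", "INT"), ("y", "TEXT")]
print(alias_type_last_match(selects, "y"))   # TEXT
print(alias_type_first_match(selects, "y"))  # INT
print(dict(selects)["y"])                    # TEXT: later keys win in a dict
```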

georgesittas

@doripo I think I misguided you to an even costlier approach. This didn't work out well:

For every Column in every scope of the input query, you're doing an ancestor search and you walk the projection list of said query, again and again. Both of these are very wasteful.

We should move the logic out of _annotate_expression entirely and make it a post-pass in annotate_scope. After _annotate_expression finishes, all projections are fully typed, so you can:

1. Grab the Order node from the query
2. Build an alias -> type map from query.selects
3. Walk only the ORDER BY columns and copy types for bare-column alias matches
4. Re-annotate any compound ORDER BY expressions (e.g., y + 1) whose leaf types changed, by calling _annotate_expression on those subtrees

This is simpler, more targeted, and keeps the hot path (_annotate_expression) untouched. It's essentially what you did in the first commit, but without the complexity of _reannotate_subtree, because you can just re-run _annotate_expression on the individual Ordered node, since all leaf types are now known.
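To make the cost argument concrete, here is a rough operation-count model over a hypothetical toy tree with parent pointers. It is not a benchmark of sqlglot: it simply counts parent hops and projection comparisons for the in-loop shape (an ancestor search for every column, plus a projection scan for each ORDER BY column) against the post-pass shape (one map build, then one lookup per ORDER BY column).

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical toy node with a parent pointer; not sqlglot's classes.
@dataclass(eq=False)
class Node:
    kind: str
    name: str = ""
    children: list = field(default_factory=list)
    parent: Optional["Node"] = None


def build(kind, name="", children=()):
    node = Node(kind, name, list(children))
    for child in node.children:
        child.parent = node
    return node


def columns(node):
    if node.kind == "column":
        yield node
    for child in node.children:
        yield from columns(child)


def in_loop_steps(root, aliases):
    """Ancestor search for every column, then a projection scan if it sits under ORDER BY."""
    steps = 0
    for col in columns(root):
        node = col.parent
        while node is not None and node.kind != "order":
            steps += 1                      # one parent hop
            node = node.parent
        if node is not None:
            steps += len(aliases)           # re-scan the projection list for this column
    return steps


def post_pass_steps(order, aliases):
    """Build the alias set once, then do one O(1) lookup per ORDER BY column."""
    alias_set = set(aliases)
    order_cols = list(columns(order))
    hits = [col.name for col in order_cols if col.name in alias_set]
    return len(aliases) + len(order_cols), hits


# SELECT c0 AS p0, ..., c19 AS p19 FROM t ORDER BY p0, p1
aliases = [f"p{i}" for i in range(20)]
projections = [build("alias", a, [build("column", f"c{i}")]) for i, a in enumerate(aliases)]
order = build("order", children=[build("ordered", children=[build("column", a)]) for a in aliases[:2]])
root = build("select", children=projections + [order])

print(in_loop_steps(root, aliases))     # 82
print(post_pass_steps(order, aliases))  # (22, ['p0', 'p1'])
```

In this toy model the in-loop cost grows with every column's depth plus (ORDER BY columns × projections), while the post-pass cost is linear in projections plus ORDER BY columns.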

georgesittas · 2026-03-16T14:09:57Z

Are there similar issues during the annotation of GROUP BY, HAVING, etc, or do we avoid it due to expanding the alias references in qualify_columns for these clauses?

doripo · 2026-03-16T18:24:13Z

@georgesittas Thanks for the detailed guidance — agreed, the in-loop approach bought us more than we bargained for.

Two things I want to confirm before revising:

1. After the main pass, Ordered subtree nodes are in _visited, so _annotate_expression would skip them. I'll still need to clear _visited entries for non-leaf nodes before re-invoking. Is that what you had in mind, or is there a way to avoid that walk?
2. I read your suggestion as: only re-annotate compound Ordered expressions (e.g., y + 1), as opposed to trivial alias refs (e.g., ORDER BY y) where setting the Column type is sufficient. Is that right?

Re: GROUP BY / HAVING — those indeed don't have this issue because they are expanded. If we added ORDER BY to _expand_alias_refs in qualify_columns, then SELECT x+1 AS y FROM t ORDER BY y would become ORDER BY x+1 and not need a post-pass, but that's a bigger external change. Worth exploring?
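For comparison, expanding ORDER BY alias refs in qualify_columns would amount to substituting a copy of the aliased projection, roughly as in this toy sketch. The node type and `expand_order_by_aliases` helper are hypothetical; sqlglot's `_expand_alias_refs` operates on real expressions and handles more cases than this.

```python
import copy
from dataclasses import dataclass, field


# Hypothetical toy node; not sqlglot's exp classes.
@dataclass(eq=False)
class Node:
    kind: str                                   # "column", "literal" or "add"
    name: str = ""
    children: list = field(default_factory=list)


def to_sql(node):
    if node.kind == "add":
        return " + ".join(to_sql(child) for child in node.children)
    return node.name


def expand_order_by_aliases(selects, order_by):
    """Replace bare alias refs in ORDER BY with copies of the aliased projections."""
    alias_map = dict(selects)  # alias -> projection; later duplicates win

    def rewrite(node):
        if node.kind == "column" and node.name in alias_map:
            # Copy instead of sharing, and do not recurse into the copy, so an alias that
            # shadows a source column (SELECT x + 1 AS x ... ORDER BY x) cannot loop.
            return copy.deepcopy(alias_map[node.name])
        node.children = [rewrite(child) for child in node.children]
        return node

    return [rewrite(expr) for expr in order_by]


# SELECT x + 1 AS y FROM t ORDER BY y
projection = Node("add", children=[Node("column", "x"), Node("literal", "1")])
expanded = expand_order_by_aliases([("y", projection)], [Node("column", "y")])
print(to_sql(expanded[0]))  # x + 1
```

With the alias gone, the ordinary column annotation path could type `x + 1` directly, which is why this route would not need a post-pass.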

georgesittas · 2026-03-18T08:33:37Z

Hey @doripo, apologies for the delay here. Let me answer your questions:

> After the main pass, Ordered subtree nodes are in _visited, so _annotate_expression would skip them. I'll still need to clear _visited entries for non-leaf nodes before re-invoking. Is that what you had in mind, or is there a way to avoid that walk?

I think you can manually set _overwrite_types to True locally when you're about to annotate the Ordered nodes. Only do this for subtrees whose types you know will change, to avoid messing up existing types. This is just an idea, so make sure you explore whether it has any unintended side effects while you're at it.

> I read your suggestion as: only re-annotate compound Ordered expressions (e.g., y + 1), as opposed to trivial alias refs (e.g., ORDER BY y) where setting the Column type is sufficient. Is that right?

I think we want to annotate all of the Ordered nodes, right? If you skip ORDER BY y, won't y's type remain unknown?

> Re: GROUP BY / HAVING, those indeed don't have this issue because they are expanded. If we added ORDER BY to _expand_alias_refs in qualify_columns, then SELECT x+1 AS y FROM t ORDER BY y would become ORDER BY x+1 and not need a post-pass, but that's a bigger external change. Worth exploring?

Cool, that's what I expected. No, let's not worry about this for now.

georgesittas · 2026-03-18T08:33:58Z

Will be out for a few days, @geooo109 or @VaggelisD can you keep an eye on this PR?

Revert the in-loop approach and restore the post-pass in annotate_scope. Inline the reannotation logic into _fixup_order_by_aliases instead of a separate _reannotate_subtree method. Update duplicate alias test comment to reference _expand_alias_refs consistency.

Signed-off-by: Dori Polotsky <doripo@riverpool.ai>

doripo · 2026-03-18T23:23:10Z

Thanks for the detailed answers, and no worries!

Re: _overwrite_types — explored it, but _visited is checked first in the skip condition of _annotate_expression, so it blocks re-entry regardless of _overwrite_types (which is already True by default). Clearing _visited entries for non-leaf nodes is still needed for _annotate_expression to re-derive compound expression types. (Worth noting separately — could be that _overwrite_types was intended to bypass _visited as well?)
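A toy model of the behavior described above, assuming the skip condition checks `_visited` before anything involving `_overwrite_types` (the real condition in `annotate_types.py` may differ in detail; `ToyAnnotator` is a hypothetical stand-in). Even with overwriting enabled, a node seen in the main pass is skipped until its `_visited` entry is removed.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical toy node; not sqlglot's exp classes.
@dataclass(eq=False)
class Node:
    kind: str
    children: list = field(default_factory=list)
    type: Optional[str] = None


class ToyAnnotator:
    """Stand-in for the _visited / _overwrite_types interplay, not the real annotator."""

    def __init__(self, overwrite_types=True):
        self._visited = set()
        self._overwrite_types = overwrite_types

    def annotate(self, node):
        if id(node) in self._visited:
            return  # checked first: _overwrite_types is never consulted for visited nodes
        if node.type is not None and not self._overwrite_types:
            return  # keep an existing type when overwriting is disabled
        for child in node.children:
            self.annotate(child)
        if node.kind == "add":
            operand_types = {child.type for child in node.children}
            node.type = operand_types.pop() if len(operand_types) == 1 else "UNKNOWN"
        self._visited.add(id(node))


y = Node("column")                  # ORDER BY alias ref, untyped after the main pass
expr = Node("add", [y, Node("literal", type="INT")])

annotator = ToyAnnotator(overwrite_types=True)
annotator.annotate(expr)
print(expr.type)                    # UNKNOWN
y.type = "INT"                      # post-pass copies the projection's type
annotator.annotate(expr)
print(expr.type)                    # UNKNOWN: expr is still in _visited
annotator._visited.discard(id(expr))
annotator.annotate(expr)
print(expr.type)                    # INT
```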

Re: annotating all Ordered nodes — yes, the post-pass sets leaf types and then clears + re-annotates the full ORDER BY subtree, covering both simple and compound cases.

The third commit brings the implementation back close to the original post-pass, with the reannotation logic inlined. I've also updated the PR description to reflect the current approach.

Happy to refine further from here as necessary.

georgesittas reviewed Mar 12, 2026


sqlglot/optimizer/annotate_types.py (resolved)

georgesittas requested changes Mar 16, 2026


georgesittas force-pushed the main branch from 9dd8bd2 to a3929be March 16, 2026 16:50

doripo wants to merge 3 commits into tobymao:main from doripo:feat-orderby-alias-types
