fix(sqllab): ensure SQL_QUERY_MUTATOR is called when MUTATE_AFTER_SPLIT=True #34111

LucasWolke · 2025-07-09T09:50:42Z

SUMMARY

Very simple fix for bug described in: #30169
Currently all queries in sqllab are mutated like this:
query.executed_sql = database.mutate_sql_based_on_config(block)

def mutate_sql_based_on_config(self, sql_: str, is_split: bool = False) -> str:
        sql_mutator = config["SQL_QUERY_MUTATOR"]
        if sql_mutator and (is_split == config["MUTATE_AFTER_SPLIT"]):
            return sql_mutator(
                sql_,
                security_manager=security_manager,
                database=self,
            )
        return sql_

This means is_split is always False -> We always use the sql_mutator when MUTATE_AFTER_SPLIT is False, and never when it's True.

The fix passes the value of MUTATE_AFTER_SPLIT to the mutate_sql_based_on_config function, so is_split == config["MUTATE_AFTER_SPLIT"] always evaluates to True.

This does not break current implementations and fix the bug.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

In config.py:

MUTATE_AFTER_SPLIT = True


def SQL_QUERY_MUTATOR(  # pylint: disable=invalid-name,unused-argument
    sql: str, **kwargs
) -> str:
    print("Query mutator called") ## will now print during SQLab sessions
    return sql

Run query in sql lab -> Now prints message when MUTATE_AFTER_SPLIT = True or False

ADDITIONAL INFORMATION

Fixes #30169

Has associated issue: SQL_QUERY_MUTATOR not called in SQLab with MUTATE_AFTER_SPLIT = True #30169
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

…IT=True

github-actions

Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️

We hope to see you in our Slack community too! Not signed up? Use our Slack App to self-register.

korbit-ai

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

Category	Issue	Status
	Vague inline comment for SQL mutation ▹ view	🧠 Not in standard

Files scanned

File Path	Reviewed
superset/sql_lab.py	✅

Explore our documentation to understand the languages and file types we support and the files we ignore.

Check out our docs on how you can make Korbit work best for you and your team.

Loving Korbit!? Share us on LinkedIn Reddit and X

superset/sql_lab.py

            # Hook to allow environment-specific mutation (usually comments) to the SQL
-            query.executed_sql = database.mutate_sql_based_on_config(block)
+            query.executed_sql = database.mutate_sql_based_on_config(block, is_split=config["MUTATE_AFTER_SPLIT"])


codecov · 2025-07-09T09:59:37Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.98%. Comparing base (d951158) to head (0e0e33f).
Report is 3 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #34111       +/-   ##
===========================================
+ Coverage        0   72.98%   +72.98%     
===========================================
  Files           0      559      +559     
  Lines           0    40520    +40520     
  Branches        0     4268     +4268     
===========================================
+ Hits            0    29575    +29575     
- Misses          0     9836     +9836     
- Partials        0     1109     +1109

Flag	Coverage Δ
hive	`47.13% <100.00%> (?)`
mysql	`71.97% <100.00%> (?)`
postgres	`72.03% <100.00%> (?)`
presto	`50.88% <100.00%> (?)`
python	`72.95% <100.00%> (?)`
sqlite	`71.57% <100.00%> (?)`
unit	`100.00% <ø> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

mistercrunch · 2025-07-09T22:01:35Z

superset/sql_lab.py

@@ -475,7 +475,9 @@ def execute_sql_statements(  # noqa: C901
            db.session.commit()

            # Hook to allow environment-specific mutation (usually comments) to the SQL
-            query.executed_sql = database.mutate_sql_based_on_config(block)
+            query.executed_sql = database.mutate_sql_based_on_config(
+                block, is_split=config["MUTATE_AFTER_SPLIT"]


Not sure if I understand the logic in place, but shouldn't the is_split argument here be related to whether the sql-to-be-mutated is currently split (as in based on the context of where the function is called), and not be the config flag that means "I want to mutate before/after the split"?

From my understanding the code calling mutate_sql_based_on_config should call it twice, once before it's split with is_split=False and once after it's split with is_split=True

If that's the case, it might be worth changing the function definition to not set a default and, forcing all calls to mutate_sql_based_on_config to be explicit at every call.

The whole thing seems brittle and a bit of a hack. Ideally we'd be consistent here, say always call before split and make it easy for mutators to apply a split if they want, but that would break backwards compatibility ...

Thanks for your assessment. To clarify my approach: the MUTATE_AFTER_SPLIT flag was introduced specifically to control when the query mutator is applied for chart queries, as discussed in #21645. This behavior was already established in SQL Lab, so the flag was not originally intended to affect SQL Lab queries.
From my understanding, the is_split parameter is also primarily meant for chart queries, to ensure backward compatibility.
It would be possible to create a separate mutate_sql function that always applies the mutator, just for SQL Lab (if semantic clarity is the issue with my fix), or to add a comment clarifying why the config flag is passed. However, I believe always applying the mutator after splitting the query in SQL Lab is the intended and correct approach - the "before or after splitting" logic was only ever meant for chart queries, not for SQL Lab.

What I'm thinking is that is_split on the line right above these comments should probably hard coded to True or False, depending on whether the SQL at that time is split or not.

Looking at where it executes ->

superset/superset/models/core.py

Line 646 in 5a32777

if sql_mutator and (is_split == config["MUTATE_AFTER_SPLIT"]):

, it only applies the logic if the parameter matches the config. Meaning the argument is supposed to represent whether the SQL is split or not at the time of the call.

If merging the code as is, that method would always execute since config["MUTATE_AFTER_SPLIT"] == config["MUTATE_AFTER_SPLIT"]

The sql lab shouldn't even be using the mutate_sql_based_on_config method, it always wants to mutate the sql.

I think the confusion comes from a refactor you did here.

- sql = SQL_QUERY_MUTATOR( - sql, - security_manager=security_manager, - database=database, - ) + sql = database.mutate_sql_based_on_config(sql)

The sql lab was refactored to use mutate_sql_based_on_config, but that method was only written for a fix in chart queries (#21645) . In the sql lab we don't care about is_split.

I propose that we either create a new mutate_sql function that always mutates or just use the mutator like before the refactor.

Again, we don't want to use the before or after split logic, that was never intended for the sql lab. Hope this makes more sense now.

Right. I remember some of the context now. Generally we're trying to bring the SQL-execution code paths back together, and there's more work to be done here.

I think SQL_QUERY_MUTATOR is a super useful config hook, but seems it has grown a bit complex. It's tempting to clarify/redesign, but we have to be cautious around backwards compatibility. Dilemma is probably around change management here.

Let's try to understand the ideal scenario here, and see if we can preserve backwards compatibility.

First, let's start with trying to itemize common use cases for SQL_QUERY_MUTATOR:

Add pandas-highcharts as requirements #1 is clearly logging, where we add comments to all the SQL executed with information about the query provenance, and the user who triggered it. This allows database admins to get more visibility into who's behind the service account. Many organizations use this to parse user/tool/context information and use it for analytics

some security/data-access-policy stuff, maybe you have complex data access policy where you want to intercept some things, or double check some rules, and fail queries if something that shouldn't be allowed is submitted, simply raise an exception from here

some redirection / rewiring, maybe redirecting to a similar dataset for things like aggregate awareness, ...

?

Thinking about this method and keeping it backwards compatible, one thing we could do is passing in more context information, like a source parameter that would inform whether this SQL comes from SQL Lab, Explore, alerts, populating filters' list of values ... Adding net new args to the function should be backwards compatible as long as user catch/disregard *args **kwargs.

About the is_split flag, looking at the code, it looks like in explore we call mutate_sql_based_on_config both before and after the SQL is broken into chunks for execution. Seems this can be confusing for people authoring SQL_QUERY_MUTATOR, where you have to assume something like:

in explore, this method will be called twice, once with is_split=False first, and once more for each statement with is_split=True

in SQL Lab, will be called once for the block (could probably be called twice too following the same pattern as for explore)

throw in some session/env settings, stuff like calling SET and things like that. Kind of a hack but who knows what people might be using this for ...

Knowing all this, it's unclear to me what to do, especially given the backwards-compatibility concerns.

Best "mostly backwards"-compatible way forward: add a source args for clarity, call twice for SQL Lab for consistency (?) Add some notes in UPGRADING.md about the changes to notify upgraders.

Breaking backwards compatibility? Flag SQL_QUERY_MUTATOR for deprecation and redesign the hook(s), maybe something like:

SQL_BLOCK_MUTATOR(source) # always called even if block has a single statement SQL_STATEMENT_MUTATOR(source) # called for each sub-statement

Curious on what @betodealmeida or others might thing is best here. This could be slated along with overdue work around bringing all SQL execution codepaths together, and fix/improve other related issues around code and logic duplication across explore and SQL Lab.

fix(sqllab): ensure SQL_QUERY_MUTATOR is called when MUTATE_AFTER_SPL…

ee19d84

…IT=True

pull-request-size bot added the size/XS label Jul 9, 2025

rusackas requested review from betodealmeida and Vitor-Avila July 9, 2025 09:54

github-actions bot reviewed Jul 9, 2025

View reviewed changes

korbit-ai bot reviewed Jul 9, 2025

View reviewed changes

dosubot bot added the sqllab Namespace | Anything related to the SQL Lab label Jul 9, 2025

style(sqllab): shorten line to comply with ruff E501 line length limit

0e0e33f

mistercrunch reviewed Jul 9, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(sqllab): ensure SQL_QUERY_MUTATOR is called when MUTATE_AFTER_SPLIT=True #34111

fix(sqllab): ensure SQL_QUERY_MUTATOR is called when MUTATE_AFTER_SPLIT=True #34111

Uh oh!

LucasWolke commented Jul 9, 2025 •

edited

Loading

Uh oh!

github-actions bot left a comment

Uh oh!

korbit-ai bot left a comment •

edited

Loading

Uh oh!

This comment was marked as resolved.

Uh oh!

codecov bot commented Jul 9, 2025 •

edited

Loading

Uh oh!

mistercrunch Jul 9, 2025

Uh oh!

mistercrunch Jul 9, 2025

Uh oh!

mistercrunch Jul 9, 2025

Uh oh!

mistercrunch Jul 9, 2025

Uh oh!

LucasWolke Jul 11, 2025

Uh oh!

mistercrunch Jul 11, 2025

Uh oh!

LucasWolke Jul 11, 2025

Uh oh!

mistercrunch Jul 12, 2025 •

edited

Loading

Uh oh!

Uh oh!

fix(sqllab): ensure SQL_QUERY_MUTATOR is called when MUTATE_AFTER_SPLIT=True #34111

Are you sure you want to change the base?

fix(sqllab): ensure SQL_QUERY_MUTATOR is called when MUTATE_AFTER_SPLIT=True #34111

Uh oh!

Conversation

LucasWolke commented Jul 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

korbit-ai bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

Uh oh!

This comment was marked as resolved.

Uh oh!

codecov bot commented Jul 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

mistercrunch Jul 9, 2025

Choose a reason for hiding this comment

Uh oh!

mistercrunch Jul 9, 2025

Choose a reason for hiding this comment

Uh oh!

mistercrunch Jul 9, 2025

Choose a reason for hiding this comment

Uh oh!

mistercrunch Jul 9, 2025

Choose a reason for hiding this comment

Uh oh!

LucasWolke Jul 11, 2025

Choose a reason for hiding this comment

Uh oh!

mistercrunch Jul 11, 2025

Choose a reason for hiding this comment

Uh oh!

LucasWolke Jul 11, 2025

Choose a reason for hiding this comment

Uh oh!

mistercrunch Jul 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

LucasWolke commented Jul 9, 2025 •

edited

Loading

korbit-ai bot left a comment •

edited

Loading

codecov bot commented Jul 9, 2025 •

edited

Loading

mistercrunch Jul 12, 2025 •

edited

Loading