Skip to content

fix(sqllab): ensure SQL_QUERY_MUTATOR is called when MUTATE_AFTER_SPLIT=True #34111

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

LucasWolke
Copy link

@LucasWolke LucasWolke commented Jul 9, 2025

SUMMARY

Very simple fix for bug described in: #30169
Currently all queries in sqllab are mutated like this:
query.executed_sql = database.mutate_sql_based_on_config(block)

def mutate_sql_based_on_config(self, sql_: str, is_split: bool = False) -> str:
        sql_mutator = config["SQL_QUERY_MUTATOR"]
        if sql_mutator and (is_split == config["MUTATE_AFTER_SPLIT"]):
            return sql_mutator(
                sql_,
                security_manager=security_manager,
                database=self,
            )
        return sql_

This means is_split is always False -> We always use the sql_mutator when MUTATE_AFTER_SPLIT is False, and never when it's True.

The fix passes the value of MUTATE_AFTER_SPLIT to the mutate_sql_based_on_config function, so is_split == config["MUTATE_AFTER_SPLIT"] always evaluates to True.

This does not break current implementations and fix the bug.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

In config.py:

MUTATE_AFTER_SPLIT = True


def SQL_QUERY_MUTATOR(  # pylint: disable=invalid-name,unused-argument
    sql: str, **kwargs
) -> str:
    print("Query mutator called") ## will now print during SQLab sessions
    return sql

Run query in sql lab -> Now prints message when MUTATE_AFTER_SPLIT = True or False

ADDITIONAL INFORMATION

Fixes #30169

  • Has associated issue: SQL_QUERY_MUTATOR not called in SQLab with MUTATE_AFTER_SPLIT = True #30169
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️

We hope to see you in our Slack community too! Not signed up? Use our Slack App to self-register.

Copy link

@korbit-ai korbit-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.
Category Issue Status
Documentation Vague inline comment for SQL mutation ▹ view 🧠 Not in standard
Files scanned
File Path Reviewed
superset/sql_lab.py

Explore our documentation to understand the languages and file types we support and the files we ignore.

Check out our docs on how you can make Korbit work best for you and your team.

Loving Korbit!? Share us on LinkedIn Reddit and X

Comment on lines 477 to 478
# Hook to allow environment-specific mutation (usually comments) to the SQL
query.executed_sql = database.mutate_sql_based_on_config(block)
query.executed_sql = database.mutate_sql_based_on_config(block, is_split=config["MUTATE_AFTER_SPLIT"])

This comment was marked as resolved.

Copy link

codecov bot commented Jul 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.98%. Comparing base (d951158) to head (0e0e33f).
Report is 3 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #34111       +/-   ##
===========================================
+ Coverage        0   72.98%   +72.98%     
===========================================
  Files           0      559      +559     
  Lines           0    40520    +40520     
  Branches        0     4268     +4268     
===========================================
+ Hits            0    29575    +29575     
- Misses          0     9836     +9836     
- Partials        0     1109     +1109     
Flag Coverage Δ
hive 47.13% <100.00%> (?)
mysql 71.97% <100.00%> (?)
postgres 72.03% <100.00%> (?)
presto 50.88% <100.00%> (?)
python 72.95% <100.00%> (?)
sqlite 71.57% <100.00%> (?)
unit 100.00% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@dosubot dosubot bot added the sqllab Namespace | Anything related to the SQL Lab label Jul 9, 2025
@@ -475,7 +475,9 @@ def execute_sql_statements( # noqa: C901
db.session.commit()

# Hook to allow environment-specific mutation (usually comments) to the SQL
query.executed_sql = database.mutate_sql_based_on_config(block)
query.executed_sql = database.mutate_sql_based_on_config(
block, is_split=config["MUTATE_AFTER_SPLIT"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I understand the logic in place, but shouldn't the is_split argument here be related to whether the sql-to-be-mutated is currently split (as in based on the context of where the function is called), and not be the config flag that means "I want to mutate before/after the split"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding the code calling mutate_sql_based_on_config should call it twice, once before it's split with is_split=False and once after it's split with is_split=True

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the case, it might be worth changing the function definition to not set a default and, forcing all calls to mutate_sql_based_on_config to be explicit at every call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole thing seems brittle and a bit of a hack. Ideally we'd be consistent here, say always call before split and make it easy for mutators to apply a split if they want, but that would break backwards compatibility ...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your assessment. To clarify my approach: the MUTATE_AFTER_SPLIT flag was introduced specifically to control when the query mutator is applied for chart queries, as discussed in #21645. This behavior was already established in SQL Lab, so the flag was not originally intended to affect SQL Lab queries.
From my understanding, the is_split parameter is also primarily meant for chart queries, to ensure backward compatibility.
It would be possible to create a separate mutate_sql function that always applies the mutator, just for SQL Lab (if semantic clarity is the issue with my fix), or to add a comment clarifying why the config flag is passed. However, I believe always applying the mutator after splitting the query in SQL Lab is the intended and correct approach - the "before or after splitting" logic was only ever meant for chart queries, not for SQL Lab.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm thinking is that is_split on the line right above these comments should probably hard coded to True or False, depending on whether the SQL at that time is split or not.

Looking at where it executes ->

if sql_mutator and (is_split == config["MUTATE_AFTER_SPLIT"]):
, it only applies the logic if the parameter matches the config. Meaning the argument is supposed to represent whether the SQL is split or not at the time of the call.

If merging the code as is, that method would always execute since config["MUTATE_AFTER_SPLIT"] == config["MUTATE_AFTER_SPLIT"]

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sql lab shouldn't even be using the mutate_sql_based_on_config method, it always wants to mutate the sql.

I think the confusion comes from a refactor you did here.

-    sql = SQL_QUERY_MUTATOR(
-        sql,
-        security_manager=security_manager,
-        database=database,
-    )
+    sql = database.mutate_sql_based_on_config(sql)

The sql lab was refactored to use mutate_sql_based_on_config, but that method was only written for a fix in chart queries (#21645) . In the sql lab we don't care about is_split.

I propose that we either create a new mutate_sql function that always mutates or just use the mutator like before the refactor.

Again, we don't want to use the before or after split logic, that was never intended for the sql lab. Hope this makes more sense now.

Copy link
Member

@mistercrunch mistercrunch Jul 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I remember some of the context now. Generally we're trying to bring the SQL-execution code paths back together, and there's more work to be done here.

I think SQL_QUERY_MUTATOR is a super useful config hook, but seems it has grown a bit complex. It's tempting to clarify/redesign, but we have to be cautious around backwards compatibility. Dilemma is probably around change management here.

Let's try to understand the ideal scenario here, and see if we can preserve backwards compatibility.

First, let's start with trying to itemize common use cases for SQL_QUERY_MUTATOR:

  • Add pandas-highcharts as requirements #1 is clearly logging, where we add comments to all the SQL executed with information about the query provenance, and the user who triggered it. This allows database admins to get more visibility into who's behind the service account. Many organizations use this to parse user/tool/context information and use it for analytics
  • some security/data-access-policy stuff, maybe you have complex data access policy where you want to intercept some things, or double check some rules, and fail queries if something that shouldn't be allowed is submitted, simply raise an exception from here
  • some redirection / rewiring, maybe redirecting to a similar dataset for things like aggregate awareness, ...
  • ?

Thinking about this method and keeping it backwards compatible, one thing we could do is passing in more context information, like a source parameter that would inform whether this SQL comes from SQL Lab, Explore, alerts, populating filters' list of values ... Adding net new args to the function should be backwards compatible as long as user catch/disregard *args **kwargs.

About the is_split flag, looking at the code, it looks like in explore we call mutate_sql_based_on_config both before and after the SQL is broken into chunks for execution. Seems this can be confusing for people authoring SQL_QUERY_MUTATOR, where you have to assume something like:

  • in explore, this method will be called twice, once with is_split=False first, and once more for each statement with is_split=True
  • in SQL Lab, will be called once for the block (could probably be called twice too following the same pattern as for explore)
  • throw in some session/env settings, stuff like calling SET and things like that. Kind of a hack but who knows what people might be using this for ...

Knowing all this, it's unclear to me what to do, especially given the backwards-compatibility concerns.

Best "mostly backwards"-compatible way forward: add a source args for clarity, call twice for SQL Lab for consistency (?) Add some notes in UPGRADING.md about the changes to notify upgraders.

Breaking backwards compatibility? Flag SQL_QUERY_MUTATOR for deprecation and redesign the hook(s), maybe something like:

SQL_BLOCK_MUTATOR(source)  # always called even if block has a single statement
SQL_STATEMENT_MUTATOR(source) # called for each sub-statement

Curious on what @betodealmeida or others might thing is best here. This could be slated along with overdue work around bringing all SQL execution codepaths together, and fix/improve other related issues around code and logic duplication across explore and SQL Lab.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XS sqllab Namespace | Anything related to the SQL Lab
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL_QUERY_MUTATOR not called in SQLab with MUTATE_AFTER_SPLIT = True
2 participants