Skip to content

Add SQLAlchemy-based SqlClient implementations for metricflow tests#1966

Merged
tlento merged 10 commits intomainfrom
add-sqlalchemy-sqlclient
Feb 5, 2026
Merged

Add SQLAlchemy-based SqlClient implementations for metricflow tests#1966
tlento merged 10 commits intomainfrom
add-sqlalchemy-sqlclient

Conversation

@tlento
Copy link
Contributor

@tlento tlento commented Feb 2, 2026

Add SQLAlchemy-based SqlClient implementations for metricflow tests

The engine tests currently execute via calls through dbt adapters
maanged by an AdapterBackedSqlClient instance (and its corresponding
DDL-enabled class).

In order to allow for ongoing development and testing in a world where
dbt-core depends on metricflow we need to remove the pass-through
dependency on dbt-core that our reliance on dbt-adapters imposes on
us.

This is the first step in removing metricflow's test package dependencies
on dbt adapters - moving test execution to a SQLAlchemy-based client
that has many of the same test configuration advantages of the
AdapterBackedSqlClient without the dbt dependencies.

The full changeset associated with this (PR #1966) includes the addition
of the client plus independent changes to cut over each engine, as follows:

  1. Cut DuckDB tests over to SqlAlchemySqlClient

This includes one necessary update to error messaging snapshots, as the dbt
adapters use a custom exception wrapper that does dbt-specific formatting.

  1. Cut postgres tests over to SqlAlchemySqlClient

No special changes required.

  1. Cut Trino over to SqlAlchemy SqlClient

Trino uses a catalog rather than a database, and in their
sqlalchemy dialect they override the standard URL class to always
be a string url. Since that was a nuisance I chose to simply overload
the database element in the standard SqlAlchemy URL format and URL class
to be the catalog for the Trino case. This works in our tests, and probably
works in general, but it might not be robust in all scenarios.

In addition, semi-colons in sql expressions were causing syntax
errors with the sqlalchemy Trino client, so these were removed from the
affected test cases.

  1. Cut Databricks tests over to SqlAlchemySqlClient

This includes a bug fix for extracting the httppath parameter
from the engine URL. The original issue never emerged because it was in
a dead code branch, but the SqlAlchemy client uses it.

  1. Cut Snowflake tests over to SqlAlchemySqlClient

Snowflake's SQLAlchemy connector hews to the SQLAlchemy standard
of rendering case-insensitive object names in lower-case, so this
change updates our snapshots accordingly.

  1. Cut Redshift tests over to SqlAlchemySqlClient

Making Redshift work with SqlAlchemy was a bit trickier than expected,
as the "official" SqlAlchemy client for Redshift hasn't been updated
in 2 years and does not work with SqlAlchemy 2.x.

As a hack around this problem, this change adds a bare-bones custom
dialect override for psycopg2 that clobbers some class property to
allow for backwards compatibility with Redshift. Since we only ever
pass in sql query strings as sa_text() and always qualify our
relation names with the schema inline in the queries this works, but
if we ever need to relax those constraints we'll either need to find
a more full-featured dialect for Redshift or else switch to using the
redshift-connector package directly.

  1. Cut BigQuery tests over to SqlAlchemySqlClient

BigQuery has some special connection requirements, and does not support
the EXPLAIN keyword, so the dry run configuration is also special.

This updates the client accordingly.

@cla-bot cla-bot bot added the cla:yes label Feb 2, 2026
@github-actions
Copy link

github-actions bot commented Feb 2, 2026

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

Copy link
Contributor Author

tlento commented Feb 2, 2026

@tlento tlento force-pushed the add-sqlalchemy-sqlclient branch 2 times, most recently from c5c977c to 648f98f Compare February 2, 2026 18:21
@tlento tlento added the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Feb 2, 2026
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS February 2, 2026 19:02 — with GitHub Actions Inactive
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS February 2, 2026 19:02 — with GitHub Actions Inactive
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS February 2, 2026 19:02 — with GitHub Actions Inactive
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS February 2, 2026 19:02 — with GitHub Actions Inactive
@tlento tlento marked this pull request as ready for review February 2, 2026 20:10
@tlento tlento requested a review from a team as a code owner February 2, 2026 20:10
@tlento tlento removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Feb 2, 2026
Copy link
Contributor

@plypaul plypaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Tom! Left a few comments.

"""
start = time.perf_counter()

if sql_bind_parameter_set.param_dict:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually received a request to support bind parameters again. Outside the scope of the PR, but would it be straightforward to enable this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing about bind parameters is straightforward. 😛 But given that, this client shouldn't be a major source of problems for you.

We use sa_text() everywhere, so you can pass bind parameters through to the execute() call. Within a test context this shouldn't be too difficult to get working. The issue is we have to render the inline parameter using the SqlAlchemy format, which works for some-but-not-all dialects even within SqlAlchemy. Databricks, for example, caused a bunch of problems back in the day, but we were able to work around that.

More broadly, you'll have to be cautious about how those bind parameters get specified and rendered in the final query text. The rendering is probably not the same for every dialect within SqlAlchemy (and our Redshift shim might cause weird issues, although I'd expect it to work). It also may not be the same for SqlAlchemy vs JDBC/DBAPI/ArrowFlightSQL/etc. - engines do have divergence in input and output formats between their SqlAlchemy implementations and their direct-access APIs.

Copy link
Contributor Author

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @plypaul ! I'll update inlines tomorrow and re-run the engine tests before merging.

"""
start = time.perf_counter()

if sql_bind_parameter_set.param_dict:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing about bind parameters is straightforward. 😛 But given that, this client shouldn't be a major source of problems for you.

We use sa_text() everywhere, so you can pass bind parameters through to the execute() call. Within a test context this shouldn't be too difficult to get working. The issue is we have to render the inline parameter using the SqlAlchemy format, which works for some-but-not-all dialects even within SqlAlchemy. Databricks, for example, caused a bunch of problems back in the day, but we were able to work around that.

More broadly, you'll have to be cautious about how those bind parameters get specified and rendered in the final query text. The rendering is probably not the same for every dialect within SqlAlchemy (and our Redshift shim might cause weird issues, although I'd expect it to work). It also may not be the same for SqlAlchemy vs JDBC/DBAPI/ArrowFlightSQL/etc. - engines do have divergence in input and output formats between their SqlAlchemy implementations and their direct-access APIs.

tlento added 10 commits February 4, 2026 19:31
The engine tests currently execute via calls through dbt adapters
maanged by an AdapterBackedSqlClient instance (and its corresponding
DDL-enabled class).

In order to allow for ongoing development and testing in a world where
dbt-core depends on metricflow we need to remove the pass-through
dependency on dbt-core that our reliance on dbt-adapters imposes on
us.

This is the first step in removing metricflow's test package dependencies
on dbt adapters - a SQLAlchemy-based client that has many of the same
test configuration advantages of the AdapterBackedSqlClient without
the dbt dependencies.

The base functionality is in place, and all tests run against all
engines pass (and all environments build), but the client itself is
not currently in use and as such there may be runtime flaws that we
have not yet detected.

This is not totally desirable, but I'm too lazy to adjust the robot's
approach to avoid what amoutns to a dead code commit. Subsequent changes
will enable this client across all of the engines, and we will fix
up any issues we uncover as we go.
This removes the dbt-duckdb dependency from the base dev-env
and moves all tests over to use the SqlAlchemySqlClient instead
of the AdapterBackedSqlCLient. It includes one necessary update
to error messaging snapshots, as the dbt adapters use a custom
exception wrapper that does dbt-specific formatting.

With this we are ready to proceed with moving over engine-specific
clients.
This change moves Trino over with the minimum required updates
for making things work. In particular:

1. Trino uses a catalog rather than a database, and in their
sqlalchemy dialect they override the standard URL class to always
be a string url. Since that was a nuisance I chose to simply overload
the database element in the standard SqlAlchemy URL format and URL class
to be the catalog for the Trino case. This works in our tests, and probably
works in general, but it might not be robust in all scenarios.

2. For some reason semi-colons in sql expressions were causing syntax
errors with the sqlalchemy Trino client. I guess the dbt-trino adapter
(or the base dbt adapter) was removing trailing semicolons on execute.
This includes a bug fix for what turned out to be a dead code
branch, which the dbt adapter integration did not use but we do.

Now the httppath parameter is parsed correctly and we can use
SqlAlchemy with Databricks.
Snowflake has some strange ideas about how to render case-insensitive
database object names, so historically we've had these capitalized
names in our snapshots.

Snowflake's SQLAlchemy connector hews to the SQLAlchemy standard
of rendering case-insensitive object names in lower-case, so this
change updates our snapshots accordingly.
Making Redshift work with SqlAlchemy was a bit trickier than expected,
as the "official" SqlAlchemy client for Redshift hasn't been updated
in 2 years and does not work with SqlAlchemy 2.x.

As a hack around this problem, this commit adds a bare-bones custom
dialect override for  psycopg2 that clobbers some class property to
allow for backwards compatibility with Redshift. Since we only ever
pass in sql query strings as sa_text() and always qualify our
relation names with the schema inline in the queries this works, but
if we ever need to relax those constraints we'll either need to find
a more full-featured dialect for Redshift or else switch to using the
redshift-connector package directly.
BigQuery has some special connection requirements, and does not support
the EXPLAIN keyword, so the dry run configuration is also special.

This updates the client accordingly.
@tlento tlento force-pushed the add-sqlalchemy-sqlclient branch from 648f98f to 5a5ef7e Compare February 5, 2026 00:56
Copy link
Contributor Author

tlento commented Feb 5, 2026

Merge activity

  • Feb 5, 1:39 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Feb 5, 1:40 AM UTC: @tlento merged this pull request with Graphite.

@tlento tlento merged commit 182e6da into main Feb 5, 2026
14 checks passed
@tlento tlento deleted the add-sqlalchemy-sqlclient branch February 5, 2026 01:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants