Add SQLAlchemy-based SqlClient implementations for metricflow tests#1966
Conversation
Force-pushed: c5c977c to 648f98f
plypaul left a comment
Thanks Tom! Left a few comments.
    """
    start = time.perf_counter()

    if sql_bind_parameter_set.param_dict:
We actually received a request to support bind parameters again. Outside the scope of the PR, but would it be straightforward to enable this case?
Nothing about bind parameters is straightforward. 😛 But given that, this client shouldn't be a major source of problems for you.
We use sa_text() everywhere, so you can pass bind parameters through to the execute() call. Within a test context this shouldn't be too difficult to get working. The issue is we have to render the inline parameter using the SqlAlchemy format, which works for some-but-not-all dialects even within SqlAlchemy. Databricks, for example, caused a bunch of problems back in the day, but we were able to work around that.
More broadly, you'll have to be cautious about how those bind parameters get specified and rendered in the final query text. The rendering is probably not the same for every dialect within SqlAlchemy (and our Redshift shim might cause weird issues, although I'd expect it to work). It also may not be the same for SqlAlchemy vs JDBC/DBAPI/ArrowFlightSQL/etc. - engines do have divergence in input and output formats between their SqlAlchemy implementations and their direct-access APIs.
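The sa_text()-plus-execute() path described above can be sketched as follows. This is an illustrative example only, not the metricflow client itself: an in-memory SQLite engine stands in for the real test engines, and the dialect-specific rendering quirks mentioned above are exactly what such a sketch glosses over.

```python
import sqlalchemy as sa

# Hypothetical sketch: passing named bind parameters through sa.text()
# to execute(). SQLite is used only so the example is self-contained.
engine = sa.create_engine("sqlite:///:memory:")

with engine.connect() as conn:
    # :name placeholders are rendered into the dialect's native
    # paramstyle at execution time.
    stmt = sa.text("SELECT :a + :b AS total")
    total = conn.execute(stmt, {"a": 2, "b": 3}).scalar_one()
    print(total)  # -> 5
```

For inline (non-bound) rendering, the same statement would instead have to be compiled with literal binds per dialect, which is where the cross-dialect divergence bites.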
The engine tests currently execute via calls through dbt adapters managed by an AdapterBackedSqlClient instance (and its corresponding DDL-enabled class). In order to allow for ongoing development and testing in a world where dbt-core depends on metricflow, we need to remove the pass-through dependency on dbt-core that our reliance on dbt-adapters imposes on us. This is the first step in removing metricflow's test package dependencies on dbt adapters: a SQLAlchemy-based client that has many of the same test configuration advantages as the AdapterBackedSqlClient without the dbt dependencies. The base functionality is in place, and all tests run against all engines pass (and all environments build), but the client itself is not currently in use, so there may be runtime flaws that we have not yet detected. This is not totally desirable, but I'm too lazy to adjust the robot's approach to avoid what amounts to a dead code commit. Subsequent changes will enable this client across all of the engines, and we will fix up any issues we uncover as we go.
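The shape of such a client can be sketched as below. This is a hypothetical illustration (class and attribute names are made up, not metricflow's), showing a thin SQLAlchemy-backed query runner with the perf_counter timing hook visible in the review snippet above; SQLite keeps the sketch self-contained.

```python
import time
from typing import Any, Optional, Sequence

import sqlalchemy as sa


# Hypothetical sketch, not the actual metricflow client: real engine URLs
# would come from test configuration rather than a hard-coded string.
class SqlAlchemyQueryRunner:
    def __init__(self, url: str) -> None:
        self._engine = sa.create_engine(url)
        self.last_query_seconds: float = 0.0

    def query(self, stmt: str, params: Optional[dict] = None) -> Sequence[Any]:
        start = time.perf_counter()
        with self._engine.connect() as conn:
            rows = conn.execute(sa.text(stmt), params or {}).fetchall()
        self.last_query_seconds = time.perf_counter() - start
        return rows


runner = SqlAlchemyQueryRunner("sqlite:///:memory:")
rows = runner.query("SELECT :x AS x", {"x": 41})
print(rows[0].x)  # -> 41
```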
This removes the dbt-duckdb dependency from the base dev-env and moves all tests over to use the SqlAlchemySqlClient instead of the AdapterBackedSqlClient. It includes one necessary update to error messaging snapshots, as the dbt adapters use a custom exception wrapper that does dbt-specific formatting. With this we are ready to proceed with moving over the engine-specific clients.
This change moves Trino over with the minimum required updates for making things work. In particular: 1. Trino uses a catalog rather than a database, and in their sqlalchemy dialect they override the standard URL class to always be a string url. Since that was a nuisance I chose to simply overload the database element in the standard SqlAlchemy URL format and URL class to be the catalog for the Trino case. This works in our tests, and probably works in general, but it might not be robust in all scenarios. 2. For some reason semi-colons in sql expressions were causing syntax errors with the sqlalchemy Trino client. I guess the dbt-trino adapter (or the base dbt adapter) was removing trailing semicolons on execute.
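The catalog-in-the-database-slot workaround from point 1 can be illustrated with SQLAlchemy's standard URL builder. Host, port, username, and catalog values here are made up for illustration.

```python
from sqlalchemy.engine import URL

# Sketch of the workaround described above: the Trino catalog is placed
# in the "database" slot of a standard SqlAlchemy URL.
url = URL.create(
    drivername="trino",
    username="mf_test",
    host="localhost",
    port=8080,
    database="memory",  # actually the Trino catalog, not a database
)
print(url.render_as_string())  # -> trino://mf_test@localhost:8080/memory
```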
This includes a bug fix for what turned out to be a dead code branch, which the dbt adapter integration did not use but we do. Now the httppath parameter is parsed correctly and we can use SqlAlchemy with Databricks.
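The kind of parsing involved can be sketched with the standard library alone. The URL below is entirely made up, and the parameter name and helper shape are illustrative, not the actual client code.

```python
from urllib.parse import parse_qs, urlparse

# Hedged sketch: pull an httppath-style parameter out of an engine URL's
# query string. The workspace host and warehouse path are fabricated.
url = (
    "databricks://token:secret@example.cloud.databricks.com"
    "?httppath=/sql/1.0/warehouses/abc123"
)
params = parse_qs(urlparse(url).query)
http_path = params["httppath"][0]
print(http_path)  # -> /sql/1.0/warehouses/abc123
```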
Snowflake has some strange ideas about how to render case-insensitive database object names, so historically we've had these capitalized names in our snapshots. Snowflake's SQLAlchemy connector hews to the SQLAlchemy standard of rendering case-insensitive object names in lower-case, so this change updates our snapshots accordingly.
Making Redshift work with SqlAlchemy was a bit trickier than expected, as the "official" SqlAlchemy client for Redshift hasn't been updated in 2 years and does not work with SqlAlchemy 2.x. As a hack around this problem, this commit adds a bare-bones custom dialect override for psycopg2 that clobbers some class property to allow for backwards compatibility with Redshift. Since we only ever pass in sql query strings as sa_text() and always qualify our relation names with the schema inline in the queries, this works, but if we ever need to relax those constraints we'll either need to find a more full-featured dialect for Redshift or else switch to using the redshift-connector package directly.
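A bare-bones dialect override of this kind might look like the following. This is a hypothetical sketch: which property the actual change clobbers is not shown in this PR excerpt, so this picks a plausible one (Redshift reports an ancient PostgreSQL version string, so the sketch pins a fixed version tuple instead of parsing the server's).

```python
from sqlalchemy.dialects.postgresql.psycopg2 import PGDialect_psycopg2


# Hypothetical compatibility shim, not the actual change: subclass the
# stock psycopg2 dialect and override a property that trips up Redshift.
class RedshiftCompatDialect(PGDialect_psycopg2):
    supports_statement_cache = True

    def _get_server_version_info(self, connection):
        # Return a fixed version tuple rather than parsing Redshift's
        # PostgreSQL-8-era version string.
        return (8, 0, 2)


dialect = RedshiftCompatDialect()
print(dialect._get_server_version_info(None))  # -> (8, 0, 2)
```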
BigQuery has some special connection requirements, and does not support the EXPLAIN keyword, so the dry run configuration is also special. This updates the client accordingly.
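The dialect-dependent dry-run dispatch this implies can be sketched in a few lines of stdlib-only Python. The function and dialect names are illustrative, not the actual client's: most engines can validate a statement by prefixing EXPLAIN, while BigQuery needs a separate path (its API exposes a dry-run job option rather than an EXPLAIN keyword).

```python
# Hedged sketch of per-dialect dry-run handling; not metricflow's code.
def dry_run_statement(dialect: str, stmt: str) -> str:
    if dialect == "bigquery":
        # Placeholder: a real client would submit the query with the
        # engine's dry-run option instead of rewriting the SQL.
        return stmt
    return f"EXPLAIN {stmt}"


print(dry_run_statement("duckdb", "SELECT 1"))    # -> EXPLAIN SELECT 1
print(dry_run_statement("bigquery", "SELECT 1"))  # -> SELECT 1
```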
Force-pushed: 648f98f to 5a5ef7e

Add SQLAlchemy-based SqlClient implementations for metricflow tests
The engine tests currently execute via calls through dbt adapters
managed by an AdapterBackedSqlClient instance (and its corresponding
DDL-enabled class).
In order to allow for ongoing development and testing in a world where
dbt-core depends on metricflow we need to remove the pass-through
dependency on dbt-core that our reliance on dbt-adapters imposes on
us.
This is the first step in removing metricflow's test package dependencies
on dbt adapters - moving test execution to a SQLAlchemy-based client
that has many of the same test configuration advantages as the
AdapterBackedSqlClient without the dbt dependencies.
The full changeset associated with this (PR #1966) includes the addition
of the client plus independent changes to cut over each engine, as follows:
This includes one necessary update to error messaging snapshots, as the dbt
adapters use a custom exception wrapper that does dbt-specific formatting.
No special changes required.
Trino uses a catalog rather than a database, and in their
sqlalchemy dialect they override the standard URL class to always
be a string url. Since that was a nuisance I chose to simply overload
the database element in the standard SqlAlchemy URL format and URL class
to be the catalog for the Trino case. This works in our tests, and probably
works in general, but it might not be robust in all scenarios.
In addition, semi-colons in sql expressions were causing syntax
errors with the sqlalchemy Trino client, so these were removed from the
affected test cases.
This includes a bug fix for extracting the httppath parameter
from the engine URL. The original issue never emerged because it was in
a dead code branch, but the SqlAlchemy client uses it.
Snowflake's SQLAlchemy connector hews to the SQLAlchemy standard
of rendering case-insensitive object names in lower-case, so this
change updates our snapshots accordingly.
Making Redshift work with SqlAlchemy was a bit trickier than expected,
as the "official" SqlAlchemy client for Redshift hasn't been updated
in 2 years and does not work with SqlAlchemy 2.x.
As a hack around this problem, this change adds a bare-bones custom
dialect override for psycopg2 that clobbers some class property to
allow for backwards compatibility with Redshift. Since we only ever
pass in sql query strings as sa_text() and always qualify our
relation names with the schema inline in the queries, this works, but
if we ever need to relax those constraints we'll either need to find
a more full-featured dialect for Redshift or else switch to using the
redshift-connector package directly.
BigQuery has some special connection requirements, and does not support
the EXPLAIN keyword, so the dry run configuration is also special.
This updates the client accordingly.