-
Notifications
You must be signed in to change notification settings - Fork 3.2k
feat(ingest/unity): Use sql to extract query history for usage #14953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
self.config.start_time, self.config.end_time | ||
) | ||
else: | ||
raise ValueError(f"Unsupported usage_data_source: {usage_data_source}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this caught automatically by pydantic
? If not - we could check it already in the validator.
Although this leaves us with the question - what to do in that else
statement, maybe we could rewrite the check:
if SYSTEM_TABLES:
elif API
else: # assume AUTO (can add an assertion on that)
So to not mixup different layers of business logic (parameter validation and execution logic.
|
||
|
||
def test_usage_data_source_system_tables_requires_warehouse_id(): | ||
"""Test that usage_data_source=SYSTEM_TABLES requires warehouse_id.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether we should be adding such comments, do you find them helpful? Seems like they repeat what is already stated in the name of the test method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the completeness of those tests. Maybe we should consider packing them in a single class (not a strong opinion). Considering that the new default will be to use system tables now, I wonder why other tests are not affected, is it because we never set warehouse_id in the tests so system tables functions are not used?
Could we have one (or more) end-2-end test added to see logic in get_query_history_via_system_tables
in action and somehow try to assess it's working (it is not fully possible of course, without having live Snowflake instance to query, just to the extent that unit tests allow us)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactly, we never set warehouse id
try: | ||
rows = self._execute_sql_query(query, (start_time, end_time)) | ||
for row in rows: | ||
try: | ||
yield Query( | ||
query_id=row.statement_id if row.statement_id else None, | ||
query_text=row.statement_text, | ||
statement_type=( | ||
QueryStatementType(row.statement_type) | ||
if row.statement_type | ||
else None | ||
), | ||
start_time=row.start_time, | ||
end_time=row.end_time, | ||
user_id=row.executed_by_user_id, | ||
user_name=row.executed_by if row.executed_by else None, | ||
executed_as_user_id=( | ||
row.executed_as_user_id if row.executed_as_user_id else None | ||
), | ||
executed_as_user_name=( | ||
row.executed_as if row.executed_as else None | ||
), | ||
) | ||
except Exception as e: | ||
logger.warning(f"Error parsing query from system table: {e}") | ||
self.report.report_warning("query-parse-system-table", str(e)) | ||
except Exception as e: | ||
logger.error( | ||
f"Error fetching query history from system tables: {e}", exc_info=True | ||
) | ||
self.report.report_warning("get-query-history-system-tables", str(e)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments here:
- let's use proper structured report with
title
,message
context
, ... - I miss some metrics about number of results or execution time of the query... unless that's already done upstream in the call stack
- the recurrent pattern
x if x else None
... isn't that redundant and innecessary?
logger.error( | ||
f"Error fetching query history from system tables: {e}", exc_info=True | ||
) | ||
self.report.report_warning("get-query-history-system-tables", str(e)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider using failure here, not a strong opinion though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM with some minor comments
What I really miss is some test on the new feature: fetching from system tables, how are you planning to cover that?
d185b34
to
9f5ad21
Compare
title="Failed to fetch query history from system tables", | ||
message=f"Error querying system.query.history table: {e}", | ||
context=f"Query period: {start_time} to {end_time}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
title
and message
are used for grouping logs
title="Failed to fetch query history from system tables", | |
message=f"Error querying system.query.history table: {e}", | |
context=f"Query period: {start_time} to {end_time}", | |
title="Failed to fetch query history from system tables", | |
message=f"Error querying system.query.history table", | |
context=f"Query period: {start_time} to {end_time}", | |
exc=e, |
No description provided.