Conversation

@ranophoenix ranophoenix commented Nov 13, 2025

Optimize Table Reflection Performance for Large Schemas

Problem Statement

The current implementation of get_columns() in the Snowflake SQLAlchemy dialect has a performance issue when working with schemas containing thousands of tables. The dialect unconditionally attempts to prefetch and cache column metadata for all tables in the schema via information_schema.columns, even when only reflecting a single table. This causes:

  • Significant delays when reflecting individual tables in large schemas
  • Unnecessary network overhead and query execution time
  • Query failures when schemas are extremely large (error 90030: "Information schema query returned too much data")
  • Wasted processing time executing expensive queries that ultimately fail and fall back to the granular approach anyway
  • Poor user experience for common use cases (reflecting one or a few tables)

The single-table query fallback (using DESC TABLE) only triggers after the expensive schema-wide query fails, which means users experience slow performance (or timeouts) before the fallback even kicks in.
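
For illustration, the pre-change flow behaves roughly like the sketch below. The helper names and exception type are placeholders invented for this description, not the dialect's actual internals:

# Illustrative sketch of the pre-change reflection flow; the names below are
# placeholders, not snowflake-sqlalchemy internals.

TOO_MUCH_DATA = 90030  # "Information schema query returned too much data"

class InformationSchemaTooLarge(Exception):
    """Stand-in for the failure raised when the schema-wide query is too large."""

def fetch_schema_columns(schema):
    # Stand-in for the schema-wide information_schema.columns prefetch.
    raise InformationSchemaTooLarge(TOO_MUCH_DATA)

def desc_table(schema, table):
    # Stand-in for the per-table DESC TABLE query.
    return [{'name': 'ID', 'type': 'NUMBER'}]

def get_columns_old(schema, table):
    try:
        # Old flow: always attempt the expensive schema-wide prefetch first,
        # even when only one table is being reflected.
        return fetch_schema_columns(schema)[table]
    except InformationSchemaTooLarge:
        # The cheap per-table fallback only runs after the expensive query
        # has already been executed and failed.
        return desc_table(schema, table)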

Solution

This PR changes the default behavior to query individual tables directly, with opt-in schema-wide caching through the cache_column_metadata connection parameter (a rough sketch of the new dispatch follows the two lists below):

When cache_column_metadata=False (new default):

  • Queries only the specific requested table using DESC TABLE
  • Avoids the expensive information_schema.columns query entirely
  • Dramatically improves performance for single-table reflection in large schemas
  • No risk of hitting the "too much data" error

When cache_column_metadata=True (opt-in):

  • Attempts to prefetch all schema columns via information_schema.columns
  • Only beneficial for small-to-medium schemas where the query succeeds
  • Can help amortize costs when reflecting multiple tables sequentially
  • Will still fall back to individual queries if the schema is too large (error 90030)
  • Not recommended for large schemas as it wastes time on a query that will ultimately fail
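
A rough sketch of the new dispatch, reusing the placeholder helpers from the sketch in the problem statement above (again, not the real implementation):

def get_columns_new(schema, table, cache_column_metadata=False):
    if cache_column_metadata:
        try:
            # Opt-in: attempt the schema-wide prefetch, falling back if it fails.
            return fetch_schema_columns(schema)[table]
        except InformationSchemaTooLarge:
            pass
    # New default: query only the requested table via DESC TABLE.
    return desc_table(schema, table)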

Changes

  1. Modified get_columns() method to check _cache_column_metadata flag before calling _get_schema_columns()
  2. Changed default behavior from attempting schema-wide caching to querying individual tables
  3. Fixed identifier normalization bug in _StructuredTypeInfoManager.get_table_columns() - changed from denormalize_name() to normalize_name() for schema and table names to properly match Snowflake's identifier handling
  4. Fixed name_utils.normalize_name() for consistent identifier quoting
  5. Fixed BINARY type metadata inconsistency - normalized BINARY column length to None in both code paths (DESC TABLE vs information_schema) to ensure identical column metadata regardless of caching setting
  6. Fixed identity column metadata - changed the order_type key to order to match the required keyword argument name of the sqlalchemy.sql.schema.Identity constructor (see the example after this list)
  7. Added comprehensive test coverage with parameterized tests verifying both behaviors
  8. Removed outdated deprecation notice from README as the flag is now functional with proper behavior control
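
For fix 6, the reflected identity metadata must use keyword names that SQLAlchemy's Identity constructor accepts; the values below are illustrative only:

from sqlalchemy.sql.schema import Identity

# "order" is an accepted Identity keyword argument; "order_type" is not, so the
# reflected identity dictionary has to use the "order" key.
identity_kwargs = {'start': 1, 'increment': 1, 'order': True}  # illustrative values
identity = Identity(**identity_kwargs)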

Performance Impact

For schemas with a large number of tables/columns, this change dramatically improves table reflection performance when reflecting a single table, as it eliminates the costly (and often failing) query that fetches metadata for all tables.

Before (schema-wide caching always attempted):

  • Attempt to query all columns in schema → wait → potentially fail with error 90030 → fall back to DESC TABLE
  • Result: Significant delays or timeouts

After (cache_column_metadata=False, new default):

  • Query only the requested table with DESC TABLE
  • Result: Dramatically faster reflection times

Backward Compatibility

  • Restores original default behavior - Returns to cache_column_metadata=False as the default (matching the original implementation)
  • ⚠️ Change from recent versions - Recent versions effectively forced schema-wide caching (the deprecated behavior); this PR makes it opt-in again
  • Opt-in to schema caching - Users can enable schema-wide caching by setting cache_column_metadata=True
  • Automatic fallback preserved - Even with cache_column_metadata=True, the fallback mechanism still works for schemas that are too large
  • Bug fixes improve reliability - The identifier normalization and BINARY type fixes ensure consistent behavior across both caching modes

Usage

# New default behavior (recommended for most cases)
engine = create_engine('snowflake://...')  # cache_column_metadata=False by default

# Opt-in to schema-wide caching (only for small-to-medium schemas)
engine = create_engine('snowflake://...?cache_column_metadata=true')
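
For example, reflecting a single table with the default setting uses the standard SQLAlchemy inspector; the URL below is a placeholder for real credentials:

from sqlalchemy import create_engine, inspect

# Placeholder URL; substitute your own account, credentials, database, and schema.
engine = create_engine('snowflake://user:pass@account/db/schema')

# With cache_column_metadata left at its default (False), this issues a single
# DESC TABLE for the requested table instead of a schema-wide prefetch.
inspector = inspect(engine)
columns = inspector.get_columns('my_table', schema='my_schema')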

Testing

Added test_cache_column_metadata.py with parameterized tests that verify the following (a minimal sketch of the parameterization pattern appears after this list):

  • With cache_column_metadata=False: Only 1 DESC call for the requested table (optimal)
  • With cache_column_metadata=True: 1 schema query + additional DESC calls for structured types
  • Proper handling of OBJECT, ARRAY, and MAP column types in both modes
  • Consistent column metadata regardless of caching mode
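
A minimal sketch of the parameterization pattern (not the actual contents of test_cache_column_metadata.py; the helper below merely stands in for reflection against a real engine):

import pytest

def reflect_single_table(cache_column_metadata):
    # Placeholder for reflecting one table against a real engine; it only
    # reports which query shapes would be issued under each setting.
    if cache_column_metadata:
        return ['information_schema.columns', 'DESC TABLE']
    return ['DESC TABLE']

@pytest.mark.parametrize('cache_column_metadata', [False, True])
def test_query_shapes(cache_column_metadata):
    queries = reflect_single_table(cache_column_metadata)
    if cache_column_metadata:
        assert 'information_schema.columns' in queries
    else:
        assert queries == ['DESC TABLE']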

Related Issue

Addresses Snowflake Support Case #01168351 regarding performance issues with table reflection in large schemas.

@ranophoenix ranophoenix requested a review from a team as a code owner November 13, 2025 19:49

github-actions bot commented Nov 13, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@ranophoenix (Author)

I have read the CLA Document and I hereby sign the CLA

@ranophoenix ranophoenix force-pushed the support_case_01168351 branch from 6629361 to 4613797 on November 13, 2025 at 20:51
@ranophoenix ranophoenix force-pushed the support_case_01168351 branch from 4613797 to de862cb on November 13, 2025 at 21:20