feat: add ADC authentication #72
Open
yuriolive wants to merge 9 commits into
- Make google_application_credentials optional in tap config
- Resolve credentials in order: config, GOOGLE_APPLICATION_CREDENTIALS_STRING, GOOGLE_APPLICATION_CREDENTIALS path, then ADC (workload identity/GKE/Airflow)
- Add _get_bigquery_client() in client.py with env and default() fallback
- Connector create_engine: check env vars when config has no credentials
- Document credential resolution and ADC in README

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add return type annotation to _get_bigquery_client()
- Add debug logging when JSON parsing fails (instead of silent pass)
- Change connector log level from warning to debug for normal fallback
- Document why json_serializer/deserializer are not needed

Co-authored-by: Cursor <cursoragent@cursor.com>
feat: add Application Default Credentials (ADC) support
…talog discovery

- Introduced a static method `_normalize_table_name` to strip duplicated schema prefixes from table names.
- Updated `discover_catalog_entries` to utilize the new normalization method for consistent table name handling.
- Adjusted `get_object_names` to call the normalization method, ensuring uniformity in table name representation.
- Added capabilities to the `TapBigQuery` class for improved functionality.

Co-authored-by: Cursor <cursoragent@cursor.com>
feat: enhance BigQuery connector with table name normalization and ca…
…y in BigQueryConnector. Added a static method to strip schema prefixes from table names and updated the discover_catalog_entries method to utilize this normalization for consistent table name handling.
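The prefix stripping described in these commits could look roughly like the sketch below. It is not the PR's actual `_normalize_table_name`: the standalone function name, its two-argument signature, and the choice to strip the prefix only once are assumptions made for illustration.

```python
def normalize_table_name(schema_name: str, table_name: str) -> str:
    """Strip a leading '<schema>.' prefix from a table name, if present.

    Hypothetical stand-in for the PR's `_normalize_table_name`; the real
    static method may take different arguments.
    """
    prefix = f"{schema_name}."
    if table_name.startswith(prefix):
        # Drop the duplicated dataset/schema prefix exactly once.
        return table_name[len(prefix):]
    # Names without the prefix (or with a different one) pass through.
    return table_name


# Example: names as a dialect might report them for dataset "my_dataset".
raw_names = ["my_dataset.my_table", "orders", "other.events"]
clean_names = [normalize_table_name("my_dataset", n) for n in raw_names]
```

Only the prefix that matches the current schema is removed, so a name qualified with a different dataset is left alone rather than silently truncated.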
- Added a static method `_ensure_selected_by_default` to enforce stream-level selected-by-default metadata on discovered entries.
- Updated `discover_catalog_entries` to include logging of discovered streams and ensure default selection for new entries.
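As a rough illustration of what enforcing stream-level `selected-by-default` means, the sketch below works on a catalog entry in plain Singer dict form, where stream-level metadata is the item with an empty breadcrumb. The function name and the dict-based representation are assumptions; the PR's `_ensure_selected_by_default` presumably operates on the SDK's catalog entry objects instead.

```python
from typing import Any, Dict, List


def ensure_selected_by_default(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Set stream-level `selected-by-default` to True on a catalog entry dict.

    Hypothetical helper: assumes the Singer catalog layout in which
    `metadata` is a list of {"breadcrumb": [...], "metadata": {...}} items.
    """
    items: List[Dict[str, Any]] = entry.setdefault("metadata", [])
    for item in items:
        # An empty breadcrumb marks the stream-level metadata item.
        if "breadcrumb" in item and len(item["breadcrumb"]) == 0:
            item.setdefault("metadata", {})["selected-by-default"] = True
            return entry
    # No stream-level item yet: add one.
    items.append({"breadcrumb": [], "metadata": {"selected-by-default": True}})
    return entry


# Example entry with only property-level metadata.
example = {
    "tap_stream_id": "my_dataset-my_table",
    "metadata": [
        {"breadcrumb": ["properties", "id"], "metadata": {"inclusion": "available"}}
    ],
}
ensure_selected_by_default(example)
```

Property-level items are left untouched; only the stream-level item is created or updated.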
This pull request improves the authentication flexibility and metadata discovery reliability of `tap-bigquery`. The primary focus is aligning credential resolution with standard Google Cloud patterns (Application Default Credentials) and fixing issues where table names were being incorrectly prefixed during catalog discovery.

Key Changes

1. Robust Credential Resolution (ADC Support)

Introduced a centralized `_get_bigquery_client` utility and updated `BigQueryConnector.create_engine` to follow a standard resolution order. This allows the tap to work in environments like GKE, Cloud Run, or Airflow using Workload Identity without requiring explicit configuration.

Resolution Order:

1. `google_application_credentials` (can be a JSON string, a Python dict, or a file path).
2. `GOOGLE_APPLICATION_CREDENTIALS_STRING` (JSON string).
3. `GOOGLE_APPLICATION_CREDENTIALS` (path to file).
4. Application Default Credentials (ADC), for example Workload Identity on GKE or Airflow.

2. Improved Table Discovery & Normalization

- `discover_catalog_entries`. This prevents the SQLAlchemy dialect from incorrectly resolving dataset names as project IDs during bulk operations.
- Added `_normalize_table_name` to consistently strip dataset/schema prefixes from table names. This ensures that `tap_stream_id` and table identifiers remain clean and consistent.
- Discovered streams now set `selected-by-default` in the metadata to improve the out-of-the-box experience for users.

3. Schema & Configuration Updates

- The `google_application_credentials` config property is no longer marked as `required=True`, as the tap can now rely on environment variables or ADC.
- Updated `README.md` to clearly outline the new authentication flow.

Technical Details

- `tap_bigquery/client.py`: Refactored the client creation logic into a standalone function used by the stream's cached property. Added support for passing `project_id` to the BigQuery client to ensure correct billing and scoping.
- `tap_bigquery/connector.py`:
  - `create_engine` to support `credentials_path` for file-based auth via SQLAlchemy.
  - `discover_catalog_entries` and `_ensure_selected_by_default`.
- `tap_bigquery/tap.py`: Updated JSON schema for `google_application_credentials` to reflect its optional status and added a more descriptive explanation for users.

How to Test

- `gcloud auth application-default login` active and no `google_application_credentials` in the config.
- `config.json`.
- Run `tap-bigquery --discover` and verify that `tap_stream_id` values do not contain redundant dataset prefixes (e.g., `my_dataset.my_table` should be `my_table` if the stream ID is already prefixed by schema).
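
For reviewers who want to reason about the resolution order without a GCP project, the sketch below models it as a pure function that only decides which credential source wins. It is an illustration, not the code in `client.py`: `resolve_credential_source` and `_try_parse_json` are hypothetical names, and treating any non-JSON config string as a file path is an assumption.

```python
import json
import logging
import os
from typing import Any, Mapping, Optional, Tuple, Union

logger = logging.getLogger(__name__)


def _try_parse_json(raw: str) -> Optional[dict]:
    """Return the parsed dict, or None if `raw` is not a JSON object."""
    try:
        parsed = json.loads(raw)
    except ValueError:
        # Debug-level log instead of a silent pass.
        logger.debug("Value is not valid JSON; trying the next source.")
        return None
    return parsed if isinstance(parsed, dict) else None


def resolve_credential_source(
    config_value: Union[str, Mapping[str, Any], None] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Any]:
    """Return ("info", dict), ("file", path), or ("adc", None)."""
    env = os.environ if env is None else env

    # 1. Tap config: a dict, a JSON string, or (assumed) a file path.
    if isinstance(config_value, Mapping):
        return "info", dict(config_value)
    if isinstance(config_value, str) and config_value.strip():
        parsed = _try_parse_json(config_value)
        return ("info", parsed) if parsed is not None else ("file", config_value)

    # 2. GOOGLE_APPLICATION_CREDENTIALS_STRING (JSON string).
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS_STRING")
    if raw:
        parsed = _try_parse_json(raw)
        if parsed is not None:
            return "info", parsed

    # 3. GOOGLE_APPLICATION_CREDENTIALS (path to a key file).
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if path:
        return "file", path

    # 4. Nothing explicit: fall back to Application Default Credentials.
    return "adc", None
```

In a real client, the three outcomes would presumably map to `service_account.Credentials.from_service_account_info`, `service_account.Credentials.from_service_account_file`, and `google.auth.default()` from the `google-auth` package; that mapping is not taken from this PR's diff. Passing `env={}` makes the ADC fallback easy to exercise in a unit test.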