Description
Is this your first time submitting a feature request?
- I have read the expectations for open source contributors
- I have searched the existing issues, and I could not find an existing issue for this feature
- I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion
Describe the feature
Support Private PyPI packages for Python UDFs
While #12041 adds support for specifying public PyPI packages in Python UDFs via the packages config, many organizations rely on private PyPI repositories (e.g., Artifactory, AWS CodeArtifact, GCP Artifact Registry, Azure Artifacts, Nexus, Gemfury) to host internal Python packages. There is currently no way to configure dbt to authenticate against a private package index when resolving UDF packages.
Additionally, warehouses vary in how they accept custom/private code — Snowflake supports uploading .zip files via stages, BigQuery supports importing .py files from Cloud Storage — but neither supports authenticating against a private PyPI index directly. dbt could bridge this gap by resolving packages from a private index and preparing them in the format each warehouse requires.
Motivation
- Teams building internal Python libraries (e.g., shared feature engineering, custom ML models, proprietary transforms) cannot use them in dbt Python UDFs today.
- Warehouse-native package support is limited to public PyPI:
- Snowflake: Supports public PyPI via ARTIFACT_REPOSITORY + PACKAGES, but docs explicitly state "Access to private repositories is not supported." Custom code can be uploaded as .zip files to Snowflake stages and referenced via IMPORTS.
- BigQuery: Supports public PyPI via `OPTIONS(packages=["..."])` (wheels only). Custom Python files can be imported from GCS via `OPTIONS(library=["gs://..."])`, but this is limited to `.py` files, not full package archives.
- A dbt-native solution would provide a consistent, cross-adapter experience for private package consumption.
Proposed design
1. New package-indexes configuration (project-level)
Allow users to configure one or more private PyPI indexes in dbt_project.yml:
```yaml
package-indexes:
  - name: internal
    url: https://pypi.internal.company.com/simple/
    auth:
      type: token # or "basic", "env"
      token: "{{ env_var('PRIVATE_PYPI_TOKEN') }}"
  - name: aws-codeartifact
    url: https://my-domain-123456789.d.codeartifact.us-east-1.amazonaws.com/pypi/my-repo/simple/
    auth:
      type: basic
      username: "{{ env_var('CODEARTIFACT_USER') }}"
      password: "{{ env_var('CODEARTIFACT_TOKEN') }}"
```

`auth.type` supports common authentication patterns: `token` (bearer/API token), `basic` (username + password), and `env` (fully custom, e.g., for `AWS_SESSION_TOKEN`).
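As one illustration of how dbt-core might consume this config, the resolver could embed the rendered credentials into a pip-compatible index URL. The sketch below uses hypothetical names (`index_url_with_auth` is not existing dbt code); it assumes the common registry convention of accepting an API token as the basic-auth username:

```python
from urllib.parse import quote, urlsplit, urlunsplit

def index_url_with_auth(url: str, auth: dict) -> str:
    """Embed credentials into an index URL the way pip expects
    (https://user:pass@host/simple/). Hypothetical helper."""
    scheme, netloc, path, query, frag = urlsplit(url)
    kind = auth.get("type", "env")
    if kind == "basic":
        cred = f"{quote(auth['username'], safe='')}:{quote(auth['password'], safe='')}"
    elif kind == "token":
        # Many registries accept a bearer/API token as the basic-auth username.
        cred = f"{quote(auth['token'], safe='')}:"
    else:
        # "env": leave the URL untouched; credentials come from the
        # environment (netrc, keyring, AWS_SESSION_TOKEN, ...).
        return url
    return urlunsplit((scheme, f"{cred}@{netloc}", path, query, frag))
```

A real implementation would render the `env_var()` Jinja references before this step and never log the resulting URL.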
2. Extended packages config on UDFs/models
Allow package specs to reference a named index:
```yaml
functions:
  - name: my_function
    config:
      packages:
        - scikit-learn # resolved from public PyPI (default)
        - name: my-internal-lib
          version: ">=1.2.0"
          index: internal # references the named index above
        - name: another-lib
          index: aws-codeartifact
```

The simple string form (`- scikit-learn`) remains supported for backward compatibility with public PyPI.
3. Artifact staging configuration (adapter-specific)
Since warehouses require pre-built artifacts to be uploaded to warehouse-specific storage, users need to configure where dbt stages these artifacts and how to authenticate. This would live in profiles.yml alongside existing adapter connection config:
Snowflake:

```yaml
my_snowflake_profile:
  target: dev
  outputs:
    dev:
      type: snowflake
      # ... existing connection config ...
      python_package_staging:
        stage: my_db.my_schema.python_packages # Snowflake stage for .zip uploads
```

BigQuery:

```yaml
my_bq_profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      # ... existing connection config ...
      python_package_staging:
        gcs_bucket: my-project-dbt-packages # GCS bucket for .py/.whl files
        gcs_prefix: udf-deps/ # optional prefix/folder
        credentials: GCS_SA_KEY_PATH # if different from the BQ connection creds
```

Key design decisions:
- Dependency staging config lives in `profiles.yml`, not `dbt_project.yml`, because it contains environment-specific settings and potential credentials. This is consistent with how other connection/infra config is handled.
- Each adapter defines its own staging schema, since the storage mechanism is fundamentally different per warehouse.
- Credentials for artifact storage may differ from the main warehouse connection, e.g., a service account with GCS write access that is separate from the BigQuery query credentials. Adapters should support an optional credential override, falling back to the main connection credentials by default.
End-to-end flow
For a private package on a warehouse that doesn't support private PyPI natively:
- Resolve: dbt authenticates against the private index, resolves the package + transitive dependencies.
- Download: dbt downloads the resolved artifacts locally.
- Transform: dbt converts/repackages artifacts into the format the warehouse requires (.zip for Snowflake, .py for BigQuery, etc.).
- Upload: dbt uploads to the configured staging location (Snowflake stage, GCS bucket, DBFS path).
- Reference: The adapter generates the CREATE FUNCTION statement referencing the staged artifacts (IMPORTS for Snowflake, library for BigQuery, etc.).
Extras
- Since the index credentials are stored in `profiles.yml`, `dbt debug` should validate the connection to the private repository.
- We might also need to verify that the configured dependencies actually exist in the private repository, as a pre-validation mechanism.
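A `dbt debug`-style connectivity check could be sketched as below. Both helpers are hypothetical; a real implementation would reuse dbt's credential rendering and report richer diagnostics than a boolean:

```python
import base64
import urllib.error
import urllib.request

def auth_headers(auth: dict) -> dict:
    """Translate the proposed auth config into HTTP headers (sketch)."""
    if auth.get("type") == "token":
        return {"Authorization": f"Bearer {auth['token']}"}
    if auth.get("type") == "basic":
        raw = f"{auth['username']}:{auth['password']}".encode()
        return {"Authorization": "Basic " + base64.b64encode(raw).decode()}
    return {}  # "env": rely on ambient credentials (netrc, keyring, ...)

def check_index(url: str, auth: dict, timeout: float = 5.0) -> bool:
    """Treat any non-error HTTP response from the index root as reachable."""
    req = urllib.request.Request(url, headers=auth_headers(auth))
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except urllib.error.URLError:
        return False
```

Existence pre-validation could reuse the same machinery against the index's per-package path before any model runs.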
Acceptance criteria
- Users can configure one or more private PyPI indexes in `dbt_project.yml` with credential references
- The `packages` config on functions/models supports specifying which index to resolve from
- Users can configure warehouse-specific artifact staging locations in `profiles.yml`
- Adapters can authenticate against staging storage (with optional credential override)
- Credentials are never stored in plain text — only env var references
- Public PyPI remains the default when no index is specified (backward compatible)
- Documentation covers setup for common private registries (Artifactory, CodeArtifact, GCP Artifact Registry) and staging configuration per warehouse
Implementation considerations
- dbt-core scope: Package index configuration, credential management, resolution/download logic, the extended `packages` schema, and a standard interface for adapters to receive resolved artifacts.
- dbt-adapters scope: Warehouse-specific artifact transformation, upload to staging storage, and `CREATE FUNCTION` statement generation. See dbt-adapters#1651 for the adapter-side discussion.
- Dependency resolution: When downloading for artifact-based delivery, dbt would need to resolve transitive dependencies. We should consider leveraging `pip download` or a library like `resolvelib` rather than reimplementing resolution.
- Caching: Downloaded and transformed artifacts should be cached locally to avoid re-downloading on every run. We could consider a content-addressed cache keyed on package name + version + platform.
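The cache-key idea above could be sketched as follows (`artifact_cache_key` is a hypothetical name, not existing dbt code):

```python
import hashlib

def artifact_cache_key(name: str, version: str, platform_tag: str) -> str:
    """Deterministic cache key for a resolved/transformed artifact,
    keyed on package name + version + platform as suggested above."""
    digest = hashlib.sha256(f"{name}=={version}#{platform_tag}".encode()).hexdigest()
    # Human-readable prefix plus a short content hash for the cache path.
    return f"{name}-{version}-{digest[:12]}"
```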
Describe alternatives you've considered
I could do all of this manually, and it would work. However, dbt providing a standard interface for this across adapters would be a great ease-of-use feature.
Who will this benefit?
Anyone who needs to use private PyPI packages in their Python UDFs.
Are you interested in contributing this feature?
YES
Anything else?
No response