Merged
62 commits
48c5b78
Beginnings of an OAUTH login for Superset
martyngigg Oct 2, 2025
b423d24
Start configuring TLS for local setup
martyngigg Oct 2, 2025
e7227a7
Checkpoint trino TLS
martyngigg Oct 6, 2025
a4a1ecb
Create trino catalog through https with auth
martyngigg Oct 6, 2025
cd149b8
Switch to analytics.localdev hostname
martyngigg Oct 7, 2025
151378d
Use https everywhere
martyngigg Oct 7, 2025
c438aee
OAuth config for superset
martyngigg Oct 7, 2025
c766091
Add openfga as Lakekeeper AUTHZ backend
martyngigg Oct 7, 2025
6f60d0e
Avoid recreating the certificate if it exists
martyngigg Oct 8, 2025
d12b0e4
Use docker compose array-style entrypoint/command for scripts
martyngigg Oct 8, 2025
10db8d0
Assign Lakekeeper admin/project_admin to adpsuperuser
martyngigg Oct 9, 2025
9e7e634
Move Lakekeeper role to new cli style bootstrap script
martyngigg Oct 9, 2025
1a52a07
Begin configuring oauth2/opa access for Trino
martyngigg Oct 9, 2025
4c179b0
Begin using kcadm to bootstrap Keycloak.
martyngigg Oct 10, 2025
23b9196
Finish bootstrapping Keycloak
martyngigg Oct 13, 2025
80f7b6b
Switch back to Traefik as an entrypoint for everything
martyngigg Oct 13, 2025
e6d8a91
Login via trino cli nearly working
martyngigg Oct 13, 2025
feb4347
Login via trino cli works with --user=<oid>
martyngigg Oct 13, 2025
2d79f49
Supply user names for Lakekeeper initial admin
martyngigg Oct 14, 2025
9fe0614
Add TLS endpoints for lakekeeper/keycloak
martyngigg Oct 14, 2025
45b1427
Create a non-default Lakekeeper project
martyngigg Oct 14, 2025
a89f9c2
Disable default Lakekeeper project
martyngigg Oct 14, 2025
1e16a1c
WIP: Trino oauth & some variable renaming.
martyngigg Oct 16, 2025
1ce90c0
Output user details for new Keycloak users
martyngigg Oct 17, 2025
5925d29
Use https for keycloak realm in trino catalog
martyngigg Oct 18, 2025
1140dfe
Fix web entrypoint for lakekeeper
martyngigg Oct 18, 2025
7a7de87
Add opa client in keycloak
martyngigg Oct 18, 2025
232d887
Update Open Policy Agent to support Lakekeeper project ID
martyngigg Oct 18, 2025
bf27b44
Trino access to Iceberg works
martyngigg Oct 18, 2025
3e4ae02
Login and execute sql queries in Trino successful.
martyngigg Oct 20, 2025
7c760a4
Ignore bootstrap logs in docker local setup
martyngigg Oct 20, 2025
1c4b359
Clarify readme for local/docker-compose.yml
martyngigg Oct 20, 2025
79780cc
Login to Superset via OAuth
martyngigg Oct 21, 2025
bd97039
Create a superset user for REST api logins.
martyngigg Oct 22, 2025
1d07510
Switch to password & jwt auth in Trino.
martyngigg Oct 22, 2025
94fc020
Assign admin permissions to human user
martyngigg Nov 11, 2025
006eaec
Add route for OpenFGA
martyngigg Nov 11, 2025
4364ed7
Migrate away from https
martyngigg Dec 6, 2025
8b8046f
Move back to http for everything except Trino
martyngigg Dec 8, 2025
119db22
Update readme for local development
martyngigg Dec 8, 2025
4afe041
Configure OPA to allow all access
martyngigg Dec 8, 2025
fc13421
Update e2e tests config for new local dev configuration
martyngigg Dec 8, 2025
139c1e6
Merge remote-tracking branch 'origin/main' into catalog-access-control
martyngigg Dec 8, 2025
361cf18
Revert part of bad merge
martyngigg Dec 8, 2025
cca72ed
Support project id in iceberg dlt destination.
martyngigg Dec 8, 2025
3089183
E2E tests for pyiceberg destination passing
martyngigg Dec 8, 2025
516aedc
Remove OPA for now until we understand what is doing what.
martyngigg Dec 16, 2025
c43f8b0
Rename commandline argument.
martyngigg Dec 16, 2025
79726d6
Remove unnecessary https shenanigans for local dev
martyngigg Dec 16, 2025
fbeba93
Merge remote-tracking branch 'origin/main' into catalog-access-control
martyngigg Dec 17, 2025
419cb14
Enable both static/dynamic Trino catalogs
martyngigg Dec 17, 2025
9aaa002
Update readmes on running local services and elt-common tests.
martyngigg Dec 18, 2025
c187907
Add more links and descriptions in the readme about working with the …
martyngigg Dec 18, 2025
f7dc672
Add project_id to dlt secrets configuration example
martyngigg Dec 18, 2025
ddc15b2
Fix ansible tasks for Lakekeeper after refactor
martyngigg Dec 18, 2025
1217dba
Fix hostname e2e CI tests
martyngigg Dec 18, 2025
67a44ed
Minor comments from coderabbit
martyngigg Dec 18, 2025
dcb3825
Fix f-string interpolation
martyngigg Dec 18, 2025
443f9ef
Use in-memory trino catalog store for CI
martyngigg Dec 18, 2025
c328393
Upgrade to GHA action versions
martyngigg Dec 18, 2025
09bc21d
Remove readme for local Keycloak now that bootstrap is used
martyngigg Dec 18, 2025
88219aa
Clarify setup is one-time thing
martyngigg Dec 18, 2025
15 changes: 2 additions & 13 deletions .github/actions/run-pytest-with-uv/action.yml
@@ -24,8 +24,9 @@ runs:
using: composite
steps:
- name: Install uv
uses: astral-sh/setup-uv@v5
uses: astral-sh/setup-uv@v7
with:
activate-environment: false
cache-dependency-glob: ${{ inputs.uv-cache-dependency-glob }}
version: ${{ inputs.uv-version }}
python-version: ${{ inputs.python-version }}
@@ -43,23 +44,11 @@ runs:
run: uv sync --locked --all-extras --dev
working-directory: ${{ inputs.pyproject-directory }}

- name: Add minio to /etc/hosts
if: inputs.compose-file-path != ''
shell: bash -l {0}
run: |
echo "127.0.0.1 minio" | sudo tee -a /etc/hosts

- name: Run tests
shell: bash -l {0}
run: uv run pytest --durations-min=0.5 --exitfirst "${{ inputs.pytest-file-or-dir }}" --cache-clear
working-directory: ${{ inputs.pyproject-directory }}

- name: Remove minio from /etc/hosts
if: inputs.compose-file-path != ''
shell: bash -l {0}
run: |
sudo sed -i -e '/minio/d' /etc/hosts

- name: Dump Docker Compose logs on failure
if: failure() && inputs.compose-file-path != ''
shell: bash -l {0}
4 changes: 2 additions & 2 deletions .github/workflows/ci-static.yml
@@ -12,8 +12,8 @@ jobs:
name: static checks
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-python@v3
- uses: actions/checkout@v6
- uses: actions/setup-python@v6
- name: "Install prek"
run: >
python -m pip install prek
2 changes: 1 addition & 1 deletion .github/workflows/docker-superset.yml
@@ -33,7 +33,7 @@ jobs:
id-token: write
steps:
- name: Checkout repository
uses: actions/checkout@v5
uses: actions/checkout@v6
- name: Compute fully qualified image name
run: echo "FQ_IMAGE_NAME=${{ env.REGISTRY }}/${{ env.ORG_NAME }}/${{ env.IMAGE_NAME }}" >> $GITHUB_ENV
- name: Log in to the Container registry
15 changes: 14 additions & 1 deletion .github/workflows/elt-common_e2e_tests.yml
@@ -20,14 +20,22 @@ concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.run_id }}
cancel-in-progress: true

env:
TRINO_CATALOG_STORE: memory

jobs:
test:
name: elt-common end-to-end tests
runs-on: ubuntu-latest

steps:
- name: Checkout
uses: actions/checkout@v4
uses: actions/checkout@v6

- name: Add adp-router to /etc/hosts
shell: bash -l {0}
run: |
echo "127.0.0.1 adp-router" | sudo tee -a /etc/hosts

- name: Run end-to-end tests
uses: ./.github/actions/run-pytest-with-uv
@@ -36,3 +44,8 @@ jobs:
pyproject-directory: elt-common
pytest-file-or-dir: tests/e2e_tests
uv-cache-dependency-glob: elt-common/pyproject.toml

- name: Remove adp-router from /etc/hosts
shell: bash -l {0}
run: |
sudo sed -i -e '/adp-router/d' /etc/hosts
2 changes: 1 addition & 1 deletion .github/workflows/elt-common_unit_tests.yml
@@ -23,7 +23,7 @@ jobs:

steps:
- name: Checkout
uses: actions/checkout@v4
uses: actions/checkout@v6

- name: Run unit tests
uses: ./.github/actions/run-pytest-with-uv
1 change: 1 addition & 0 deletions AGENTS.md
@@ -57,6 +57,7 @@ uv run pytest unit_tests

```bash
pushd infra/local; docker compose up -d; popd # start the services
echo "127.0.0.1 adp-router" | sudo tee -a /etc/hosts # edit /etc/hosts
cd elt-common/
uv run pytest e2e_tests
pushd infra/local; docker compose down -v; popd # stop the services
52 changes: 15 additions & 37 deletions elt-common/README.md
@@ -13,55 +13,33 @@ Development requires the following tools:
### Setting up a Python virtual environment

Once `uv` is installed, create an environment and install the `elt-common`
package in editable mode using the following command:
package in editable mode, along with the development dependencies:

```bash
> uv venv
> source .venv/bin/activate
> uv pip install --editable . --group dev
```

## Running end-to-end tests

The end-to-end (e2e) tests for the `pyiceberg` destination require a running Iceberg
rest catalog to test complete functionality.
A [`docker-compose`](./tests/docker-compose.yml) file is provided to both run the
required services and provide a `python-uv` service for executing test commands.
Please ensure you have docker and docker compose available on your command line
before continuing.

To run the end-to-end tests, from this directory execute

```bash
> docker compose -f tests/docker-compose.yml run python-uv uv run pytest tests/e2e_tests
```
## Running unit tests

When you have finished running the tests run
Run the unit tests using `pytest`:

```bash
> docker compose -f tests/docker-compose.yml down
> pytest tests/unit_tests
```

to bring down the dependent services.

### Debugging

Using a debugger to debug the end-to-end tests is more complicated as it requires
the dependent services to be accessible using their service names from within
the compose file.

To workaround this the `/etc/hosts` file can be edited to map the service names
to localhost (127.0.0.1). Open `/etc/hosts` and add
## Running end-to-end tests

```text
# docker compose services
127.0.0.1 minio
127.0.0.1 keycloak
127.0.0.1 lakekeeper
```
The end-to-end (e2e) tests for the `pyiceberg` destination require a running Iceberg
rest catalog to test complete functionality.
The local, docker-compose-based configuration provided by
[infra/local/docker-compose.yml](../infra/local/docker-compose.yml) is the easiest way to
spin up a set of services compatible with running the tests.
_Note the requirement to edit `/etc/hosts` described [here](../infra/local/README.md)._

Now bring up the services:
Once the compose services are running, execute the e2e tests using `pytest`:

```bash
> docker compose -f tests/docker-compose.yml up -d
> pytest tests/e2e_tests
```

and start your debugger as normal.
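Because the tests resolve the compose services via the `adp-router` hostname, a quick pre-flight check can confirm the `/etc/hosts` entry is in place before `pytest` runs. This helper is a hypothetical sketch, not part of the repo; only the `adp-router` hostname comes from the compose setup above:

```python
import socket


def hosts_entry_ok(hostname: str = "adp-router") -> bool:
    """Return True when `hostname` resolves to 127.0.0.1, i.e. the
    /etc/hosts entry described above is present."""
    try:
        return socket.gethostbyname(hostname) == "127.0.0.1"
    except socket.gaierror:
        # Hostname does not resolve at all: the entry is missing.
        return False
```

Running this at the top of a debugging session fails fast with an obvious cause instead of opaque connection errors from the catalog client.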
@@ -15,6 +15,7 @@
@configspec(init=False)
class PyIcebergRestCatalogCredentials(CredentialsConfiguration):
uri: str = None # type: ignore
project_id: Optional[str] = None
warehouse: Optional[str] = None
access_delegation: TPyIcebergAccessDelegation = "vended-credentials"
oauth2_server_uri: Optional[str] = None # This is the endpoint to use to retrieve a token
@@ -29,8 +30,9 @@ def as_dict(self) -> Dict[str, str]:
properties = {"credential": self.client_credential()} if self.client_id else {}

field_aliases: Dict[str, str] = {
"access_delegation": f"{CATALOG_HEADER_PREFIX}X-Iceberg-Access-Delegation",
"access_delegation": f"{CATALOG_HEADER_PREFIX}x-iceberg-access-delegation",
"oauth2_server_uri": "oauth2-server-uri",
"project_id": f"{CATALOG_HEADER_PREFIX}x-project-id",
}
skip_fields = ("client_id", "client_secret")
properties.update(
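The alias table in this diff turns plain credential fields into prefixed catalog header properties (note the headers are now lowercase, and `project_id` gains an `x-project-id` alias). A minimal sketch of the mapping follows; the `header.` value for `CATALOG_HEADER_PREFIX` and the helper name are assumptions, not the repo's actual definitions:

```python
from typing import Dict, Optional

CATALOG_HEADER_PREFIX = "header."  # assumed value of the real constant


def catalog_properties(
    uri: str,
    access_delegation: str = "vended-credentials",
    project_id: Optional[str] = None,
    warehouse: Optional[str] = None,
) -> Dict[str, str]:
    # Mirror the field_aliases mapping above: each aliased field is emitted
    # under its lowercase header name; unset optional fields are skipped.
    field_aliases = {
        "access_delegation": f"{CATALOG_HEADER_PREFIX}x-iceberg-access-delegation",
        "project_id": f"{CATALOG_HEADER_PREFIX}x-project-id",
    }
    values = {"access_delegation": access_delegation, "project_id": project_id}
    properties = {"uri": uri}
    if warehouse is not None:
        properties["warehouse"] = warehouse
    for field, alias in field_aliases.items():
        if values[field] is not None:
            properties[alias] = values[field]
    return properties


props = catalog_properties(
    "http://localhost:50080/iceberg/catalog",
    project_id="c4fcd44f-7ce7-4446-9f7c-dcc7ba76dd22",
)
```

The lowercase header names matter because some HTTP stacks treat header keys case-sensitively when deduplicating, so normalising at the source avoids mismatches downstream.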
1 change: 1 addition & 0 deletions elt-common/src/elt_common/iceberg/trino.py
@@ -109,5 +109,6 @@ def _create_engine(self, credentials: TrinoCredentials) -> Engine:
connect_args={
"auth": BasicAuthentication(credentials.user, credentials.password),
"http_scheme": credentials.http_scheme,
"verify": False,
},
)
66 changes: 27 additions & 39 deletions elt-common/tests/e2e_tests/conftest.py
@@ -1,6 +1,7 @@
from contextlib import contextmanager
import dataclasses
from pathlib import Path
import time
from typing import Any, Callable, Dict, Generator, List
import urllib.parse
import uuid
@@ -28,7 +29,7 @@

_RETRY_ARGS = {
"wait": tenacity.wait_exponential(max=10),
"stop": tenacity.stop_after_attempt(10),
"stop": tenacity.stop_after_attempt(5),
"reraise": True,
}

@@ -76,33 +77,36 @@ class Settings(BaseSettings):
# The default values assume the docker-compose.yml in the infra/local has been used.
# These are provided for the convenience of easily running a debugger without having
# to set up remote debugging
host_netloc: str = "localhost:58080"
docker_netloc: str = "traefik"
s3_access_key: str = "adpuser"
host_netloc: str = "localhost:50080"
docker_netloc: str = "adp-router:50080"
s3_access_key: str = "adpsuperuser"
s3_secret_key: str = "adppassword"
s3_bucket: str = "e2e-tests-warehouse"
s3_endpoint: str = "http://minio:59000"
s3_bucket: str = "e2e-tests"
s3_endpoint: str = "http://adp-router:59000"
s3_region: str = "local-01"
s3_path_style_access: bool = True
openid_client_id: str = "localinfra"
openid_client_id: str = "machine-infra"
openid_client_secret: str = "s3cr3t"
openid_scope: str = "lakekeeper"
project_id: str = "c4fcd44f-7ce7-4446-9f7c-dcc7ba76dd22"
warehouse_name: str = "e2e_tests"

# trino
trino_http_scheme: str = "http"
trino_http_scheme: str = "https"
trino_host: str = "localhost"
trino_port: str = "59088"
trino_user: str = "trino"
trino_password: str = ""
trino_port: str = "58443"
trino_user: str = "machine-infra"
trino_password: str = "s3cr3t"

@property
def lakekeeper_url(self) -> Endpoint:
return Endpoint(f"http://{self.host_netloc}/iceberg", self.docker_netloc)

@property
def openid_provider_uri(self) -> Endpoint:
return Endpoint(f"http://{self.host_netloc}/auth/realms/iceberg", self.docker_netloc)
return Endpoint(
f"http://{self.host_netloc}/auth/realms/analytics-data-platform", self.docker_netloc
)

def storage_config(self) -> Dict[str, Any]:
return {
@@ -129,26 +133,12 @@ def storage_config(self) -> Dict[str, Any]:


class Server:
"""Wraps a Lakekeeper instance. It is assumed that the instance is bootstrapped."""

def __init__(self, access_token: str, settings: Settings):
self.access_token = access_token
self.settings = settings

# Bootstrap server once
management_endpoint_v1 = self.management_endpoint(version=1)
server_info = self._request_with_auth(
requests.get,
url=management_endpoint_v1 + "/info",
)
server_info.raise_for_status()
server_info = server_info.json()
if not server_info["bootstrapped"]:
response = self._request_with_auth(
requests.post,
management_endpoint_v1 + "/bootstrap",
json={"accept-terms-of-use": True},
)
response.raise_for_status()

@property
def token_endpoint(self) -> Endpoint:
return self.settings.openid_provider_uri + "/protocol/openid-connect/token"
@@ -169,13 +159,11 @@ def management_endpoint(self, *, version: int | None = None) -> Endpoint:
def warehouse_endpoint(self, *, version: int = 1) -> Endpoint:
return self.management_endpoint(version=version) + "/warehouse"

def create_warehouse(
self, name: str, project_id: uuid.UUID, storage_config: dict
) -> "Warehouse":
def create_warehouse(self, name: str, project_id: str, storage_config: dict) -> "Warehouse":
"""Create a warehouse in this server"""

payload = {
"project-id": str(project_id),
"project-id": project_id,
**storage_config,
}

@@ -233,6 +221,7 @@ def connect(self) -> PyIcebergCatalog:
"""Connect to the warehouse in the catalog"""
creds = PyIcebergCatalogCredentials()
creds.uri = str(self.server.catalog_endpoint())
creds.project_id = self.server.settings.project_id
creds.warehouse = self.name
creds.oauth2_server_uri = str(self.server.token_endpoint)
creds.client_id = self.server.settings.openid_client_id
@@ -326,12 +315,7 @@ def server(access_token: str) -> Server:


@pytest.fixture(scope="session")
def project() -> uuid.UUID:
return uuid.UUID("{00000000-0000-0000-0000-000000000000}")


@pytest.fixture(scope="session")
def warehouse(server: Server, project: uuid.UUID) -> Generator:
def warehouse(server: Server) -> Generator:
if not settings.warehouse_name:
raise ValueError("Empty 'warehouse_name' is not allowed.")

@@ -349,7 +333,9 @@ def warehouse(server: Server, project: uuid.UUID) -> Generator:
minio_client.make_bucket(bucket_name=bucket_name)
print(f"Bucket {bucket_name} created.")

warehouse = server.create_warehouse(settings.warehouse_name, project, storage_config)
warehouse = server.create_warehouse(
settings.warehouse_name, server.settings.project_id, storage_config
)
print(f"Warehouse {warehouse.project_id} created.")
try:
yield warehouse
Expand All @@ -360,6 +346,8 @@ def _remove_bucket(bucket_name):
minio_client.remove_bucket(bucket_name=bucket_name)

try:
# Allow a brief pause for the test operations to complete
time.sleep(1)
server.purge_warehouse(warehouse)
server.delete_warehouse(warehouse)
_remove_bucket(bucket_name)
@@ -46,6 +46,9 @@ def setup(self, warehouse: Warehouse) -> None:
os.environ["DESTINATION__PYICEBERG__CREDENTIALS__URI"] = str(
warehouse.server.catalog_endpoint()
)
os.environ["DESTINATION__PYICEBERG__CREDENTIALS__PROJECT_ID"] = str(
warehouse.server.settings.project_id
)
os.environ.setdefault("DESTINATION__PYICEBERG__CREDENTIALS__WAREHOUSE", warehouse.name)
os.environ.setdefault(
"DESTINATION__PYICEBERG__CREDENTIALS__OAUTH2_SERVER_URI",
6 changes: 3 additions & 3 deletions elt-common/tests/e2e_tests/elt_common/iceberg/conftest.py
@@ -13,16 +13,16 @@ def trino_engine(warehouse: Warehouse):
port=server_settings.trino_port,
user=server_settings.trino_user,
password=server_settings.trino_password,
catalog="",
http_scheme="http",
catalog=warehouse.name,
http_scheme="https",
)
# Use one connection to create the catalog
trino_catalog_creator = TrinoQueryEngine(creds)
trino_catalog_creator.execute(
f"""create catalog {warehouse.name} using iceberg
with (
"iceberg.catalog.type" = 'rest',
"iceberg.rest-catalog.warehouse" = '{warehouse.name}',
"iceberg.rest-catalog.warehouse" = '{warehouse.server.settings.project_id}/{warehouse.name}',
"iceberg.rest-catalog.uri" = '{server.catalog_endpoint().value(use_internal_netloc=True)}',
"iceberg.rest-catalog.vended-credentials-enabled" = 'false',
"iceberg.rest-catalog.security" = 'OAUTH2',
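The notable change in this hunk is the `iceberg.rest-catalog.warehouse` value, which now prefixes the warehouse name with the Lakekeeper project id. A tiny sketch of that composition (the helper name is hypothetical; the `<project-id>/<warehouse-name>` shape is taken from the diff above):

```python
def rest_catalog_warehouse(project_id: str, warehouse_name: str) -> str:
    """Compose the warehouse identifier used by the dynamic Trino catalog
    when targeting a non-default Lakekeeper project."""
    return f"{project_id}/{warehouse_name}"


identifier = rest_catalog_warehouse(
    "c4fcd44f-7ce7-4446-9f7c-dcc7ba76dd22", "e2e_tests"
)
```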
4 changes: 3 additions & 1 deletion infra/ansible-docker/group_vars/all/keycloak.yml
@@ -2,5 +2,7 @@
keycloak_http_port: 8080
keycloak_http_management_port: 9000
keycloak_base_path: /auth
keycloak_realm_url_isis: "https://{{ top_level_domain }}{{ keycloak_base_path }}/realms/isis"
keycloak_url: "https://{{ top_level_domain }}{{ keycloak_base_path }}"
keycloak_realm: "isis"
keycloak_realm_url_isis: "{{ keycloak_url }}/realms/{{ keycloak_realm }}"
keycloak_token_endpoint_url_isis: "{{ keycloak_realm_url_isis }}/protocol/openid-connect/token"
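The refactor above derives the realm and token-endpoint URLs from a shared `keycloak_url` instead of repeating the scheme and host. A sketch of how the templates resolve, assuming a hypothetical `top_level_domain` of `example.org`:

```python
# Python stand-in for the Jinja2 variable resolution in keycloak.yml.
top_level_domain = "example.org"  # hypothetical; set per environment
keycloak_base_path = "/auth"
keycloak_url = f"https://{top_level_domain}{keycloak_base_path}"
keycloak_realm = "isis"
keycloak_realm_url_isis = f"{keycloak_url}/realms/{keycloak_realm}"
keycloak_token_endpoint_url_isis = (
    f"{keycloak_realm_url_isis}/protocol/openid-connect/token"
)
```

Factoring the base URL this way means a change of domain or base path propagates to every derived endpoint automatically.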