Skip to content

Commit 1e3f3c5

Browse files
authored
tests: Remove external services requirement from elt-common e2e tests (#209)
### Summary Vastly simplifies the running of the test suite for the `elt-common` package. We use a local sqlite catalog through pyiceberg to avoid the need for running an external REST catalog. The RestCatalog configuration is kept for running tests against the ingestion scripts. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **New Features** * Added support for SQL-based Iceberg catalog configuration as an alternative to REST-based catalogs. * Made Trino authentication credentials (username and password) optional for improved flexibility. * **Tests** * Updated test infrastructure to support multiple catalog types and streamlined test fixture setup. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
1 parent 8f6c4e5 commit 1e3f3c5

File tree

22 files changed

+306
-409
lines changed

22 files changed

+306
-409
lines changed

.github/actions/run-pytest-with-uv/action.yml

Lines changed: 0 additions & 60 deletions
This file was deleted.

.github/workflows/elt-common_e2e_tests.yml

Lines changed: 0 additions & 51 deletions
This file was deleted.
Lines changed: 47 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,47 @@
1+
name: Tests for elt-common
2+
on:
3+
push:
4+
branches:
5+
- "main"
6+
paths:
7+
- ".github/workflows/elt-common_tests.yml"
8+
- "elt-common/**"
9+
pull_request:
10+
types: [opened, synchronize, reopened]
11+
paths:
12+
- ".github/workflows/elt-common_tests.yml"
13+
- "elt-common/**"
14+
15+
concurrency:
16+
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.run_id }}
17+
cancel-in-progress: true
18+
19+
env:
20+
PYTHON_VERSION: 3.13
21+
UV_VERSION: latest
22+
23+
jobs:
24+
test:
25+
name: test
26+
runs-on: ubuntu-slim
27+
28+
steps:
29+
- name: Checkout
30+
uses: actions/checkout@v6
31+
32+
- name: Install uv
33+
uses: astral-sh/setup-uv@v7
34+
with:
35+
activate-environment: false
36+
cache-dependency-glob: elt-common/pyproject.toml
37+
python-version: ${{ env.PYTHON_VERSION }}
38+
39+
- name: Install the project
40+
shell: bash -l {0}
41+
run: uv sync --locked --all-extras --dev
42+
working-directory: elt-common
43+
44+
- name: Run tests
45+
shell: bash -l {0}
46+
run: uv run pytest --durations-min=0.5 --exitfirst tests/ --cache-clear
47+
working-directory: elt-common

.github/workflows/elt-common_unit_tests.yml

Lines changed: 0 additions & 33 deletions
This file was deleted.

.github/workflows/warehouses_e2e_tests.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
name: End-to-end tests for extract and load scripts in the warehouse
1+
name: End-to-end extract and load tests
22
on:
33
push:
44
branches:
@@ -24,7 +24,7 @@ env:
2424

2525
jobs:
2626
test:
27-
name: warehouses end-to-end tests
27+
name: test
2828
runs-on: ubuntu-latest
2929

3030
steps:

elt-common/README.md

Lines changed: 4 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,6 @@ to each pipeline.
88
Development requires the following tools:
99

1010
- [uv](https://docs.astral.uv/uv/): Used to manage both Python installations and dependencies
11-
- [docker & docker compose]: Used for standing up services for end-to-end testing.
1211

1312
### Setting up a Python virtual environment
1413

@@ -21,25 +20,13 @@ package in editable mode, along with the development dependencies:
2120
> uv pip install --editable . --group dev
2221
```
2322

24-
## Running unit tests
23+
## Running the tests
2524

2625
Run the unit tests using `pytest`:
2726

2827
```bash
29-
> pytest tests/unit_tests
28+
> pytest tests
3029
```
3130

32-
## Running end-to-end tests
33-
34-
The end-to-end (e2e) tests for the `pyiceberg` destination require a running Iceberg
35-
rest catalog to test complete functionality.
36-
The local, docker-compose-based configuration provided by
37-
[infra/local/docker-compose.yml](../infra/local/docker-compose.yml) is the easiest way to
38-
spin up a set of services compatible with running the tests.
39-
_Note the requirement to edit `/etc/hosts` described in [infra/local/README](../infra/local/README.md)._
40-
41-
Once the compose services are running, execute the e2e tests using `pytest`:
42-
43-
```bash
44-
> pytest tests/e2e_tests
45-
```
31+
See ["How to invoke pytest"](https://docs.pytest.org/en/stable/how-to/usage.html#usage) fo
32+
instructions on how to limit the tests that are executed.

elt-common/pytest.ini

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,2 +1,4 @@
11
[pytest]
22
filterwarnings= ignore::DeprecationWarning
3+
markers =
4+
requires_trino: marks tests that require a trino server (deselect with '-m "not requires_trino"')

elt-common/src/elt_common/dlt_destinations/pyiceberg/configuration.py

Lines changed: 47 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
import dataclasses
2-
from typing import Dict, Literal, Final, Optional, TypeAlias
2+
from typing import Dict, Literal, Final, Optional, TypeAlias, Type
33

4-
from dlt.common.configuration import configspec
4+
from dlt.common.configuration import configspec, resolve_type
55
from dlt.common.configuration.specs import CredentialsConfiguration
66
from dlt.common.destination.client import DestinationClientDwhConfiguration
77
from dlt.common.destination.exceptions import DestinationTerminalException
@@ -27,7 +27,9 @@ def as_dict(self) -> Dict[str, str]:
2727
"""Return the credentials as a dictionary suitable for the Catalog constructor"""
2828
# Map variable names to property names
2929
# client_id & secret need to be combined
30-
properties = {"credential": self.client_credential()} if self.client_id else {}
30+
properties = {"type": "rest"}
31+
if self.client_id:
32+
properties.update({"credential": self.client_credential()})
3133

3234
field_aliases: Dict[str, str] = {
3335
"access_delegation": f"{CATALOG_HEADER_PREFIX}x-iceberg-access-delegation",
@@ -48,37 +50,63 @@ def as_dict(self) -> Dict[str, str]:
4850
def client_credential(self) -> str:
4951
return f"{self.client_id}:{self.client_secret}"
5052

53+
def on_resolved(self) -> None:
54+
# Check we have the minimum number of required authentication properties
55+
# if any are supplied
56+
auth_props = {
57+
prop: getattr(self, prop)
58+
for prop in ("oauth2_server_uri", "client_id", "client_secret")
59+
}
5160

52-
PyIcebergCatalogCredentials: TypeAlias = PyIcebergRestCatalogCredentials
61+
non_null_count = sum(map(lambda x: 1 if x is not None else 0, auth_props.values()))
62+
if non_null_count != 0 and non_null_count != 3:
63+
raise DestinationTerminalException(
64+
f"Missing required configuration value(s) for authentication: {list(name for name, value in auth_props.items() if value is None)}"
65+
)
66+
67+
68+
@configspec(init=False)
69+
class PyIcebergSqlCatalogCredentials(CredentialsConfiguration):
70+
uri: str = None # type: ignore
71+
warehouse: str = None # type: ignore
72+
73+
def as_dict(self) -> Dict[str, str]:
74+
"""Return the credentials as a dictionary suitable for the Catalog constructor"""
75+
return {"type": "sql", "uri": self.uri, "warehouse": self.warehouse}
76+
77+
def on_resolved(self) -> None:
78+
pass
79+
80+
81+
PyIcebergCatalogCredentials: TypeAlias = (
82+
PyIcebergRestCatalogCredentials | PyIcebergSqlCatalogCredentials
83+
)
5384

5485

5586
@configspec(init=False)
5687
class IcebergClientConfiguration(DestinationClientDwhConfiguration):
57-
destination_type: Final[str] = dataclasses.field(
88+
destination_type: Final[str] = dataclasses.field( # type: ignore[misc]
5889
default="pyiceberg", init=False, repr=False, compare=False
59-
) # type: ignore[misc]
90+
)
6091

61-
catalog_type: Literal["rest"] = "rest"
92+
catalog_type: Literal["rest", "sql"] = "rest"
6293
credentials: PyIcebergCatalogCredentials = None # type: ignore
6394

95+
@resolve_type("credentials")
96+
def resolve_credentials_type(self) -> Type[CredentialsConfiguration]:
97+
return (
98+
PyIcebergRestCatalogCredentials
99+
if self.catalog_type == "rest"
100+
else PyIcebergSqlCatalogCredentials
101+
)
102+
64103
@property
65104
def connection_properties(self) -> Dict[str, str]:
66105
"""Returns a mapping of connection properties to pass to the catalog constructor"""
67106
return self.credentials.as_dict() if self.credentials is not None else {}
68107

69108
def on_resolved(self) -> None:
70-
# Check we have the minimum number of required authentication properties
71-
# if any are supplied
72-
auth_props = {
73-
prop: getattr(self.credentials, prop)
74-
for prop in ("oauth2_server_uri", "client_id", "client_secret")
75-
}
76-
77-
non_null_count = sum(map(lambda x: 1 if x is not None else 0, auth_props.values()))
78-
if non_null_count != 0 and non_null_count != 3:
79-
raise DestinationTerminalException(
80-
f"Missing required configuration value(s) for authentication: {list(name for name, value in auth_props.items() if value is None)}"
81-
)
109+
return self.credentials.on_resolved()
82110

83111
def fingerprint(self) -> str:
84112
"""Returns a fingerprint of a connection string."""

elt-common/src/elt_common/dlt_destinations/pyiceberg/helpers.py

Lines changed: 0 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,6 @@
33
from dlt.common.destination.typing import PreparedTableSchema
44
from dlt.common.schema.typing import TColumnType, TTableSchemaColumns
55
from pyiceberg.catalog import Catalog
6-
from pyiceberg.catalog.rest import RestCatalog
76
from pyiceberg.partitioning import (
87
UNPARTITIONED_PARTITION_SPEC,
98
PartitionField,
@@ -48,17 +47,6 @@
4847
###############################################################################
4948

5049

51-
def create_catalog(name: str, **properties: str) -> Catalog:
52-
"""Create an Iceberg catalog
53-
54-
Args:
55-
name: Name to identify the catalog.
56-
properties: Properties that are passed along to the configuration.
57-
"""
58-
59-
return RestCatalog(name, **properties)
60-
61-
6250
def namespace_exists(catalog: Catalog, namespace: Union[str, Identifier]) -> bool:
6351
try:
6452
catalog.load_namespace_properties(namespace)

0 commit comments

Comments
 (0)