README.md: 5 additions & 4 deletions
@@ -29,16 +29,17 @@ The Iceberg handlers are designed to stream CDC events directly into Apache Iceberg tables

 * `IcebergChangeHandler`: A straightforward handler that appends change data to a source-equivalent Iceberg table using a predefined schema.
   * **Use Case**: Best for creating a "bronze" layer where you want to capture the raw Debezium event. The `before` and `after` payloads are stored as complete JSON strings.
-  * **Schema**: Uses a fixed schema where complex nested fields (`source`, `before`, `after`) are stored as `StringType`. It also includes helpful metadata columns (`_consumed_at`, `_dbz_event_key`, `_dbz_event_key_hash`) for traceability.
-  * With consuming data as json, all source syste schema changes will be absorbed automatically.
-  * **Automatic Table Creation & Partitioning**: **It automatically creates a new Iceberg table for each source table** and partitions it by day on the `_consumed_at` timestamp for efficient time-series queries.
+  * **Schema**: Uses a fixed schema where complex nested fields (`source`, `before`, `after`) are stored as `StringType`.
+  * Because the data is consumed as JSON, all source system schema changes are absorbed automatically.
+  * **Automatic Table Creation & Partitioning**: It automatically creates a new Iceberg table for each source table and partitions it by day on the `_consumed_at` timestamp for efficient time-series queries.
+  * **Enriched Metadata**: It also adds `_consumed_at`, `_dbz_event_key`, and `_dbz_event_key_hash` columns for enhanced traceability.

 * `IcebergChangeHandlerV2`: A more advanced handler that automatically infers the schema from the Debezium events and creates a well-structured Iceberg table accordingly.
   * **Use Case**: Ideal for scenarios where you want the pipeline to automatically create tables with native data types that mirror the source. This allows for direct querying of the data without needing to parse JSON.
   * **Schema and Features**:
     * **Automatic Schema Inference**: It inspects the first batch of records for a given table and infers the schema using PyArrow, preserving native data types (e.g., `LongType`, `TimestampType`).
     * **Robust Type Handling**: If a field's type cannot be inferred from the initial batch (e.g., it is always `null`), it safely falls back to `StringType` to prevent errors.
-    * **Automatic Table Creation & Partitioning**: **It automatically creates a new Iceberg table for each source table** and partitions it by day on the `_consumed_at` timestamp for efficient time-series queries.
+    * **Automatic Table Creation & Partitioning**: It automatically creates a new Iceberg table for each source table and partitions it by day on the `_consumed_at` timestamp for efficient time-series queries.
     * **Enriched Metadata**: It also adds `_consumed_at`, `_dbz_event_key`, and `_dbz_event_key_hash` columns for enhanced traceability.
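
To make the fixed layout of `IcebergChangeHandler` concrete, here is a minimal sketch of a comparable `StringType` schema and daily partition spec built with `pyiceberg`. It only illustrates the table shape described above and is not the handler's actual code; the field IDs, catalog name, and table name are hypothetical.

```python
from pyiceberg.catalog import load_catalog
from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.schema import Schema
from pyiceberg.transforms import DayTransform
from pyiceberg.types import NestedField, StringType, TimestamptzType

# Complex Debezium payloads stay as raw JSON strings; the _consumed_at / _dbz_* columns
# carry the traceability metadata described above.
schema = Schema(
    NestedField(field_id=1, name="_consumed_at", field_type=TimestamptzType(), required=False),
    NestedField(field_id=2, name="_dbz_event_key", field_type=StringType(), required=False),
    NestedField(field_id=3, name="_dbz_event_key_hash", field_type=StringType(), required=False),
    NestedField(field_id=4, name="source", field_type=StringType(), required=False),
    NestedField(field_id=5, name="before", field_type=StringType(), required=False),
    NestedField(field_id=6, name="after", field_type=StringType(), required=False),
)

# Partition by day on the consumption timestamp for efficient time-series queries.
spec = PartitionSpec(
    PartitionField(source_id=1, field_id=1000, transform=DayTransform(), name="_consumed_at_day")
)

# Catalog and table identifiers are placeholders; catalog settings come from pyiceberg configuration.
catalog = load_catalog("warehouse")
table = catalog.create_table("cdc.customers_bronze", schema=schema, partition_spec=spec)
```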
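
Similarly, the schema-inference behavior of `IcebergChangeHandlerV2` can be pictured roughly as follows: let PyArrow infer types from the first batch of events, and fall back to strings for columns whose type cannot be determined. This is a simplified sketch under those assumptions, not the handler's real implementation; the `infer_arrow_schema` helper and the sample batch are hypothetical.

```python
import pyarrow as pa

def infer_arrow_schema(first_batch: list[dict]) -> pa.Schema:
    """Infer an Arrow schema from the first batch of events, defaulting undecidable columns to string."""
    inferred = pa.Table.from_pylist(first_batch).schema
    fields = []
    for field in inferred:
        if pa.types.is_null(field.type):
            # The column was null for every record in the batch, so its type cannot be
            # inferred; fall back to string to avoid errors later.
            fields.append(pa.field(field.name, pa.string()))
        else:
            fields.append(field)
    return pa.schema(fields)

# "note" is always null here, so it falls back to string instead of a null type.
batch = [
    {"id": 1, "amount": 10, "note": None},
    {"id": 2, "amount": 25, "note": None},
]
print(infer_arrow_schema(batch))  # -> id: int64, amount: int64, note: string
```

The inferred Arrow types would then map onto native Iceberg types (for example, `int64` to `LongType`), which is what lets the resulting table be queried directly without parsing JSON.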