## `docs/docs/delta-lake-migration.md`

The `iceberg-delta-lake` module is not bundled with Spark and Flink engine runtimes.
### Compatibilities
The module is built and tested with `Delta Standalone:0.6.0` and supports Delta Lake tables with the following protocol version:

* `minReaderVersion`: 1
* `minWriterVersion`: 2
### API

The `iceberg-delta-lake` module provides an interface named `DeltaLakeToIcebergMigrationActionsProvider`, which contains actions that help convert Delta Lake tables to Iceberg.

The supported actions are:

* `snapshotDeltaLakeTable`: snapshot an existing Delta Lake table to an Iceberg table
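A minimal usage sketch of the action, assuming the `iceberg-delta-lake` dependency is on the classpath; the catalog, table identifier, and locations below are hypothetical placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.delta.DeltaLakeToIcebergMigrationActionsProvider;

// Hypothetical source location and destination identifier.
String sourceDeltaLakeTableLocation = "s3://my-bucket/delta-table";
TableIdentifier destTableIdentifier = TableIdentifier.of("my_db", "my_table");
Catalog icebergCatalog = loadCatalog();         // however the Iceberg catalog is obtained
Configuration hadoopConf = new Configuration(); // configuration for reading the Delta log

DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
    .snapshotDeltaLakeTable(sourceDeltaLakeTableLocation)
    .as(destTableIdentifier)
    .icebergCatalog(icebergCatalog)
    .deltaLakeConfiguration(hadoopConf)
    .execute();
```

The action reads the Delta transaction log at the given location and commits an equivalent Iceberg table to the supplied catalog; the original Delta table is left untouched.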
## `docs/docs/hive.md`

### SELECT
Select statements work the same on Iceberg tables in Hive. You will see the Iceberg benefits over Hive in compilation and execution:

* **No file system listings** - especially important on blob stores, like S3
* **No partition listing from** the Metastore
* **Advanced partition filtering** - the partition keys are not needed in the queries when they could be calculated
* Could handle a **higher number of partitions** than normal Hive tables
Here are the feature highlights for Iceberg Hive read support:

1. **Predicate pushdown**: Pushdown of the Hive SQL `WHERE` clause has been implemented so that these filters are used at the Iceberg `TableScan` level as well as by the Parquet and ORC readers.
2. **Column projection**: Columns from the Hive SQL `SELECT` clause are projected down to the Iceberg readers to reduce the number of columns read.
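For example, with a hypothetical Iceberg-backed `orders` table, an ordinary Hive query benefits from both optimizations: the `WHERE` clause is pushed down to the Iceberg scan to prune files, and only the projected columns are read:

```sql
-- Hypothetical table: the predicate prunes files at the Iceberg TableScan
-- level, and only the two selected columns are read from Parquet/ORC.
SELECT customer_id, order_total
FROM orders
WHERE order_date = '2023-01-01';
```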
## `docs/docs/metrics-reporting.md`

### ScanReport
A [`ScanReport`](../../javadoc/{{ icebergVersion }}/org/apache/iceberg/metrics/ScanReport.html) carries metrics being collected during scan planning against a given table. Amongst some general information about the involved table, such as the snapshot id or the table name, it includes metrics like:

* total scan planning duration
* number of data/delete files included in the result
* number of data/delete manifests scanned/skipped
### CommitReport
A [`CommitReport`](../../javadoc/{{ icebergVersion }}/org/apache/iceberg/metrics/CommitReport.html) carries metrics being collected after committing changes to a table (aka producing a snapshot). Amongst some general information about the involved table, such as the snapshot id or the table name, it includes metrics like:

* total duration
* number of attempts required for the commit to succeed
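As a sketch, a custom reporter only needs to implement the single `report` method of the `MetricsReporter` interface; the class name and the logging choice here are illustrative, not part of the library:

```java
import org.apache.iceberg.metrics.CommitReport;
import org.apache.iceberg.metrics.MetricsReport;
import org.apache.iceberg.metrics.MetricsReporter;
import org.apache.iceberg.metrics.ScanReport;

// Hypothetical reporter that distinguishes the two report types above.
public class PrintingMetricsReporter implements MetricsReporter {
  @Override
  public void report(MetricsReport report) {
    if (report instanceof ScanReport) {
      System.out.println("Scan planning finished: " + report);
    } else if (report instanceof CommitReport) {
      System.out.println("Commit finished: " + report);
    }
  }
}
```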
Iceberg can rewrite position delete files, which serves two purposes:
* Minor Compaction: Compact small position delete files into larger ones. This reduces the size of metadata stored in manifest files and the overhead of opening small delete files.
* Remove Dangling Deletes: Filter out position delete records that refer to data files that are no longer live. After `rewrite_data_files`, position delete records pointing to the rewritten data files are not always marked for removal, and can remain tracked by the table's live snapshot metadata. This is known as the 'dangling delete' problem.
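Both purposes are served by the `rewrite_position_delete_files` procedure; the catalog and table names below are placeholders:

```sql
-- Compacts small position delete files and drops dangling deletes.
CALL catalog_name.system.rewrite_position_delete_files('db.sample');
```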
Creates a view that contains the changes from a given table.

* `identifier_columns` (array&lt;string&gt;): The list of identifier columns to compute updates. If the argument `compute_updates` is set to true and `identifier_columns` are not provided, the table’s current identifier fields will be used.
Here is a list of commonly used Spark read options:
* `start-snapshot-id`: the exclusive start snapshot ID. If not provided, it reads from the table’s first snapshot inclusively.
* `end-snapshot-id`: the inclusive end snapshot ID, defaulting to the table’s current snapshot.
* `start-timestamp`: the exclusive start timestamp. If not provided, it reads from the table’s first snapshot inclusively.
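These read options are passed through the `options` map of the `create_changelog_view` procedure; the catalog, table name, and snapshot IDs below are placeholders:

```sql
CALL spark_catalog.system.create_changelog_view(
  table => 'db.tbl',
  options => map('start-snapshot-id', '1', 'end-snapshot-id', '2')
);
```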
Please note that the changelog view includes Change Data Capture (CDC) metadata columns
that provide additional information about the changes being tracked. These columns are:

- `_change_type`: the type of change. It has one of the following values: `INSERT`, `DELETE`, `UPDATE_BEFORE`, or `UPDATE_AFTER`.
- `_change_ordinal`: the order of changes
- `_commit_snapshot_id`: the snapshot ID where the change occurred
!!! info
    Content refers to the type of content stored by the data file:

    * 0 Data
    * 1 Position Deletes
    * 2 Equality Deletes

To show only data files or delete files, query `prod.db.table.data_files` and `prod.db.table.delete_files` respectively.
To show all files, data files and delete files across all tracked snapshots, query `prod.db.table.all_files`, `prod.db.table.all_data_files` and `prod.db.table.all_delete_files` respectively.
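For example, the `content` field can be inspected directly when querying the files metadata table (the table name is a placeholder):

```sql
-- content = 0 for data files, 1 for position deletes, 2 for equality deletes
SELECT content, file_path, record_count
FROM prod.db.table.files;
```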
`SELECT * FROM prod.db.table.manifests;`

1. Fields within the `partition_summaries` column of the manifests table correspond to `field_summary` structs within the [manifest list](../../spec.md#manifest-lists), with the following order:
    - `contains_null`
    - `contains_nan`

`SELECT * FROM prod.db.table.partitions;`

1. For unpartitioned tables, the partitions table will not contain the partition and spec_id fields.
2. The partitions metadata table shows partitions with data files or delete files in the current snapshot. However, delete files are not applied, and so in some cases partitions may be shown even though all their data rows are marked deleted by delete files.
## `site/docs/spec.md`

Grouping a subset of a struct’s fields into a nested struct is **not** allowed, nor is moving fields from a nested struct into its immediate parent struct (`struct<a, b, c> ↔ struct<a, struct<b, c>>`). Evolving primitive types to structs is **not** allowed, nor is evolving a single-field struct to a primitive (`map<string, int> ↔ map<string, struct<int>>`).
Struct evolution requires the following rules for default values:

* The `initial-default` must be set when a field is added and cannot change
* The `write-default` must be set when a field is added and may change
* When a required field is added, both defaults must be set to a non-null value
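As an illustrative sketch (the field id, name, and default values are hypothetical), a required field added under these rules could be serialized in a schema as:

```json
{
  "id": 4,
  "name": "region",
  "required": true,
  "type": "string",
  "initial-default": "unknown",
  "write-default": "unknown"
}
```

Here `initial-default` fills the column for rows written before the field existed, while `write-default` applies to new rows that omit the field.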
### Version 3

Default values are added to struct fields in v3.

* The `write-default` is a forward-compatible change because it is only used at write time. Old writers will fail because the field is missing.
* Tables with `initial-default` will be read correctly by older readers if `initial-default` is always null for optional fields. Otherwise, old readers will default optional columns with null. Old readers will fail to read required fields which are populated by `initial-default` because that default is not supported.
## `site/docs/view-spec.md`

* _optional_ `properties`: A string to string map of view properties [2]

Notes:

1. The number of versions to retain is controlled by the table property: `version.history.num-entries`.
2. Properties are used for metadata such as `comment` and for settings that affect view maintenance. This is not intended to be used for arbitrary metadata.
View versions are immutable. Once a version is created, it cannot be changed. This means that representations for a version cannot be changed. If a view definition changes (or new representations are to be added), a new version must be created.

Each representation is an object with at least one common field, `type`, that is one of the following:

* `sql`: a SQL SELECT statement that defines the view

Representations further define metadata for each type.
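For instance, a `sql` representation stores the statement alongside its dialect; the query text here is hypothetical:

```json
{
  "type": "sql",
  "sql": "SELECT id, name FROM base_tbl",
  "dialect": "spark"
}
```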