
Commit 396a844

Docs: Add newline to fix lists (#9664)
1 parent 4835549 commit 396a844

10 files changed: +27 −9 lines changed

docs/docs/configuration.md

Lines changed: 2 additions & 0 deletions

````diff
@@ -157,6 +157,7 @@ Here are the catalog properties related to locking. They are used by some catalo
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
 The HMS table locking is a 2-step process:
+
 1. Lock Creation: Create lock in HMS and queue for acquisition
 2. Lock Check: Check if lock successfully acquired
 
@@ -180,6 +181,7 @@ Hive Metastore before the lock is retried from Iceberg.
 
 Warn: Setting `iceberg.engine.hive.lock-enabled`=`false` will cause HiveCatalog to commit to tables without using Hive locks.
 This should only be set to `false` if all following conditions are met:
+
 - [HIVE-26882](https://issues.apache.org/jira/browse/HIVE-26882)
 is available on the Hive Metastore server
 - All other HiveCatalogs committing to tables that this HiveCatalog commits to are also on Iceberg 1.3 or later
````
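The property in the hunk above is a catalog-level setting. As a rough sketch only, it might be toggled for a single Hive session with `SET`; the property name comes from the diff, but setting it this way (rather than in the Hadoop configuration) is an assumption:

```sql
-- Assumption: iceberg.engine.hive.lock-enabled is honored as a session
-- config; only disable it when the HIVE-26882 conditions above are met.
SET iceberg.engine.hive.lock-enabled=false;
```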

docs/docs/delta-lake-migration.md

Lines changed: 2 additions & 0 deletions

````diff
@@ -36,6 +36,7 @@ The `iceberg-delta-lake` module is not bundled with Spark and Flink engine runti
 
 ### Compatibilities
 The module is built and tested with `Delta Standalone:0.6.0` and supports Delta Lake tables with the following protocol version:
+
 * `minReaderVersion`: 1
 * `minWriterVersion`: 2
 
@@ -44,6 +45,7 @@ Please refer to [Delta Lake Table Protocol Versioning](https://docs.delta.io/lat
 ### API
 The `iceberg-delta-lake` module provides an interface named `DeltaLakeToIcebergMigrationActionsProvider`, which contains actions that helps converting from Delta Lake to Iceberg.
 The supported actions are:
+
 * `snapshotDeltaLakeTable`: snapshot an existing Delta Lake table to an Iceberg table
 
 ### Default Implementation
````

docs/docs/hive.md

Lines changed: 2 additions & 0 deletions

````diff
@@ -459,12 +459,14 @@ ALTER TABLE t set TBLPROPERTIES ('metadata_location'='<path>/hivemetadata/00003-
 
 ### SELECT
 Select statements work the same on Iceberg tables in Hive. You will see the Iceberg benefits over Hive in compilation and execution:
+
 * **No file system listings** - especially important on blob stores, like S3
 * **No partition listing from** the Metastore
 * **Advanced partition filtering** - the partition keys are not needed in the queries when they could be calculated
 * Could handle **higher number of partitions** than normal Hive tables
 
 Here are the features highlights for Iceberg Hive read support:
+
 1. **Predicate pushdown**: Pushdown of the Hive SQL `WHERE` clause has been implemented so that these filters are used at the Iceberg `TableScan` level as well as by the Parquet and ORC Readers.
 2. **Column projection**: Columns from the Hive SQL `SELECT` clause are projected down to the Iceberg readers to reduce the number of columns read.
 3. **Hive query engines**:
````
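The predicate-pushdown and column-projection features described in this hunk apply to ordinary Hive queries; a minimal sketch, with a hypothetical table and columns:

```sql
-- Hypothetical table and columns. The WHERE filter is applied at the
-- Iceberg TableScan and by the Parquet/ORC readers, and only the two
-- projected columns are read.
SELECT customer_id, order_ts
FROM default.orders
WHERE order_ts >= '2024-01-01';
```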

docs/docs/metrics-reporting.md

Lines changed: 2 additions & 0 deletions

````diff
@@ -26,6 +26,7 @@ As of 1.1.0 Iceberg supports the [`MetricsReporter`](../../javadoc/{{ icebergVer
 
 ### ScanReport
 A [`ScanReport`](../../javadoc/{{ icebergVersion }}/org/apache/iceberg/metrics/ScanReport.html) carries metrics being collected during scan planning against a given table. Amongst some general information about the involved table, such as the snapshot id or the table name, it includes metrics like:
+
 * total scan planning duration
 * number of data/delete files included in the result
 * number of data/delete manifests scanned/skipped
@@ -35,6 +36,7 @@ A [`ScanReport`](../../javadoc/{{ icebergVersion }}/org/apache/iceberg/metrics/S
 
 ### CommitReport
 A [`CommitReport`](../../javadoc/{{ icebergVersion }}/org/apache/iceberg/metrics/CommitReport.html) carries metrics being collected after committing changes to a table (aka producing a snapshot). Amongst some general information about the involved table, such as the snapshot id or the table name, it includes metrics like:
+
 * total duration
 * number of attempts required for the commit to succeed
 * number of added/removed data/delete files
````

docs/docs/spark-procedures.md

Lines changed: 3 additions & 0 deletions

````diff
@@ -459,6 +459,7 @@ CALL catalog_name.system.rewrite_manifests('db.sample', false);
 ### `rewrite_position_delete_files`
 
 Iceberg can rewrite position delete files, which serves two purposes:
+
 * Minor Compaction: Compact small position delete files into larger ones. This reduces the size of metadata stored in manifest files and overhead of opening small delete files.
 * Remove Dangling Deletes: Filter out position delete records that refer to data files that are no longer live. After rewrite_data_files, position delete records pointing to the rewritten data files are not always marked for removal, and can remain tracked by the table's live snapshot metadata. This is known as the 'dangling delete' problem.
 
@@ -760,6 +761,7 @@ Creates a view that contains the changes from a given table.
 | `identifier_columns` | | array<string> | The list of identifier columns to compute updates. If the argument `compute_updates` is set to true and `identifier_columns` are not provided, the table’s current identifier fields will be used. |
 
 Here is a list of commonly used Spark read options:
+
 * `start-snapshot-id`: the exclusive start snapshot ID. If not provided, it reads from the table’s first snapshot inclusively.
 * `end-snapshot-id`: the inclusive end snapshot id, default to table's current snapshot.
 * `start-timestamp`: the exclusive start timestamp. If not provided, it reads from the table’s first snapshot inclusively.
@@ -807,6 +809,7 @@ SELECT * FROM tbl_changes where _change_type = 'INSERT' AND id = 3 ORDER BY _cha
 ```
 Please note that the changelog view includes Change Data Capture(CDC) metadata columns
 that provide additional information about the changes being tracked. These columns are:
+
 - `_change_type`: the type of change. It has one of the following values: `INSERT`, `DELETE`, `UPDATE_BEFORE`, or `UPDATE_AFTER`.
 - `_change_ordinal`: the order of changes
 - `_commit_snapshot_id`: the snapshot ID where the change occurred
````
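The two procedures touched by these hunks follow the same `CALL` style as the `rewrite_manifests` example in the first hunk header; a sketch with hypothetical catalog and table names, and the read options quoted from the diff:

```sql
-- Hypothetical names. Compacts position delete files and filters out
-- dangling deletes for db.sample.
CALL catalog_name.system.rewrite_position_delete_files('db.sample');

-- Create a changelog view over a snapshot range using the read options
-- listed above (snapshot IDs here are placeholders).
CALL catalog_name.system.create_changelog_view(
  table => 'db.sample',
  options => map('start-snapshot-id', '1', 'end-snapshot-id', '2')
);
```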

docs/docs/spark-queries.md

Lines changed: 8 additions & 5 deletions

````diff
@@ -295,11 +295,11 @@ SELECT * FROM prod.db.table.files;
 | 1 | s3:/.../table/data/00081-4-a9aa8b24-20bc-4d56-93b0-6b7675782bb5-00001-deletes.parquet | PARQUET | 0 | 1 | 1560 | {2147483545:46,2147483546:152} | {2147483545:1,2147483546:1} | {2147483545:0,2147483546:0} | {} | {2147483545:,2147483546:s3:/.../table/data/00000-0-f9709213-22ca-4196-8733-5cb15d2afeb9-00001.parquet} | {2147483545:,2147483546:s3:/.../table/data/00000-0-f9709213-22ca-4196-8733-5cb15d2afeb9-00001.parquet} | NULL | [4] | NULL | NULL | {"data":{"column_size":null,"value_count":null,"null_value_count":null,"nan_value_count":null,"lower_bound":null,"upper_bound":null},"id":{"column_size":null,"value_count":null,"null_value_count":null,"nan_value_count":null,"lower_bound":null,"upper_bound":null}} |
 | 2 | s3:/.../table/data/00047-25-833044d0-127b-415c-b874-038a4f978c29-00612.parquet | PARQUET | 0 | 126506 | 28613985 | {100:135377,101:11314} | {100:126506,101:126506} | {100:105434,101:11} | {} | {100:0,101:17} | {100:404455227527,101:23} | NULL | NULL | [1] | 0 | {"id":{"column_size":135377,"value_count":126506,"null_value_count":105434,"nan_value_count":null,"lower_bound":0,"upper_bound":404455227527},"data":{"column_size":11314,"value_count":126506,"null_value_count": 11,"nan_value_count":null,"lower_bound":17,"upper_bound":23}} |
 
-!!!info
-Content refers to type of content stored by the data file:
-0 Data
-1 Position Deletes
-2 Equality Deletes
+!!! info
+    Content refers to type of content stored by the data file:
+    * 0 Data
+    * 1 Position Deletes
+    * 2 Equality Deletes
 
 To show only data files or delete files, query `prod.db.table.data_files` and `prod.db.table.delete_files` respectively.
 To show all files, data files and delete files across all tracked snapshots, query `prod.db.table.all_files`, `prod.db.table.all_data_files` and `prod.db.table.all_delete_files` respectively.
@@ -317,6 +317,7 @@ SELECT * FROM prod.db.table.manifests;
 | s3://.../table/metadata/45b5290b-ee61-4788-b324-b1e2735c0e10-m0.avro | 4479 | 0 | 6668963634911763636 | 8 | 0 | 0 | [[false,null,2019-05-13,2019-05-15]] |
 
 Note:
+
 1. Fields within `partition_summaries` column of the manifests table correspond to `field_summary` structs within [manifest list](../../spec.md#manifest-lists), with the following order:
 - `contains_null`
 - `contains_nan`
@@ -341,6 +342,7 @@ SELECT * FROM prod.db.table.partitions;
 | {20211002, 10} | 0 | 3 | 2 | 400 | 0 | 0 | 1 | 1 | 1633169159489000 | 6941468797545315876 |
 
 Note:
+
 1. For unpartitioned tables, the partitions table will not contain the partition and spec_id fields.
 
 2. The partitions metadata table shows partitions with data files or delete files in the current snapshot. However, delete files are not applied, and so in some cases partitions may be shown even though all their data rows are marked deleted by delete files.
@@ -416,6 +418,7 @@ SELECT * FROM prod.db.table.all_manifests;
 | s3://.../metadata/a85f78c5-3222-4b37-b7e4-faf944425d48-m0.avro | 6376 | 0 | 6272782676904868561 | 2 | 0 | 0 |[{false, false, 20210101, 20210101}]|
 
 Note:
+
 1. Fields within `partition_summaries` column of the manifests table correspond to `field_summary` structs within [manifest list](../../spec.md#manifest-lists), with the following order:
 - `contains_null`
 - `contains_nan`
````
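The metadata tables named in these hunks are queried like ordinary tables; a short sketch against the `prod.db.table` example used throughout the diff (the projected column names are from the `files` table shown in the first hunk):

```sql
-- content is 0 (data), 1 (position deletes), or 2 (equality deletes),
-- per the !!! info admonition above.
SELECT content, file_path, record_count
FROM prod.db.table.data_files;

-- Delete files tracked by the current snapshot only:
SELECT * FROM prod.db.table.delete_files;
```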

docs/docs/spark-writes.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -310,11 +310,11 @@ While inserting or updating Iceberg is capable of resolving schema mismatch at r
 
 * A new column is present in the source but not in the target table.
 
-The new column is added to the target table. Column values are set to `NULL` in all the rows already present in the table
+    The new column is added to the target table. Column values are set to `NULL` in all the rows already present in the table
 
 * A column is present in the target but not in the source.
 
-The target column value is set to `NULL` when inserting or left unchanged when updating the row.
+    The target column value is set to `NULL` when inserting or left unchanged when updating the row.
 
 The target table must be configured to accept any schema change by setting the property `write.spark.accept-any-schema` to `true`.
 
````
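The property named in the last context line above is a table property; a minimal sketch of enabling it, with a hypothetical table name:

```sql
-- Hypothetical table name. Allows writes whose source schema differs
-- from the target table's schema, per the docs in this hunk.
ALTER TABLE prod.db.sample SET TBLPROPERTIES (
  'write.spark.accept-any-schema' = 'true'
);
```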

site/docs/how-to-release.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -267,7 +267,7 @@ svn ci -m 'Iceberg: Add release <VERSION>'
 ```
 
 !!! Note
-The above step requires PMC privileges to execute.
+    The above step requires PMC privileges to execute.
 
 Next, add a release tag to the git repository based on the passing candidate tag:
 
@@ -472,7 +472,7 @@ repositories {
 ```
 
 !!! Note
-Replace `${MAVEN_URL}` with the URL provided in the release announcement
+    Replace `${MAVEN_URL}` with the URL provided in the release announcement
 
 ### Verifying with Spark
 
````

site/docs/spec.md

Lines changed: 2 additions & 0 deletions

````diff
@@ -222,6 +222,7 @@ Any struct, including a top-level schema, can evolve through deleting fields, ad
 Grouping a subset of a struct’s fields into a nested struct is **not** allowed, nor is moving fields from a nested struct into its immediate parent struct (`struct<a, b, c> ↔ struct<a, struct<b, c>>`). Evolving primitive types to structs is **not** allowed, nor is evolving a single-field struct to a primitive (`map<string, int> ↔ map<string, struct<int>>`).
 
 Struct evolution requires the following rules for default values:
+
 * The `initial-default` must be set when a field is added and cannot change
 * The `write-default` must be set when a field is added and may change
 * When a required field is added, both defaults must be set to a non-null value
@@ -1217,6 +1218,7 @@ This serialization scheme is for storing single values as individual binary valu
 ### Version 3
 
 Default values are added to struct fields in v3.
+
 * The `write-default` is a forward-compatible change because it is only used at write time. Old writers will fail because the field is missing.
 * Tables with `initial-default` will be read correctly by older readers if `initial-default` is always null for optional fields. Otherwise, old readers will default optional columns with null. Old readers will fail to read required fields which are populated by `initial-default` because that default is not supported.
 
````
site/docs/view-spec.md

Lines changed: 2 additions & 0 deletions

````diff
@@ -65,6 +65,7 @@ The view version metadata file has the following fields:
 | _optional_ | `properties` | A string to string map of view properties [2] |
 
 Notes:
+
 1. The number of versions to retain is controlled by the table property: `version.history.num-entries`.
 2. Properties are used for metadata such as `comment` and for settings that affect view maintenance. This is not intended to be used for arbitrary metadata.
 
@@ -103,6 +104,7 @@ A view version can have more than one representation. All representations for a
 View versions are immutable. Once a version is created, it cannot be changed. This means that representations for a version cannot be changed. If a view definition changes (or new representations are to be added), a new version must be created.
 
 Each representation is an object with at least one common field, `type`, that is one of the following:
+
 * `sql`: a SQL SELECT statement that defines the view
 
 Representations further define metadata for each type.
````
