From 3266e52132c5af9c44235e744620fdbb2e57f7f1 Mon Sep 17 00:00:00 2001 From: Melanie Ballard Date: Wed, 2 Jul 2025 11:26:04 -0400 Subject: [PATCH 1/6] DOCSP-48557 spark options update --- .../streaming-mode/streaming-write-config.txt | 110 ++++++++++++++++++ 1 file changed, 110 insertions(+) diff --git a/source/streaming-mode/streaming-write-config.txt b/source/streaming-mode/streaming-write-config.txt index 2c83646..1ecd679 100644 --- a/source/streaming-mode/streaming-write-config.txt +++ b/source/streaming-mode/streaming-write-config.txt @@ -56,6 +56,116 @@ You can configure the following properties when writing data to MongoDB in strea interface. | | **Default:** ``com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory`` + + * - ``convertJson`` + - | Specifies whether the connector parses the string and converts extended JSON + into BSON. + | + | This setting accepts the following values: + + - ``any``: The connector converts all JSON values to BSON. + + - ``"{a: 1}"`` becomes ``{a: 1}``. + - ``"[1, 2, 3]"`` becomes ``[1, 2, 3]``. + - ``"true"`` becomes ``true``. + - ``"01234"`` becomes ``1234``. + - ``"{a:b:c}"`` doesn't change. + + - ``objectOrArrayOnly``: The connector converts only JSON objects and arrays to + BSON. + + - ``"{a: 1}"`` becomes ``{a: 1}``. + - ``"[1, 2, 3]"`` becomes ``[1, 2, 3]``. + - ``"true"`` doesn't change. + - ``"01234"`` doesn't change. + - ``"{a:b:c}"`` doesn't change. + + - ``false``: The connector leaves all values as strings. + + | **Default:** ``false`` + + * - ``idFieldList`` + - | Field or list of fields by which to split the collection data. To + specify more than one field, separate them using a comma as shown + in the following example: + + .. code-block:: none + :copyable: false + + "fieldName1,fieldName2" + + | **Default:** ``_id`` + + * - ``ignoreNullValues`` + - | When ``true``, the connector ignores any ``null`` values when writing, + including ``null`` values in arrays and nested documents. 
+ | + | **Default:** ``false`` + + * - ``maxBatchSize`` + - | Specifies the maximum number of operations to batch in bulk + operations. + | + | **Default:** ``512`` + + * - ``operationType`` + - | Specifies the type of write operation to perform. You can set + this to one of the following values: + + - ``insert``: Insert the data. + - ``replace``: Replace an existing document that matches the + ``idFieldList`` value with the new data. If no match exists, the + value of ``upsertDocument`` indicates whether the connector + inserts a new document. + - ``update``: Update an existing document that matches the + ``idFieldList`` value with the new data. If no match exists, the + value of ``upsertDocument`` indicates whether the connector + inserts a new document. + + | + | **Default:** ``replace`` + + * - ``ordered`` + - | Specifies whether to perform ordered bulk operations. + | + | **Default:** ``true`` + + * - ``upsertDocument`` + - | When ``true``, replace and update operations will insert the data + if no match exists. + | + | For time series collections, you must set ``upsertDocument`` to + ``false``. + | + | **Default:** ``true`` + + * - ``writeConcern.journal`` + - | Specifies ``j``, a write-concern option to enable request for + acknowledgment that the data is confirmed on on-disk journal for + the criteria specified in the ``w`` option. You can specify + either ``true`` or ``false``. + | + | For more information on ``j`` values, see the MongoDB server + guide on the + :manual:`WriteConcern j option `. + + * - ``writeConcern.w`` + - | Specifies ``w``, a write-concern option to request acknowledgment + that the write operation has propagated to a specified number of + MongoDB nodes. For a list + of allowed values for this option, see :manual:`WriteConcern + ` in the MongoDB manual. 
+ | + | **Default:** ``1`` + + * - ``writeConcern.wTimeoutMS`` + - | Specifies ``wTimeoutMS``, a write-concern option to return an error + when a write operation exceeds the number of milliseconds. If you + use this optional setting, you must specify a nonnegative integer. + | + | For more information on ``wTimeoutMS`` values, see the MongoDB server + guide on the + :manual:`WriteConcern wtimeout option `. * - ``checkpointLocation`` - | The absolute file path of the directory to which the connector writes checkpoint From a02ce805e53d036a84919376b5ef2795137b177e Mon Sep 17 00:00:00 2001 From: Melanie Ballard Date: Wed, 2 Jul 2025 11:59:43 -0400 Subject: [PATCH 2/6] DOCSP-48557 links and reorder --- source/batch-mode/batch-write-config.txt | 26 +++++++++--------- .../streaming-mode/streaming-write-config.txt | 27 +++++++------------ 2 files changed, 24 insertions(+), 29 deletions(-) diff --git a/source/batch-mode/batch-write-config.txt b/source/batch-mode/batch-write-config.txt index 3eb030f..17da29b 100644 --- a/source/batch-mode/batch-write-config.txt +++ b/source/batch-mode/batch-write-config.txt @@ -139,24 +139,26 @@ You can configure the following properties when writing data to MongoDB in batch | | **Default:** ``true`` + * - ``writeConcern.w`` + - | Specifies ``w``, a write-concern option to request acknowledgment that + the write operation has propagated to a specified number of MongoDB + nodes. + | + | For a list of allowed values for this option, see + :manual:`WriteConcern w Option ` in the + MongoDB manual. + | + | **Default:** ``1`` + * - ``writeConcern.journal`` - | Specifies ``j``, a write-concern option to enable request for acknowledgment that the data is confirmed on on-disk journal for the criteria specified in the ``w`` option. You can specify either ``true`` or ``false``. 
| - | For more information on ``j`` values, see the MongoDB server + | For more information on ``j`` values, see the {+mdb-server+} guide on the - :manual:`WriteConcern j option `. - - * - ``writeConcern.w`` - - | Specifies ``w``, a write-concern option to request acknowledgment - that the write operation has propagated to a specified number of - MongoDB nodes. For a list - of allowed values for this option, see :manual:`WriteConcern - ` in the MongoDB manual. - | - | **Default:** ``1`` + :manual:`WriteConcern j Option `. * - ``writeConcern.wTimeoutMS`` - | Specifies ``wTimeoutMS``, a write-concern option to return an error @@ -165,7 +167,7 @@ You can configure the following properties when writing data to MongoDB in batch | | For more information on ``wTimeoutMS`` values, see the MongoDB server guide on the - :manual:`WriteConcern wtimeout option `. + :manual:`WriteConcern wtimeout `. Specifying Properties in ``connection.uri`` ------------------------------------------- diff --git a/source/streaming-mode/streaming-write-config.txt b/source/streaming-mode/streaming-write-config.txt index 1ecd679..8425c38 100644 --- a/source/streaming-mode/streaming-write-config.txt +++ b/source/streaming-mode/streaming-write-config.txt @@ -139,40 +139,33 @@ You can configure the following properties when writing data to MongoDB in strea | | **Default:** ``true`` + * - ``writeConcern.journal`` - | Specifies ``j``, a write-concern option to enable request for acknowledgment that the data is confirmed on on-disk journal for the criteria specified in the ``w`` option. You can specify either ``true`` or ``false``. | - | For more information on ``j`` values, see the MongoDB server - guide on the - :manual:`WriteConcern j option `. - - * - ``writeConcern.w`` - - | Specifies ``w``, a write-concern option to request acknowledgment - that the write operation has propagated to a specified number of - MongoDB nodes. 
For a list - of allowed values for this option, see :manual:`WriteConcern - ` in the MongoDB manual. - | - | **Default:** ``1`` + | For more information on ``j`` values, see + :manual:`WriteConcern j Option ` in the + {+mdb-server+} manual. * - ``writeConcern.wTimeoutMS`` - | Specifies ``wTimeoutMS``, a write-concern option to return an error when a write operation exceeds the number of milliseconds. If you use this optional setting, you must specify a nonnegative integer. | - | For more information on ``wTimeoutMS`` values, see the MongoDB server - guide on the - :manual:`WriteConcern wtimeout option `. + | For more information on ``wTimeoutMS`` values, see + :manual:`WriteConcern wtimeout ` in the + {+mdb-server+} manual. * - ``checkpointLocation`` - | The absolute file path of the directory to which the connector writes checkpoint information. | - | For more information about checkpoints, see the - `Spark Structured Streaming Programming Guide `__ + | For more information about checkpoints, see the `Spark Structured + Streaming Programming Guide + `__ | | **Default:** None From c5ee8a384d86943d12c93a6254c780f1f9b6707d Mon Sep 17 00:00:00 2001 From: Melanie Ballard Date: Wed, 2 Jul 2025 12:15:28 -0400 Subject: [PATCH 3/6] DOCSP-48667 link fix + mdb-server consistency --- source/batch-mode/batch-write-config.txt | 18 +++++++------- .../streaming-mode/streaming-write-config.txt | 24 +++++++++++++------ 2 files changed, 26 insertions(+), 16 deletions(-) diff --git a/source/batch-mode/batch-write-config.txt b/source/batch-mode/batch-write-config.txt index 17da29b..3514911 100644 --- a/source/batch-mode/batch-write-config.txt +++ b/source/batch-mode/batch-write-config.txt @@ -144,9 +144,9 @@ You can configure the following properties when writing data to MongoDB in batch the write operation has propagated to a specified number of MongoDB nodes. | - | For a list of allowed values for this option, see - :manual:`WriteConcern w Option ` in the - MongoDB manual. 
+ | For a list of allowed values for this option, see :manual:`WriteConcern + w Option ` in the {+mdb-server+} + manual. | | **Default:** ``1`` @@ -156,18 +156,18 @@ You can configure the following properties when writing data to MongoDB in batch the criteria specified in the ``w`` option. You can specify either ``true`` or ``false``. | - | For more information on ``j`` values, see the {+mdb-server+} - guide on the - :manual:`WriteConcern j Option `. + | For more information on ``j`` values, see :manual:`WriteConcern j + Option ` in the {+mdb-server+} + manual. * - ``writeConcern.wTimeoutMS`` - | Specifies ``wTimeoutMS``, a write-concern option to return an error when a write operation exceeds the number of milliseconds. If you use this optional setting, you must specify a nonnegative integer. | - | For more information on ``wTimeoutMS`` values, see the MongoDB server - guide on the - :manual:`WriteConcern wtimeout `. + | For more information on ``wTimeoutMS`` values, see + :manual:`WriteConcern wtimeout ` in + the {+mdb-server+} manual. Specifying Properties in ``connection.uri`` ------------------------------------------- diff --git a/source/streaming-mode/streaming-write-config.txt b/source/streaming-mode/streaming-write-config.txt index 8425c38..4346f62 100644 --- a/source/streaming-mode/streaming-write-config.txt +++ b/source/streaming-mode/streaming-write-config.txt @@ -139,6 +139,16 @@ You can configure the following properties when writing data to MongoDB in strea | | **Default:** ``true`` + * - ``writeConcern.w`` + - | Specifies ``w``, a write-concern option to request acknowledgment that + the write operation has propagated to a specified number of MongoDB + nodes. + | + | For a list of allowed values for this option, see :manual:`WriteConcern + w Option ` in the {+mdb-server+} + manual. 
+ | + | **Default:** ``1`` * - ``writeConcern.journal`` - | Specifies ``j``, a write-concern option to enable request for @@ -146,18 +156,18 @@ You can configure the following properties when writing data to MongoDB in strea the criteria specified in the ``w`` option. You can specify either ``true`` or ``false``. | - | For more information on ``j`` values, see - :manual:`WriteConcern j Option ` in the - {+mdb-server+} manual. + | For more information on ``j`` values, see :manual:`WriteConcern j + Option ` in the {+mdb-server+} + manual. * - ``writeConcern.wTimeoutMS`` - | Specifies ``wTimeoutMS``, a write-concern option to return an error when a write operation exceeds the number of milliseconds. If you use this optional setting, you must specify a nonnegative integer. | - | For more information on ``wTimeoutMS`` values, see - :manual:`WriteConcern wtimeout ` in the - {+mdb-server+} manual. + | For more information on ``wTimeoutMS`` values, see + :manual:`WriteConcern wtimeout ` in + the {+mdb-server+} manual. 
* - ``checkpointLocation`` - | The absolute file path of the directory to which the connector writes checkpoint @@ -165,7 +175,7 @@ You can configure the following properties when writing data to MongoDB in strea | | For more information about checkpoints, see the `Spark Structured Streaming Programming Guide - `__ + `__ | | **Default:** None From c3b319db35f54b44697f16f2fc26dc3f7321319d Mon Sep 17 00:00:00 2001 From: Melanie Ballard Date: Wed, 2 Jul 2025 13:26:39 -0400 Subject: [PATCH 4/6] DOCSP-48557 wording --- source/batch-mode/batch-write-config.txt | 20 ++++++++--------- .../streaming-mode/streaming-write-config.txt | 22 +++++++++---------- 2 files changed, 21 insertions(+), 21 deletions(-) diff --git a/source/batch-mode/batch-write-config.txt b/source/batch-mode/batch-write-config.txt index 3514911..7d4cdbf 100644 --- a/source/batch-mode/batch-write-config.txt +++ b/source/batch-mode/batch-write-config.txt @@ -58,7 +58,7 @@ You can configure the following properties when writing data to MongoDB in batch | **Default:** ``com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory`` * - ``convertJson`` - - | Specifies whether the connector parses the string and converts extended JSON + - | Specifies if the connector parses string values and converts extended JSON into BSON. | | This setting accepts the following values: @@ -85,7 +85,7 @@ You can configure the following properties when writing data to MongoDB in batch | **Default:** ``false`` * - ``idFieldList`` - - | Field or list of fields by which to split the collection data. To + - | Specifies a field or list of fields by which to split the collection data. 
To specify more than one field, separate them using a comma as shown in the following example: @@ -131,7 +131,7 @@ You can configure the following properties when writing data to MongoDB in batch | **Default:** ``true`` * - ``upsertDocument`` - - | When ``true``, replace and update operations will insert the data + - | When ``true``, replace and update operations insert the data if no match exists. | | For time series collections, you must set ``upsertDocument`` to @@ -140,7 +140,7 @@ You can configure the following properties when writing data to MongoDB in batch | **Default:** ``true`` * - ``writeConcern.w`` - - | Specifies ``w``, a write-concern option to request acknowledgment that + - | Specifies ``w``, a write-concern option requesting acknowledgment that the write operation has propagated to a specified number of MongoDB nodes. | @@ -148,13 +148,13 @@ You can configure the following properties when writing data to MongoDB in batch w Option ` in the {+mdb-server+} manual. | - | **Default:** ``1`` + | **Default:** ``majority`` *or* ``1`` * - ``writeConcern.journal`` - - | Specifies ``j``, a write-concern option to enable request for - acknowledgment that the data is confirmed on on-disk journal for - the criteria specified in the ``w`` option. You can specify - either ``true`` or ``false``. + - | Specifies ``j``, a write-concern option requesting acknowledgment that + the data has been written to the on-disk journal for the criteria + specified in the ``w`` option. You can specify either ``true`` or + ``false``. | | For more information on ``j`` values, see :manual:`WriteConcern j Option ` in the {+mdb-server+} @@ -162,7 +162,7 @@ You can configure the following properties when writing data to MongoDB in batch * - ``writeConcern.wTimeoutMS`` - | Specifies ``wTimeoutMS``, a write-concern option to return an error - when a write operation exceeds the number of milliseconds. If you + when a write operation exceeds the specified number of milliseconds. 
If you use this optional setting, you must specify a nonnegative integer. | | For more information on ``wTimeoutMS`` values, see diff --git a/source/streaming-mode/streaming-write-config.txt b/source/streaming-mode/streaming-write-config.txt index 4346f62..a8d95f8 100644 --- a/source/streaming-mode/streaming-write-config.txt +++ b/source/streaming-mode/streaming-write-config.txt @@ -58,7 +58,7 @@ You can configure the following properties when writing data to MongoDB in strea | **Default:** ``com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory`` * - ``convertJson`` - - | Specifies whether the connector parses the string and converts extended JSON + - | Specifies if the connector parses string values and converts extended JSON into BSON. | | This setting accepts the following values: @@ -85,7 +85,7 @@ You can configure the following properties when writing data to MongoDB in strea | **Default:** ``false`` * - ``idFieldList`` - - | Field or list of fields by which to split the collection data. To + - | Specifies a field or list of fields by which to split the collection data. To specify more than one field, separate them using a comma as shown in the following example: @@ -131,7 +131,7 @@ You can configure the following properties when writing data to MongoDB in strea | **Default:** ``true`` * - ``upsertDocument`` - - | When ``true``, replace and update operations will insert the data + - | When ``true``, replace and update operations insert the data if no match exists. | | For time series collections, you must set ``upsertDocument`` to @@ -140,7 +140,7 @@ You can configure the following properties when writing data to MongoDB in strea | **Default:** ``true`` * - ``writeConcern.w`` - - | Specifies ``w``, a write-concern option to request acknowledgment that + - | Specifies ``w``, a write-concern option requesting acknowledgment that the write operation has propagated to a specified number of MongoDB nodes. 
| @@ -148,13 +148,13 @@ You can configure the following properties when writing data to MongoDB in strea w Option ` in the {+mdb-server+} manual. | - | **Default:** ``1`` + | **Default:** ``majority`` *or* ``1`` * - ``writeConcern.journal`` - - | Specifies ``j``, a write-concern option to enable request for - acknowledgment that the data is confirmed on on-disk journal for - the criteria specified in the ``w`` option. You can specify - either ``true`` or ``false``. + - | Specifies ``j``, a write-concern option requesting acknowledgment that + the data has been written to the on-disk journal for the criteria + specified in the ``w`` option. You can specify either ``true`` or + ``false``. | | For more information on ``j`` values, see :manual:`WriteConcern j Option ` in the {+mdb-server+} @@ -162,7 +162,7 @@ You can configure the following properties when writing data to MongoDB in strea * - ``writeConcern.wTimeoutMS`` - | Specifies ``wTimeoutMS``, a write-concern option to return an error - when a write operation exceeds the number of milliseconds. If you + when a write operation exceeds the specified number of milliseconds. If you use this optional setting, you must specify a nonnegative integer. | | For more information on ``wTimeoutMS`` values, see @@ -170,7 +170,7 @@ You can configure the following properties when writing data to MongoDB in strea the {+mdb-server+} manual. * - ``checkpointLocation`` - - | The absolute file path of the directory to which the connector writes checkpoint + - | The absolute file path of the directory where the connector writes checkpoint information. 
| | For more information about checkpoints, see the `Spark Structured From e49a0a3354219da9ac519ed6d1472d44007cc40a Mon Sep 17 00:00:00 2001 From: Melanie Ballard Date: Wed, 2 Jul 2025 14:12:15 -0400 Subject: [PATCH 5/6] DOCSP-48557 remove ital --- source/batch-mode/batch-write-config.txt | 2 +- source/streaming-mode/streaming-write-config.txt | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/source/batch-mode/batch-write-config.txt b/source/batch-mode/batch-write-config.txt index 7d4cdbf..12761e3 100644 --- a/source/batch-mode/batch-write-config.txt +++ b/source/batch-mode/batch-write-config.txt @@ -148,7 +148,7 @@ You can configure the following properties when writing data to MongoDB in batch w Option ` in the {+mdb-server+} manual. | - | **Default:** ``majority`` *or* ``1`` + | **Default:** ``majority`` or ``1`` * - ``writeConcern.journal`` - | Specifies ``j``, a write-concern option requesting acknowledgment that diff --git a/source/streaming-mode/streaming-write-config.txt b/source/streaming-mode/streaming-write-config.txt index a8d95f8..d4384b6 100644 --- a/source/streaming-mode/streaming-write-config.txt +++ b/source/streaming-mode/streaming-write-config.txt @@ -148,7 +148,7 @@ You can configure the following properties when writing data to MongoDB in strea w Option ` in the {+mdb-server+} manual. 
| - | **Default:** ``majority`` *or* ``1`` + | **Default:** ``majority`` or ``1`` * - ``writeConcern.journal`` - | Specifies ``j``, a write-concern option requesting acknowledgment that From e38562838e943c82e2da5d445f8e5d1f852c0e5b Mon Sep 17 00:00:00 2001 From: Melanie Ballard Date: Mon, 7 Jul 2025 16:17:13 -0400 Subject: [PATCH 6/6] DOCSP-49557 update default w concern --- source/batch-mode/batch-write-config.txt | 2 +- source/streaming-mode/streaming-write-config.txt | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/source/batch-mode/batch-write-config.txt b/source/batch-mode/batch-write-config.txt index 12761e3..fedad4c 100644 --- a/source/batch-mode/batch-write-config.txt +++ b/source/batch-mode/batch-write-config.txt @@ -148,7 +148,7 @@ You can configure the following properties when writing data to MongoDB in batch w Option ` in the {+mdb-server+} manual. | - | **Default:** ``majority`` or ``1`` + | **Default:** ``Acknowledged`` * - ``writeConcern.journal`` - | Specifies ``j``, a write-concern option requesting acknowledgment that diff --git a/source/streaming-mode/streaming-write-config.txt b/source/streaming-mode/streaming-write-config.txt index d4384b6..6aa176f 100644 --- a/source/streaming-mode/streaming-write-config.txt +++ b/source/streaming-mode/streaming-write-config.txt @@ -148,7 +148,7 @@ You can configure the following properties when writing data to MongoDB in strea w Option ` in the {+mdb-server+} manual. | - | **Default:** ``majority`` or ``1`` + | **Default:** ``Acknowledged`` * - ``writeConcern.journal`` - | Specifies ``j``, a write-concern option requesting acknowledgment that
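The write properties documented in this patch series can be collected and sanity-checked before they are handed to a streaming writer. The following is a minimal sketch, not the connector's own API: `build_write_options` and its parameter names are hypothetical helpers introduced only for illustration, and the `time_series` flag is an assumed way for the caller to signal the target collection type. The option keys, allowed values, and constraints come from the property table in the diff.

```python
from typing import Dict, Iterable, Optional, Union

# Allowed values taken from the property table in the patch.
ALLOWED_OPERATION_TYPES = {"insert", "replace", "update"}
ALLOWED_CONVERT_JSON = {"any", "objectOrArrayOnly", "false"}


def _bool(value: bool) -> str:
    """Render a Python bool the way the option table writes it."""
    return "true" if value else "false"


def build_write_options(
    operation_type: str = "replace",
    id_field_list: Iterable[str] = ("_id",),
    upsert_document: bool = True,
    convert_json: str = "false",
    max_batch_size: int = 512,
    ordered: bool = True,
    w: Optional[Union[int, str]] = None,
    journal: Optional[bool] = None,
    w_timeout_ms: Optional[int] = None,
    time_series: bool = False,  # hypothetical flag, not a connector option
) -> Dict[str, str]:
    """Hypothetical helper: build a dict of write options as strings."""
    if operation_type not in ALLOWED_OPERATION_TYPES:
        raise ValueError(f"unsupported operationType: {operation_type!r}")
    if convert_json not in ALLOWED_CONVERT_JSON:
        raise ValueError(f"unsupported convertJson: {convert_json!r}")
    if w_timeout_ms is not None and w_timeout_ms < 0:
        raise ValueError("writeConcern.wTimeoutMS must be a nonnegative integer")
    if time_series and upsert_document:
        raise ValueError("time series collections require upsertDocument=false")

    options = {
        "operationType": operation_type,
        # Multiple fields are joined with commas: "fieldName1,fieldName2".
        "idFieldList": ",".join(id_field_list),
        "upsertDocument": _bool(upsert_document),
        "convertJson": convert_json,
        "maxBatchSize": str(max_batch_size),
        "ordered": _bool(ordered),
    }
    # Write-concern settings are optional; omit them to keep the defaults.
    if w is not None:
        options["writeConcern.w"] = str(w)
    if journal is not None:
        options["writeConcern.journal"] = _bool(journal)
    if w_timeout_ms is not None:
        options["writeConcern.wTimeoutMS"] = str(w_timeout_ms)
    return options
```

Assuming a PySpark streaming DataFrame named `df` (not part of this patch), the resulting dictionary could be passed as `df.writeStream.format("mongodb").options(**opts)`, with `checkpointLocation` set separately. Validating up front surfaces a negative `wTimeoutMS` or a time series upsert conflict before the stream starts.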