57 changes: 29 additions & 28 deletions docs/user/ppl/cmd/ad.rst
@@ -10,41 +10,43 @@ ad (deprecated by ml command)


Description
============
===========
| The ``ad`` command applies the Random Cut Forest (RCF) algorithm from the ml-commons plugin to the search result returned by a PPL command. Based on the input, the command uses one of two RCF algorithms: fixed in time RCF for processing time-series data, and batch RCF for processing non-time-series data.


Fixed In Time RCF For Time-series Data Command Syntax
=====================================================
ad <number_of_trees> <shingle_size> <sample_size> <output_after> <time_decay> <anomaly_rate> <time_field> <date_format> <time_zone>
Syntax
======

* number_of_trees(integer): optional. Number of trees in the forest. The default value is 30.
* shingle_size(integer): optional. A shingle is a consecutive sequence of the most recent records. The default value is 8.
* sample_size(integer): optional. The sample size used by stream samplers in this forest. The default value is 256.
* output_after(integer): optional. The number of points required by stream samplers before results are returned. The default value is 32.
* time_decay(double): optional. The decay factor used by stream samplers in this forest. The default value is 0.0001.
* anomaly_rate(double): optional. The anomaly rate. The default value is 0.005.
* time_field(string): mandatory. It specifies the time field for RCF to use as time-series data.
* date_format(string): optional. It's used for formatting time_field field. The default formatting is "yyyy-MM-dd HH:mm:ss".
* time_zone(string): optional. It's used for setting time zone for time_field filed. The default time zone is UTC.
* category_field(string): optional. It specifies the category field used to group inputs. Each category will be independently predicted.
Fixed In Time RCF For Time-series Data
--------------------------------------
ad [number_of_trees] [shingle_size] [sample_size] [output_after] [time_decay] [anomaly_rate] <time_field> [date_format] [time_zone] [category_field]

* number_of_trees: optional. Number of trees in the forest. **Default:** 30.
* shingle_size: optional. A shingle is a consecutive sequence of the most recent records. **Default:** 8.
* sample_size: optional. The sample size used by stream samplers in this forest. **Default:** 256.
* output_after: optional. The number of points required by stream samplers before results are returned. **Default:** 32.
* time_decay: optional. The decay factor used by stream samplers in this forest. **Default:** 0.0001.
* anomaly_rate: optional. The anomaly rate. **Default:** 0.005.
* time_field: mandatory. Specifies the time field for RCF to use as time-series data.
* date_format: optional. Used for formatting time_field. **Default:** "yyyy-MM-dd HH:mm:ss".
* time_zone: optional. Used for setting time zone for time_field. **Default:** "UTC".
* category_field: optional. Specifies the category field used to group inputs. Each category will be independently predicted.
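
As a hedged illustration of the syntax above (the ``nyc_taxi`` index and its ``value`` and ``timestamp`` fields are assumptions for the sketch, not part of this documentation), a fixed in time RCF query could look like::

    PPL> source=nyc_taxi | fields value, timestamp | ad time_field='timestamp'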

Batch RCF for Non-time-series Data Command Syntax
=================================================
ad <number_of_trees> <sample_size> <output_after> <training_data_size> <anomaly_score_threshold>
Batch RCF For Non-time-series Data
----------------------------------
ad [number_of_trees] [sample_size] [output_after] [training_data_size] [anomaly_score_threshold] [category_field]

* number_of_trees(integer): optional. Number of trees in the forest. The default value is 30.
* sample_size(integer): optional. Number of random samples given to each tree from the training data set. The default value is 256.
* output_after(integer): optional. The number of points required by stream samplers before results are returned. The default value is 32.
* training_data_size(integer): optional. The default value is the size of your training data set.
* anomaly_score_threshold(double): optional. The threshold of anomaly score. The default value is 1.0.
* category_field(string): optional. It specifies the category field used to group inputs. Each category will be independently predicted.
* number_of_trees: optional. Number of trees in the forest. **Default:** 30.
* sample_size: optional. Number of random samples given to each tree from the training data set. **Default:** 256.
* output_after: optional. The number of points required by stream samplers before results are returned. **Default:** 32.
* training_data_size: optional. **Default:** size of your training data set.
* anomaly_score_threshold: optional. The threshold of anomaly score. **Default:** 1.0.
* category_field: optional. Specifies the category field used to group inputs. Each category will be independently predicted.
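
For the batch variant, a similarly hedged sketch (again assuming a ``nyc_taxi`` index with a numeric ``value`` field; omitting ``time_field`` corresponds to the non-time-series syntax above)::

    PPL> source=nyc_taxi | fields value | ad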

Example 1: Detecting events in New York City from taxi ridership data with time-series data
===========================================================================================

The example trains an RCF model and uses the model to detect anomalies in the time-series ridership data.
This example trains an RCF model and uses the model to detect anomalies in the time-series ridership data.

PPL query::

@@ -59,7 +61,7 @@
Example 2: Detecting events in New York City from taxi ridership data with time-series data independently with each category
============================================================================================================================

The example trains an RCF model and uses the model to detect anomalies in the time-series ridership data with multiple category values.
This example trains an RCF model and uses the model to detect anomalies in the time-series ridership data with multiple category values.

PPL query::

@@ -76,7 +78,7 @@
Example 3: Detecting events in New York City from taxi ridership data with non-time-series data
===============================================================================================

The example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data.
This example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data.

PPL query::

@@ -91,7 +93,7 @@
Example 4: Detecting events in New York City from taxi ridership data with non-time-series data independently with each category
================================================================================================================================

The example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data with multiple category values.
This example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data with multiple category values.

PPL query::

@@ -108,4 +110,3 @@
Limitations
===========
The ``ad`` command can only work with ``plugins.calcite.enabled=false``.
It means ``ad`` command cannot work together with new PPL commands/functions introduced in 3.0.0 and above.
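
To satisfy this requirement, the setting can be updated through the plugin settings API, as sketched below (adjust the host and port to your cluster)::

    >> curl -H 'Content-Type: application/json' -X PUT localhost:9200/_plugins/_query/settings -d '{
      "transient" : {
        "plugins.calcite.enabled" : false
      }
    }'
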
22 changes: 9 additions & 13 deletions docs/user/ppl/cmd/append.rst
@@ -1,6 +1,6 @@
=========
======
append
=========
======

.. rubric:: Table of contents

@@ -10,22 +10,18 @@ append


Description
============
| Using ``append`` command to append the result of a sub-search and attach it as additional rows to the bottom of the input search results (The main search).
The command aligns columns with the same field names and types. For different column fields between the main search and sub-search, NULL values are filled in the respective rows.

Version
=======
3.3.0
===========
| The ``append`` command appends the result of a sub-search and attaches it as additional rows to the bottom of the input search results (the main search).
| The command aligns columns with the same field names and types. Columns that exist in only one of the two searches are filled with NULL values in the rows coming from the other.

Syntax
============
======
append <sub-search>

* sub-search: mandatory. Executes PPL commands as a secondary search.
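
As a hedged sketch (the ``accounts`` index, its ``age``, ``gender``, and ``state`` fields, and the bracketed sub-search form are assumptions for illustration), appending a count aggregation to a main aggregation might look like::

    PPL> source=accounts | stats sum(age) as sum by gender, state | append [ source=accounts | stats count() by gender ];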

Example 1: Append rows from a count aggregation to existing search result
===============================================================
=========================================================================

This example appends rows from "count by gender" to "sum by gender, state".

@@ -45,7 +41,7 @@
+----------+--------+-------+------------+

Example 2: Append rows with merged column names
====================================================================================
===============================================

This example appends rows from "sum by gender" to "sum by gender, state" with merged column of same field name and type.

@@ -65,7 +61,7 @@
+-----+--------+-------+

Example 3: Append rows with column type conflict
=============================================
================================================

This example shows how column type conflicts are handled when appending results. Same name columns with different types will generate two different columns in appended result.

42 changes: 7 additions & 35 deletions docs/user/ppl/cmd/appendcol.rst
@@ -11,47 +11,15 @@ appendcol

Description
============
| (Experimental)
| (From 3.1.0)
| Using ``appendcol`` command to append the result of a sub-search and attach it alongside with the input search results (The main search).

Version
=======
3.1.0
The ``appendcol`` command appends the result of a sub-search and attaches it alongside the input search results (the main search).

Syntax
============
======
appendcol [override=<boolean>] <sub-search>

* override=<boolean>: optional. Boolean field to specify should result from main-result be overwritten in the case of column name conflict.
* override=<boolean>: optional. Boolean flag specifying whether the result from the main search should be overwritten in the case of a column name conflict. **Default:** false.
* sub-search: mandatory. Executes PPL commands as a secondary search. The sub-search uses the same data specified in the source clause of the main search as its input.

Configuration
=============
This command requires Calcite enabled.

Enable Calcite::

>> curl -H 'Content-Type: application/json' -X PUT localhost:9200/_plugins/_query/settings -d '{
"transient" : {
"plugins.calcite.enabled" : true
}
}'

Result set::

{
"acknowledged": true,
"persistent": {
"plugins": {
"calcite": {
"enabled": "true"
}
}
},
"transient": {}
}

Example 1: Append a count aggregation to existing search result
===============================================================

@@ -103,6 +71,8 @@
Example 3: Append multiple sub-search results
=============================================

This example shows how to chain multiple appendcol commands to add columns from different sub-searches.

PPL query::

PPL> source=employees | fields name, dept, age | appendcol [ stats avg(age) as avg_age ] | appendcol [ stats max(age) as max_age ];
@@ -124,6 +94,8 @@
Example 4: Override case of column name conflict
================================================

This example demonstrates the override option when column names conflict between main search and sub-search.

PPL query::

PPL> source=employees | stats avg(age) as agg by dept | appendcol override=true [ stats max(age) as agg by dept ];