Commit a0129ab

Merge remote-tracking branch 'origin/main' into ml-job-state
* origin/main:
  chore(deps): update kibana-openapi-spec digest to dcff88f (#1383)
  chore(deps): update docker.elastic.co/kibana/kibana docker tag to v9.2.0 (#1392)
  chore(deps): update docker.elastic.co/elasticsearch/elasticsearch docker tag to v9.2.0 (#1391)
  Add ML Datafeed resource (#1340)
2 parents d8670fd + c137c29 commit a0129ab

38 files changed: +7318 -1891 lines changed


CHANGELOG.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 ## [Unreleased]
 
-
-- Add new `elasticstack_elasticsearch_ml_anomaly_detection_job` ([#1329](https://github.com/elastic/terraform-provider-elasticstack/pull/1329))
+- Add new `elasticstack_elasticsearch_ml_anomaly_detection_job` resource ([#1329](https://github.com/elastic/terraform-provider-elasticstack/pull/1329))
+- Add new `elasticstack_elasticsearch_ml_datafeed` resource ([1340](https://github.com/elastic/terraform-provider-elasticstack/pull/1340))
 
 ## [0.12.1] - 2025-10-22
 - Fix regression restricting the characters in an `elasticstack_elasticsearch_role_mapping` `name`. ([#1373](https://github.com/elastic/terraform-provider-elasticstack/pull/1373))
Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@
+---
+# generated by https://github.com/hashicorp/terraform-plugin-docs
+page_title: "elasticstack_elasticsearch_ml_datafeed Resource - terraform-provider-elasticstack"
+subcategory: "Ml"
+description: |-
+  Creates and manages Machine Learning datafeeds. Datafeeds retrieve data from Elasticsearch for analysis by an anomaly detection job. Each anomaly detection job can have only one associated datafeed. See the ML Datafeed API documentation https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-put-datafeed.html for more details.
+---
+
+# elasticstack_elasticsearch_ml_datafeed (Resource)
+
+Creates and manages Machine Learning datafeeds. Datafeeds retrieve data from Elasticsearch for analysis by an anomaly detection job. Each anomaly detection job can have only one associated datafeed. See the [ML Datafeed API documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-put-datafeed.html) for more details.
+
+## Example Usage
+
+```terraform
+# Basic ML Datafeed
+resource "elasticstack_elasticsearch_ml_datafeed" "basic" {
+  datafeed_id = "my-basic-datafeed"
+  job_id      = elasticstack_elasticsearch_ml_anomaly_detector.example.job_id
+  indices     = ["log-data-*"]
+
+  query = jsonencode({
+    match_all = {}
+  })
+}
+
+# Comprehensive ML Datafeed with all options
+resource "elasticstack_elasticsearch_ml_datafeed" "comprehensive" {
+  datafeed_id = "my-comprehensive-datafeed"
+  job_id      = elasticstack_elasticsearch_ml_anomaly_detector.example.job_id
+  indices     = ["app-logs-*", "system-logs-*"]
+
+  query = jsonencode({
+    bool = {
+      must = [
+        {
+          range = {
+            "@timestamp" = {
+              gte = "now-1h"
+            }
+          }
+        },
+        {
+          term = {
+            "status" = "error"
+          }
+        }
+      ]
+    }
+  })
+
+  scroll_size        = 1000
+  frequency          = "30s"
+  query_delay        = "60s"
+  max_empty_searches = 10
+
+  chunking_config {
+    mode      = "manual"
+    time_span = "30m"
+  }
+
+  delayed_data_check_config {
+    enabled      = true
+    check_window = "2h"
+  }
+
+  indices_options {
+    ignore_unavailable = true
+    allow_no_indices   = false
+    expand_wildcards   = ["open", "closed"]
+  }
+
+  runtime_mappings = jsonencode({
+    "hour_of_day" = {
+      "type" = "long"
+      "script" = {
+        "source" = "emit(doc['@timestamp'].value.getHour())"
+      }
+    }
+  })
+
+  script_fields = jsonencode({
+    "my_script_field" = {
+      "script" = {
+        "source" = "_score * doc['my_field'].value"
+      }
+    }
+  })
+}
+
+# Required ML Job for the datafeed
+resource "elasticstack_elasticsearch_ml_anomaly_detector" "example" {
+  job_id      = "example-anomaly-job"
+  description = "Example anomaly detection job"
+
+  analysis_config {
+    bucket_span = "15m"
+    detectors {
+      function = "count"
+    }
+  }
+
+  data_description {
+    time_field  = "@timestamp"
+    time_format = "epoch_ms"
+  }
+}
+```
+
+<!-- schema generated by tfplugindocs -->
+## Schema
+
+### Required
+
+- `datafeed_id` (String) A numerical character string that uniquely identifies the datafeed. This identifier can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start and end with alphanumeric characters.
+- `indices` (List of String) An array of index names. Wildcards are supported. If any of the indices are in remote clusters, the machine learning nodes must have the `remote_cluster_client` role.
+- `job_id` (String) Identifier for the anomaly detection job. The job must exist before creating the datafeed.
+
+### Optional
+
+- `aggregations` (String) If set, the datafeed performs aggregation searches. Support for aggregations is limited and should be used only with low cardinality data. This should be a JSON object representing the aggregations to be performed.
+- `chunking_config` (Attributes) Datafeeds might search over long time periods, for several months or years. This search is split into time chunks in order to ensure the load on Elasticsearch is managed. Chunking configuration controls how the size of these time chunks are calculated; it is an advanced configuration option. (see [below for nested schema](#nestedatt--chunking_config))
+- `delayed_data_check_config` (Attributes) Specifies whether the datafeed checks for missing data and the size of the window. The datafeed can optionally search over indices that have already been read in an effort to determine whether any data has subsequently been added to the index. If missing data is found, it is a good indication that the `query_delay` is set too low and the data is being indexed after the datafeed has passed that moment in time. This check runs only on real-time datafeeds. (see [below for nested schema](#nestedatt--delayed_data_check_config))
+- `elasticsearch_connection` (Block List, Deprecated) Elasticsearch connection configuration block. (see [below for nested schema](#nestedblock--elasticsearch_connection))
+- `frequency` (String) The interval at which scheduled queries are made while the datafeed runs in real time. The default value is either the bucket span for short bucket spans, or, for longer bucket spans, a sensible fraction of the bucket span. When `frequency` is shorter than the bucket span, interim results for the last (partial) bucket are written then eventually overwritten by the full bucket results. If the datafeed uses aggregations, this value must be divisible by the interval of the date histogram aggregation.
+- `indices_options` (Attributes) Specifies index expansion options that are used during search. (see [below for nested schema](#nestedatt--indices_options))
+- `max_empty_searches` (Number) If a real-time datafeed has never seen any data (including during any initial training period), it automatically stops and closes the associated job after this many real-time searches return no documents. In other words, it stops after `frequency` times `max_empty_searches` of real-time operation. If not set, a datafeed with no end time that sees no data remains started until it is explicitly stopped.
+- `query` (String) The Elasticsearch query domain-specific language (DSL). This value corresponds to the query object in an Elasticsearch search POST body. All the options that are supported by Elasticsearch can be used, as this object is passed verbatim to Elasticsearch. By default uses `{"match_all": {"boost": 1}}`.
+- `query_delay` (String) The number of seconds behind real time that data is queried. For example, if data from 10:04 a.m. might not be searchable in Elasticsearch until 10:06 a.m., set this property to 120 seconds. The default value is randomly selected between `60s` and `120s`. This randomness improves the query performance when there are multiple jobs running on the same node.
+- `runtime_mappings` (String) Specifies runtime fields for the datafeed search. This should be a JSON object representing the runtime field mappings.
+- `script_fields` (String) Specifies scripts that evaluate custom expressions and returns script fields to the datafeed. The detector configuration objects in a job can contain functions that use these script fields. This should be a JSON object representing the script fields.
+- `scroll_size` (Number) The size parameter that is used in Elasticsearch searches when the datafeed does not use aggregations. The maximum value is the value of `index.max_result_window`, which is 10,000 by default.
+
+### Read-Only
+
+- `id` (String) Internal identifier of the resource
+
+<a id="nestedatt--chunking_config"></a>
+### Nested Schema for `chunking_config`
+
+Required:
+
+- `mode` (String) The chunking mode. Can be `auto`, `manual`, or `off`. In `auto` mode, the chunk size is dynamically calculated. In `manual` mode, chunking is applied according to the specified `time_span`. In `off` mode, no chunking is applied.
+
+Optional:
+
+- `time_span` (String) The time span for each chunk. Only applicable and required when mode is `manual`. Must be a valid duration.
+
+
+<a id="nestedatt--delayed_data_check_config"></a>
+### Nested Schema for `delayed_data_check_config`
+
+Required:
+
+- `enabled` (Boolean) Specifies whether the datafeed periodically checks for delayed data.
+
+Optional:
+
+- `check_window` (String) The window of time that is searched for late data. This window of time ends with the latest finalized bucket. It defaults to null, which causes an appropriate `check_window` to be calculated when the real-time datafeed runs.
+
+
+<a id="nestedblock--elasticsearch_connection"></a>
+### Nested Schema for `elasticsearch_connection`
+
+Optional:
+
+- `api_key` (String, Sensitive) API Key to use for authentication to Elasticsearch
+- `bearer_token` (String, Sensitive) Bearer Token to use for authentication to Elasticsearch
+- `ca_data` (String) PEM-encoded custom Certificate Authority certificate
+- `ca_file` (String) Path to a custom Certificate Authority certificate
+- `cert_data` (String) PEM encoded certificate for client auth
+- `cert_file` (String) Path to a file containing the PEM encoded certificate for client auth
+- `endpoints` (List of String, Sensitive) A list of endpoints where the terraform provider will point to, this must include the http(s) schema and port number.
+- `es_client_authentication` (String, Sensitive) ES Client Authentication field to be used with the JWT token
+- `headers` (Map of String, Sensitive) A list of headers to be sent with each request to Elasticsearch.
+- `insecure` (Boolean) Disable TLS certificate validation
+- `key_data` (String, Sensitive) PEM encoded private key for client auth
+- `key_file` (String) Path to a file containing the PEM encoded private key for client auth
+- `password` (String, Sensitive) Password to use for API authentication to Elasticsearch.
+- `username` (String) Username to use for API authentication to Elasticsearch.
+
+
+<a id="nestedatt--indices_options"></a>
+### Nested Schema for `indices_options`
+
+Optional:
+
+- `allow_no_indices` (Boolean) If true, wildcard indices expressions that resolve into no concrete indices are ignored. This includes the `_all` string or when no indices are specified.
+- `expand_wildcards` (List of String) Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values.
+- `ignore_throttled` (Boolean, Deprecated) If true, concrete, expanded, or aliased indices are ignored when frozen. This setting is deprecated.
+- `ignore_unavailable` (Boolean) If true, unavailable indices (missing or closed) are ignored.
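
Note on the `datafeed_id` rule in the schema above: the documented constraint (lowercase a-z and 0-9, hyphens, underscores, with an alphanumeric first and last character) can be checked at plan time instead of waiting for the API to reject the value. A minimal sketch, assuming a hypothetical input variable named `datafeed_id`; the regular expression is one reading of the documented rule, not the provider's own validator.

```terraform
# Hypothetical input variable; the name and the pattern are illustrative assumptions.
variable "datafeed_id" {
  type        = string
  description = "Identifier passed to elasticstack_elasticsearch_ml_datafeed.datafeed_id."

  validation {
    # First and last characters must be a-z or 0-9; hyphens and underscores
    # are allowed only in between. A single alphanumeric character also passes.
    condition     = can(regex("^[a-z0-9]([a-z0-9_-]*[a-z0-9])?$", var.datafeed_id))
    error_message = "The datafeed_id value may contain only a-z, 0-9, hyphens, and underscores, and must start and end with an alphanumeric character."
  }
}
```

With this in place, a value such as `my-basic-datafeed` is accepted, while values with uppercase letters or a leading or trailing hyphen or underscore fail during `terraform plan`.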
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
+# Basic ML Datafeed
+resource "elasticstack_elasticsearch_ml_datafeed" "basic" {
+  datafeed_id = "my-basic-datafeed"
+  job_id      = elasticstack_elasticsearch_ml_anomaly_detector.example.job_id
+  indices     = ["log-data-*"]
+
+  query = jsonencode({
+    match_all = {}
+  })
+}
+
+# Comprehensive ML Datafeed with all options
+resource "elasticstack_elasticsearch_ml_datafeed" "comprehensive" {
+  datafeed_id = "my-comprehensive-datafeed"
+  job_id      = elasticstack_elasticsearch_ml_anomaly_detector.example.job_id
+  indices     = ["app-logs-*", "system-logs-*"]
+
+  query = jsonencode({
+    bool = {
+      must = [
+        {
+          range = {
+            "@timestamp" = {
+              gte = "now-1h"
+            }
+          }
+        },
+        {
+          term = {
+            "status" = "error"
+          }
+        }
+      ]
+    }
+  })
+
+  scroll_size        = 1000
+  frequency          = "30s"
+  query_delay        = "60s"
+  max_empty_searches = 10
+
+  chunking_config {
+    mode      = "manual"
+    time_span = "30m"
+  }
+
+  delayed_data_check_config {
+    enabled      = true
+    check_window = "2h"
+  }
+
+  indices_options {
+    ignore_unavailable = true
+    allow_no_indices   = false
+    expand_wildcards   = ["open", "closed"]
+  }
+
+  runtime_mappings = jsonencode({
+    "hour_of_day" = {
+      "type" = "long"
+      "script" = {
+        "source" = "emit(doc['@timestamp'].value.getHour())"
+      }
+    }
+  })
+
+  script_fields = jsonencode({
+    "my_script_field" = {
+      "script" = {
+        "source" = "_score * doc['my_field'].value"
+      }
+    }
+  })
+}
+
+# Required ML Job for the datafeed
+resource "elasticstack_elasticsearch_ml_anomaly_detector" "example" {
+  job_id      = "example-anomaly-job"
+  description = "Example anomaly detection job"
+
+  analysis_config {
+    bucket_span = "15m"
+    detectors {
+      function = "count"
+    }
+  }
+
+  data_description {
+    time_field  = "@timestamp"
+    time_format = "epoch_ms"
+  }
+}
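
Note on the example above: the generated schema in this commit labels `chunking_config`, `delayed_data_check_config`, and `indices_options` as `(Attributes)`, a label terraform-plugin-docs normally uses for nested attributes rather than blocks. If that reading is right, those settings would be assigned with `=` instead of written as blocks. The example also references `elasticstack_elasticsearch_ml_anomaly_detector`, whereas the CHANGELOG entry names the job resource `elasticstack_elasticsearch_ml_anomaly_detection_job`. The following is a hedged sketch under those assumptions, not verified against the provider; `job_id` is a literal so the snippet stands alone, and the job is assumed to exist already.

```terraform
# Sketch only: assumes the three nested settings are attributes (assigned with "="),
# as the "(Attributes)" label in the generated schema suggests.
resource "elasticstack_elasticsearch_ml_datafeed" "sketch" {
  datafeed_id = "my-sketch-datafeed"
  job_id      = "example-anomaly-job" # assumed to exist before the datafeed is created
  indices     = ["app-logs-*"]

  frequency = "30s"
  # Per the schema: a datafeed that has never seen data stops after
  # frequency x max_empty_searches, here 30s x 10 = 300s of real-time operation.
  max_empty_searches = 10

  chunking_config = {
    mode      = "manual"
    time_span = "30m" # required when mode is "manual"
  }

  delayed_data_check_config = {
    enabled      = true
    check_window = "2h"
  }

  indices_options = {
    ignore_unavailable = true
    allow_no_indices   = false
    expand_wildcards   = ["open", "closed"]
  }
}
```

Running `terraform validate` against the provider version built from this commit would show which of the two syntaxes the schema accepts.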

generated/kbapi/Makefile

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 SHELL := /bin/bash
 ROOT_DIR := $(shell dirname $(realpath $(firstword $(MAKEFILE_LIST))))
 
-github_ref ?= 5c4f76696e63bf9e9a53d55521f4c18faa02ccf2
+github_ref ?= dcff88f37b458e385b7d2b1b82d765d5ae887ca3
 oas_url := https://raw.githubusercontent.com/elastic/kibana/$(github_ref)/oas_docs/output/kibana.yaml
 
 .PHONY: all
