manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md
Lines changed: 152 additions & 3 deletions
@@ -11,9 +11,12 @@ products:
 
 When you continuously index timestamped documents into {{es}}, you typically use a [data stream](../../data-store/data-streams.md) so you can periodically [roll over](rollover.md) to a new index. This enables you to implement a [hot-warm-cold architecture](../data-tiers.md) to meet your performance requirements for your newest data, control costs over time, enforce retention policies, and still get the most out of your data.
 
-::::{tip}
-[Data streams](../../data-store/data-streams.md) are best suited for [append-only](../../data-store/data-streams.md#data-streams-append-only) use cases. If you need to update or delete existing time series data, you can perform update or delete operations directly on the data stream backing index. If you frequently send multiple documents using the same `_id` expecting last-write-wins, you may want to use an index alias with a write index instead. You can still use [ILM](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md) to manage and [roll over](rollover.md) the alias’s indices. Skip to [Manage time series data without data streams](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-time-series-data-without-data-streams).
-::::
+To simplify index management and automate rollover, select one of the scenarios that best applies to your situation:
+
+* **Roll over data streams with ILM.** When ingesting write-once, timestamped data that doesn't change, follow the steps in [Manage time series data with data streams](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-time-series-data-with-data-streams) for simple, automated data stream rollover. ILM-managed backing indices are automatically created under a single data stream alias. ILM also tracks and transitions the backing indices through the lifecycle automatically.
+* **Roll over time series indices with ILM.** Data streams are best suited for [append-only](../../data-store/data-streams.md#data-streams-append-only) use cases. If you need to update or delete existing time series data, you can perform update or delete operations directly on the data stream backing index. If you frequently send multiple documents using the same `_id` expecting last-write-wins, you may want to use an index alias with a write index instead. You can still use [ILM](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md) to manage and roll over the alias’s indices. Follow the steps in [Manage time series data without data streams](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-time-series-data-without-data-streams) for more information.
+* **Roll over general content as data streams with ILM.** If some of your indices store data that isn't timestamped, but you would like to get the benefits of automatic rotation when the index reaches a certain size or age, or delete already rotated indices after a certain amount of time, follow the steps in [Manage general content with data streams](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams). These steps include injecting a timestamp field during indexing time to mimic time series data.
+
 
 ## Manage time series data with data streams [manage-time-series-data-with-data-streams]
 
@@ -295,3 +298,149 @@ Retrieving the status information for managed indices is very similar to the dat
 GET timeseries-*/_ilm/explain
 ```
 
+## Manage general content with data streams [manage-general-content-with-data-streams]
+
+Data streams are specifically designed for time series data.
+If you want to manage general content (data without timestamps) with data streams, you can set up [ingest pipelines](/manage-data/ingest/transform-enrich/ingest-pipelines.md) to transform and enrich your general content by adding a timestamp field at [ingest](/manage-data/ingest.md) time and get the benefits of time-based data management.
+
+For example, search use cases such as knowledge base, website content, e-commerce, or product catalog search, might require you to frequently index general content (data without timestamps). As a result, your index can grow significantly over time, which might impact storage requirements, query performance, and cluster health. Following the steps in this procedure (including a timestamp field and moving to ILM-managed data streams) can help you rotate your indices in a simpler way, based on their size or lifecycle phase.
+
+To roll over your general content from indices to a data stream, you:
+
+1. [Create an ingest pipeline](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-ingest) to process your general content and add a `@timestamp` field.
+
+1. [Create a lifecycle policy](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-policy) that meets your requirements.
+
+1. [Create an index template](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-template) that uses the created ingest pipeline and lifecycle policy.
+
+1. [Create a data stream](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-create-stream).
+
+1. *Optional:* If you have an existing, non-managed index and want to migrate your data to the data stream you created, [reindex with a data stream](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-reindex).
+
+1. [Update your ingest endpoint](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-endpoint) to target the created data stream.
+
+1. *Optional:* You can use the [ILM explain API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ilm-explain-lifecycle) to get status information for your managed indices.
+For more information, refer to [Check lifecycle progress](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#ilm-gs-check-progress).
+
+
+### Create an ingest pipeline to transform your general content [manage-general-content-with-data-streams-ingest]
+
+Create an ingest pipeline that uses the [`set` enrich processor](elasticsearch://reference/enrich-processor/set-processor.md) to add a `@timestamp` field:
+
+```console
+PUT _ingest/pipeline/ingest_time_1
+{
+  "description": "Add an ingest timestamp",
+  "processors": [
+    {
+      "set": {
+        "field": "@timestamp",
+        "value": "{{_ingest.timestamp}}"
+      }
+    }]
+}
+```
+
+### Create a lifecycle policy [manage-general-content-with-data-streams-policy]
+
+In this example, the policy is configured to roll over when the primary shard size reaches 10 GB:
+
+```console
+PUT _ilm/policy/indextods
+{
+  "policy": {
+    "phases": {
+      "hot": {
+        "min_age": "0ms",
+        "actions": {
+          "set_priority": {
+            "priority": 100
+          },
+          "rollover": {
+            "max_primary_shard_size": "10gb"
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+For more information about lifecycle phases and available actions, check [Create a lifecycle policy](configure-lifecycle-policy.md#ilm-create-policy).
+
+
+### Create an index template to apply the ingest pipeline and lifecycle policy [manage-general-content-with-data-streams-template]
+
+Create an index template that uses the created ingest pipeline and lifecycle policy:
+
+```console
+PUT _index_template/index_to_dot
+{
+  "template": {
+    "settings": {
+      "index": {
+        "lifecycle": {
+          "name": "indextods"
+        },
+        "default_pipeline": "ingest_time_1"
+      }
+    },
+    "mappings": {
+      "_source": {
+        "excludes": [],
+        "includes": [],
+        "enabled": true
+      },
+      "_routing": {
+        "required": false
+      },
+      "dynamic": true,
+      "numeric_detection": false,
+      "date_detection": true,
+      "dynamic_date_formats": [
+        "strict_date_optional_time",
+        "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"
+      ]
+    }
+  },
+  "index_patterns": [
+    "movetods"
+  ],
+  "data_stream": {
+    "hidden": false,
+    "allow_custom_routing": false
+  }
+}
+```
+
+### Create a data stream [manage-general-content-with-data-streams-create-stream]
+
+Create a data stream using the [_data_stream API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-create-data-stream):
+
+```console
+PUT /_data_stream/movetods
+```
+
+### Optional: Reindex your data with a data stream [manage-general-content-with-data-streams-reindex]
+
+If you want to copy your documents from an existing index to the data stream you created, reindex with a data stream using the [_reindex API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-reindex):
+
+```console
+POST /_reindex
+{
+  "source": {
+    "index": "indextods"
+  },
+  "dest": {
+    "index": "movetods",
+    "op_type": "create"
+
+  }
+}
+```
+
+For more information, check [Reindex with a data stream](../../data-store/data-streams/use-data-stream.md#reindex-with-a-data-stream).
+
+### Update your ingest endpoint to target the created data stream [manage-general-content-with-data-streams-endpoint]
+
+If you use Elastic clients, scripts, or any other third-party tool to ingest data to {{es}}, make sure you update these to use the created data stream.
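
The setup added in this diff can be sanity-checked with a few requests. The following is a minimal sketch, not part of the change itself. It assumes the pipeline, template, and data stream names used in the hunk above (`ingest_time_1`, `movetods`), and the sample document with its `title` field is hypothetical.

```console
# Dry-run the pipeline against a sample document without indexing anything
POST _ingest/pipeline/ingest_time_1/_simulate
{
  "docs": [
    { "_source": { "title": "Sample article" } }
  ]
}

# Index one document into the data stream; the template's default pipeline
# is expected to add the @timestamp field
POST movetods/_doc
{
  "title": "Sample article"
}

# List the data stream and its backing indices
GET _data_stream/movetods

# Show the lifecycle phase, action, and step of each backing index
GET movetods/_ilm/explain
```

If the pipeline behaves as described, each document in the simulate response should carry an `@timestamp` field, and the explain output should report the `indextods` policy for the backing indices.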