Commit 196e461

Add a new section for transitioning indices to data streams (#2216)
The goal of this PR is to include the content from the [Migrate from Indices to DataStreams](https://support.elastic.dev/knowledge/view/0aa38fde) knowledgebase article into our ILM-related documentation.

Initially, the suggestion was to include this new content as a new section on the [Manage existing indices](https://www.elastic.co/docs/manage-data/lifecycle/index-lifecycle-management/manage-existing-indices) page. Upon further review of existing content, the [Tutorial: Automate rollover](https://www.elastic.co/docs/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover) seems like a more appropriate home for this content as it already includes two other equivalent use cases:

* [Manage time series data with data streams](https://www.elastic.co/docs/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover#manage-time-series-data-with-data-streams)
* [Manage time series data without data streams](https://www.elastic.co/docs/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover#manage-time-series-data-without-data-streams)

The reason I think these use cases are equivalent is because they're trying to use ILM policies to migrate from periodic indices to a more automated way to manage rollover and replace the need to schedule or script index creation (one option migrates from indices to data streams and the other one migrates to using aliases in order to manage their backing indices).

The new content adds this use case that's equivalent in scope:

* [Manage general content with data streams](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/2216/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover#manage-general-content-with-data-streams)

Fixes #1571
1 parent c19a071 commit 196e461

File tree

2 files changed

+154
-6
lines changed

manage-data/lifecycle/index-lifecycle-management/manage-existing-indices.md

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ If you’ve been using Curator or some other mechanism to manage periodic indice
1515
* Reindex into an {{ilm-init}}-managed index.
1616

1717
::::{note}
18-
Starting in Curator version 5.7, Curator ignores {{ilm-init}} managed indices.
18+
Starting in Curator version 5.7, Curator ignores {{ilm-init}}-managed indices.
1919
::::
2020

2121

@@ -103,5 +103,4 @@ To reindex into the managed index:
103103

104104
Querying using this alias will now search your new data and all of the reindexed data.
105105

106-
6. Once you have verified that all of the reindexed data is available in the new managed indices, you can safely remove the old indices.
107-
106+
6. Once you have verified that all of the reindexed data is available in the new managed indices, you can safely remove the old indices.

manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md

Lines changed: 152 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -11,9 +11,12 @@ products:
1111

1212
When you continuously index timestamped documents into {{es}}, you typically use a [data stream](../../data-store/data-streams.md) so you can periodically [roll over](rollover.md) to a new index. This enables you to implement a [hot-warm-cold architecture](../data-tiers.md) to meet your performance requirements for your newest data, control costs over time, enforce retention policies, and still get the most out of your data.
1313

14-
::::{tip}
15-
[Data streams](../../data-store/data-streams.md) are best suited for [append-only](../../data-store/data-streams.md#data-streams-append-only) use cases. If you need to update or delete existing time series data, you can perform update or delete operations directly on the data stream backing index. If you frequently send multiple documents using the same `_id` expecting last-write-wins, you may want to use an index alias with a write index instead. You can still use [ILM](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md) to manage and [roll over](rollover.md) the alias’s indices. Skip to [Manage time series data without data streams](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-time-series-data-without-data-streams).
16-
::::
14+
To simplify index management and automate rollover, select one of the scenarios that best applies to your situation:
15+
16+
* **Roll over data streams with ILM.** When ingesting write-once, timestamped data that doesn't change, follow the steps in [Manage time series data with data streams](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-time-series-data-with-data-streams) for simple, automated data stream rollover. ILM-managed backing indices are automatically created under a single data stream alias. ILM also tracks and transitions the backing indices through the lifecycle automatically.
17+
* **Roll over time series indices with ILM.** Data streams are best suited for [append-only](../../data-store/data-streams.md#data-streams-append-only) use cases. If you need to update or delete existing time series data, you can perform update or delete operations directly on the data stream backing index. If you frequently send multiple documents using the same `_id` expecting last-write-wins, you may want to use an index alias with a write index instead. You can still use [ILM](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md) to manage and roll over the alias’s indices. Follow the steps in [Manage time series data without data streams](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-time-series-data-without-data-streams) for more information.
18+
* **Roll over general content as data streams with ILM.** If some of your indices store data that isn't timestamped, but you would like to get the benefits of automatic rotation when the index reaches a certain size or age, or delete already rotated indices after a certain amount of time, follow the steps in [Manage general content with data streams](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams). These steps include injecting a timestamp field at ingest time to mimic time series data.
19+
1720

1821
## Manage time series data with data streams [manage-time-series-data-with-data-streams]
1922

@@ -295,3 +298,149 @@ Retrieving the status information for managed indices is very similar to the dat
295298
GET timeseries-*/_ilm/explain
296299
```
297300

301+
## Manage general content with data streams [manage-general-content-with-data-streams]
302+
303+
Data streams are specifically designed for time series data.
304+
If you want to manage general content (data without timestamps) with data streams, you can set up [ingest pipelines](/manage-data/ingest/transform-enrich/ingest-pipelines.md) to transform and enrich your general content by adding a timestamp field at [ingest](/manage-data/ingest.md) time and get the benefits of time-based data management.
305+
306+
For example, search use cases such as knowledge base, website content, e-commerce, or product catalog search might require you to frequently index general content (data without timestamps). As a result, your index can grow significantly over time, which might impact storage requirements, query performance, and cluster health. Following the steps in this procedure (adding a timestamp field and moving to ILM-managed data streams) can help you rotate your indices in a simpler way, based on their size or lifecycle phase.
307+
308+
To roll over your general content from indices to a data stream, you:
309+
310+
1. [Create an ingest pipeline](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-ingest) to process your general content and add a `@timestamp` field.
311+
312+
1. [Create a lifecycle policy](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-policy) that meets your requirements.
313+
314+
1. [Create an index template](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-template) that uses the created ingest pipeline and lifecycle policy.
315+
316+
1. [Create a data stream](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-create-stream).
317+
318+
1. *Optional:* If you have an existing, non-managed index and want to migrate your data to the data stream you created, [reindex with a data stream](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-reindex).
319+
320+
1. [Update your ingest endpoint](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-endpoint) to target the created data stream.
321+
322+
1. *Optional:* You can use the [ILM explain API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ilm-explain-lifecycle) to get status information for your managed indices.
323+
For more information, refer to [Check lifecycle progress](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#ilm-gs-check-progress).
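
As a rough illustration of the optional status check in the last step, the request below targets the backing indices of the data stream. It assumes the data stream is named `movetods`, as in the example later in this procedure, and relies on the default `.ds-<data-stream>-<yyyy.MM.dd>-<generation>` naming of backing indices:

```console
# Assumes the data stream created in this procedure is named movetods
GET .ds-movetods-*/_ilm/explain
```

The response should list each backing index together with its current phase, action, and step.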
324+
325+
326+
### Create an ingest pipeline to transform your general content [manage-general-content-with-data-streams-ingest]
327+
328+
Create an ingest pipeline that uses the [`set` processor](elasticsearch://reference/enrich-processor/set-processor.md) to add a `@timestamp` field:
329+
330+
```console
PUT _ingest/pipeline/ingest_time_1
{
  "description": "Add an ingest timestamp",
  "processors": [
    {
      "set": {
        "field": "@timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
```
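
Before referencing the pipeline from an index template, you might want to check its output with the simulate pipeline API. The following is a minimal sketch: the `title` field and its value are hypothetical sample content, not part of the original procedure.

```console
# Hypothetical sample document; replace the title field with your own content
POST _ingest/pipeline/ingest_time_1/_simulate
{
  "docs": [
    {
      "_source": {
        "title": "Getting started guide"
      }
    }
  ]
}
```

Each document in the response should contain a `@timestamp` field next to the original fields.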
343+
344+
### Create a lifecycle policy [manage-general-content-with-data-streams-policy]
345+
346+
In this example, the policy is configured to roll over when the largest primary shard of the write index reaches 10 GB:
347+
348+
```console
PUT _ilm/policy/indextods
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "set_priority": {
            "priority": 100
          },
          "rollover": {
            "max_primary_shard_size": "10gb"
          }
        }
      }
    }
  }
}
```
368+
369+
For more information about lifecycle phases and available actions, check [Create a lifecycle policy](configure-lifecycle-policy.md#ilm-create-policy).
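
If you also want to roll over by age and delete rolled-over indices after a retention period, the policy could be extended along these lines. This is only a sketch: the `30d` rollover age and the `90d` retention period are illustrative assumptions, not recommendations from the original procedure.

```console
# Variant of the indextods policy; max_age (30d) and the delete phase min_age (90d) are assumed values
PUT _ilm/policy/indextods
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "set_priority": {
            "priority": 100
          },
          "rollover": {
            "max_primary_shard_size": "10gb",
            "max_age": "30d"
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```

With rollover enabled, the `min_age` of the delete phase is measured from the time each backing index rolled over, not from its creation time.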
370+
371+
372+
### Create an index template to apply the ingest pipeline and lifecycle policy [manage-general-content-with-data-streams-template]
373+
374+
Create an index template that uses the created ingest pipeline and lifecycle policy:
375+
376+
```console
PUT _index_template/index_to_dot
{
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "indextods"
        },
        "default_pipeline": "ingest_time_1"
      }
    },
    "mappings": {
      "_source": {
        "excludes": [],
        "includes": [],
        "enabled": true
      },
      "_routing": {
        "required": false
      },
      "dynamic": true,
      "numeric_detection": false,
      "date_detection": true,
      "dynamic_date_formats": [
        "strict_date_optional_time",
        "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"
      ]
    }
  },
  "index_patterns": [
    "movetods"
  ],
  "data_stream": {
    "hidden": false,
    "allow_custom_routing": false
  }
}
```
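
To confirm that the template would apply as intended before you create the data stream, you could use the simulate index API. This sketch assumes the `movetods` name from the `index_patterns` entry above:

```console
# Resolves the settings and mappings that a new index or data stream named movetods would receive
POST _index_template/_simulate_index/movetods
```

In the response, check that `index.lifecycle.name` is `indextods` and `index.default_pipeline` is `ingest_time_1`. The `overlapping` section lists any other templates that match the same name.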
415+
416+
### Create a data stream [manage-general-content-with-data-streams-create-stream]
417+
418+
Create a data stream using the [_data_stream API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-create-data-stream):
419+
420+
```console
PUT /_data_stream/movetods
```
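
As a quick check that the data stream picked up the template and policy, you could retrieve it with the get data stream API. A minimal sketch using the names from this procedure:

```console
# Lists the backing indices, the matching index template, and the ILM policy of the data stream
GET _data_stream/movetods
```

The response should show `index_to_dot` as the template and `indextods` as the ILM policy.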
423+
424+
### Optional: Reindex your data with a data stream [manage-general-content-with-data-streams-reindex]
425+
426+
If you want to copy your documents from an existing index to the data stream you created, reindex with a data stream using the [_reindex API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-reindex):
427+
428+
```console
POST /_reindex
{
  "source": {
    "index": "indextods"
  },
  "dest": {
    "index": "movetods",
    "op_type": "create"
  }
}
```
441+
442+
For more information, check [Reindex with a data stream](../../data-store/data-streams/use-data-stream.md#reindex-with-a-data-stream).
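
One simple way to verify the copy is to compare document counts between the source index and the data stream. This sketch assumes that nothing else writes to either target while the reindex runs:

```console
# Source index used in the reindex request above
GET indextods/_count

# Destination data stream
GET movetods/_count
```

If the counts differ, review the `failures` array and the `version_conflicts` value in the reindex response before removing the old index.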
443+
444+
### Update your ingest endpoint to target the created data stream [manage-general-content-with-data-streams-endpoint]
445+
446+
If you use Elastic clients, scripts, or any other third-party tool to ingest data to {{es}}, make sure you update these to use the created data stream.
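
For instance, requests that previously targeted the old index could be pointed at the data stream instead. The sketch below assumes the `movetods` data stream from this procedure; the `title` field and its values are hypothetical sample content. Data streams accept only the `create` operation, so bulk requests must use the `create` action:

```console
# Index a single document; the default pipeline adds the @timestamp field
POST movetods/_doc
{
  "title": "Getting started guide"
}

# Bulk requests to a data stream must use the create action
PUT movetods/_bulk
{ "create": {} }
{ "title": "Installation guide" }
{ "create": {} }
{ "title": "Troubleshooting guide" }
```

Because the index template sets `ingest_time_1` as the default pipeline, clients do not need to send a `@timestamp` field themselves.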
