@@ -5,27 +5,31 @@ Image by `DaPino <http://www.iconarchive.com/show/fishing-equipment-icons-by-dap
Elasticsearch Knapsack Plugin
=============================

- Knapsack is an index export/import plugin for `Elasticsearch <http://github.com/elasticsearch/elasticsearch>`_.
+ Knapsack is an export/import plugin for `Elasticsearch <http://github.com/elasticsearch/elasticsearch>`_.

- It uses tar archive format and gzip compression for input/output.
+ It uses archive formats (tar, zip, cpio) and compression algorithms (gzip, bzip2, lzf, xz) for transfer.
+
+ A direct copy of indexes or index types, or of any search results with stored fields, is also supported.
+
+ Optionally, you can transfer archives to Amazon S3.
Installation
------------

- Current version of the plugin is **2.1.5** (Nov 6, 2013)
-
.. image:: https://travis-ci.org/jprante/elasticsearch-knapsack.png

Prerequisites::

-     Elasticsearch 0.90.5+
+     Elasticsearch 0.90+
- ============= ========= ================= ===========================================================
- ES version    Plugin    Release date      Command
- ------------- --------- ----------------- -----------------------------------------------------------
- 0.90.5        **2.1.4** Oct 28, 2013      ./bin/plugin --install knapsack --url http://bit.ly/1ipne90
- 0.90.6        **2.1.5** Nov 6, 2013       ./bin/plugin --install knapsack --url http://bit.ly/17cn710
- ============= ========= ================= ===========================================================
+ ============= ================= ================= ===========================================================
+ ES version    Plugin            Release date      Command
+ ------------- ----------------- ----------------- -----------------------------------------------------------
+ 0.90.9        0.90.9.1          Jan 9, 2014       ./bin/plugin --install knapsack --url http://bit.ly/1e81hwh
+ 0.90.9        0.90.9.1 (S3)     Jan 9, 2014       ./bin/plugin --install knapsack --url http://bit.ly/K8QwOJ
+ ============= ================= ================= ===========================================================
+
+ The S3 version includes Amazon AWS API support and can optionally transfer archives to S3.

Do not forget to restart the node after installation.
@@ -37,93 +41,180 @@ The Maven project site is available at `Github <http://jprante.github.io/elastic
Binaries
--------

- Binaries are available at `Bintray <https://bintray.com/pkg/show/general/jprante/elasticsearch-plugins/elasticsearch-knapsack>`_
+ Binaries (also older versions) are available at `Bintray <https://bintray.com/pkg/show/general/jprante/elasticsearch-plugins/elasticsearch-knapsack>`_
+
+ Overview
+ ========
+
+ .. image:: ../../../elasticsearch-knapsack/raw/master/src/site/resources/knapsack-diagram.png
- Documentation
- =============
-
- Note: you must have the _source field enabled, otherwise the Knapsack export will not work.
+ Example
+ =======
Let's assume a simple index::

    curl -XDELETE localhost:9200/test
    curl -XPUT localhost:9200/test/test/1 -d '{"key":"value 1"}'
    curl -XPUT localhost:9200/test/test/2 -d '{"key":"value 2"}'
- Exporting
- ---------
+ Exporting to archive
+ --------------------

You can export this Elasticsearch index with::

    curl -XPOST localhost:9200/test/test/_export
+     {"running":true,"mode":"export","archive":"tar","path":"file:test_test.tar.gz"}
The result is a file in the Elasticsearch folder::

-     -rw-r--r--  1 joerg  staff  296  9 Dez 14:56 test_test.tar.gz
+     -rw-r--r--  1 es  staff  341  8 Jan 22:25 test_test.tar.gz

- Check with tar utility, the settings and the mapping is also exported::
+ Check with the tar utility that the settings and the mapping are also exported::

-     tar ztvf test_test.tar.gz
-     -rw-r--r--  0 0  0  116  9 Dez 14:56 test/_settings
-     -rw-r--r--  0 0  0   49  9 Dez 14:56 test/test/_mapping
-     -rw-r--r--  0 0  0   17  9 Dez 14:56 test/test/1
-     -rw-r--r--  0 0  0   17  9 Dez 14:56 test/test/2
+     tar ztvf test_test.tar.gz
+     ----------  0 es  0  132  8 Jan 22:25 test/_settings/null/null
+     ----------  0 es  0   49  8 Jan 22:25 test/test/_mapping/null
+     ----------  0 es  0   17  8 Jan 22:25 test/test/2/_source
+     ----------  0 es  0   17  8 Jan 22:25 test/test/1/_source
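The export is an ordinary gzip-compressed tar, so standard tools can read it or even reproduce its layout. A minimal sketch: the entry names are taken from the listing above, but the JSON file contents are placeholders I made up for illustration; this only mimics Knapsack's output, it is not Knapsack itself.

```shell
# Reproduce the archive layout from the listing with standard tools.
# Entry names come from the tar listing; the JSON contents are placeholders.
workdir=$(mktemp -d) && cd "$workdir"
mkdir -p test/_settings/null test/test/_mapping test/test/1 test/test/2
printf '{"index.number_of_replicas":"1"}' > test/_settings/null/null
printf '{"test":{"properties":{"key":{"type":"string"}}}}' > test/test/_mapping/null
printf '{"key":"value 1"}' > test/test/1/_source
printf '{"key":"value 2"}' > test/test/2/_source
tar czf test_test.tar.gz test
tar ztf test_test.tar.gz
```

Listing the rebuilt archive with `tar ztf` shows the same entry names as the listing above.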
- Also, you can export with::
+ Also, you can export a whole index with::

    curl -XPOST localhost:9200/test/_export

- with the result file test.tar.gz, or even all data with::
+ with the result file test.tar.gz, or even all cluster indices with::

-     curl -XPOST localhost:9200/_export
+     curl -XPOST 'localhost:9200/_export'

- with the result file _all.tar.gz
+ to the file _all.tar.gz

- Importing
- ---------
+ By default, the archive format is `tar` and the compression is `gz` (gzip).
+
- You can import the file with::
+ You can also export to a `zip` or `cpio` archive, or use another compression scheme:
+ available are `bz2` (bzip2), `xz` (XZ), and `lzf` (LZF).
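The text above names the formats but not how one is chosen. Earlier Knapsack releases selected the algorithm from the archive file suffix, so the mapping sketched below is an assumption for illustration, not a documented API:

```shell
# Assumed suffix -> archive/compression mapping (illustration only;
# earlier Knapsack releases selected the algorithm by file suffix)
archive_format() {
    case "$1" in
        *.tar.gz)  echo "tar + gz (gzip)" ;;
        *.tar.bz2) echo "tar + bz2 (bzip2)" ;;
        *.tar.xz)  echo "tar + xz (XZ)" ;;
        *.tar.lzf) echo "tar + lzf (LZF)" ;;
        *.zip)     echo "zip" ;;
        *.cpio)    echo "cpio" ;;
        *)         echo "unknown" ;;
    esac
}
archive_format test_test.tar.gz   # tar + gz (gzip)
archive_format myarchive.zip      # zip
```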
-     curl -XPOST localhost:9200/test/test/_import
+ Export search results
+ ---------------------

- Be sure that the index does not exist. You must delete an index by hand. Knapsack does not delete or overwrite data.
+ You can add a query to the `_export` endpoint just like you would do for searching in Elasticsearch::

- You can import the file to a new index with renaming your file to test2_test2.tar.gz and executing the import command::
+     curl -XPOST 'localhost:9200/test/test/_export' -d '{
+         "query" : {
+             "match" : {
+                 "myfield" : "myvalue"
+             }
+         },
+         "fields" : [ "_parent", "_source" ]
+     }'
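Query bodies written inline in single quotes are easy to get wrong. One way to check such a body locally before POSTing it to the `_export` endpoint (this assumes `python3` is on the PATH; it only validates the JSON, it does not talk to Elasticsearch):

```shell
# Validate the query body as JSON before sending it to the _export endpoint
body='{
    "query" : {
        "match" : {
            "myfield" : "myvalue"
        }
    },
    "fields" : [ "_parent", "_source" ]
}'
printf '%s' "$body" | python3 -m json.tool > /dev/null && echo valid
```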
-     mv test_test.tar.gz test2_test2.tar.gz
-     curl -XPOST localhost:9200/test2/test2/_import
+ Export to an archive with a given path name
+ -------------------------------------------

- and check you have copied the data to a new index with::
+ You can configure an archive path with the parameter `path`::

-     curl -XGET localhost:9200/test2/test2/1
-     {"_index":"test2","_type":"test2","_id":"1","_version":1,"exists":true, "_source" : {"key":"value 1"}}
+     curl -XPOST 'localhost:9200/test/_export?path=/tmp/myarchive.zip'
+
+ If Elasticsearch cannot write an archive to the path, an error message will appear
+ and no export will take place.
- State
- -----
+ Renaming indexes and index types
+ --------------------------------

- While exports or imports or running, you can check the state with::
+ You can rename indexes and index types by adding a `map` parameter that contains a JSON
+ object with old and new index (and index/type) names::

-     curl -XGET localhost:9200/_export/state
+     curl -XPOST 'localhost:9200/test/type/_export?map=\{"test":"testcopy","test/type":"testcopy/typecopy"\}'
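The `\{` and `\}` escapes above only protect the braces from the shell; depending on the HTTP client, the JSON value of `map` may additionally need to be percent-encoded before it goes into the URL. A sketch of producing the encoded value, assuming `python3` is available for the encoding:

```shell
# Percent-encode the JSON value of the map parameter for use in a URL
map='{"test":"testcopy","test/type":"testcopy/typecopy"}'
encoded=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$map")
echo "$encoded"
```

The export URL would then end in `_export?map=$encoded` (not executed here, since it needs a running cluster).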
- or::
-
-     curl -XGET localhost:9200/_import/state
+ Copy to local or remote cluster
+ -------------------------------

+ If your requirement is not saving data to an archive at all, but only copying, Knapsack is your friend.
+
+ You can copy an index in the local cluster or to a remote cluster with the `_export/copy` endpoint.
+ Preconditions: both clusters must run the same Java JVM version and the same Elasticsearch version.
+
+ Example for a local cluster copy of the index `test`::
+
+     curl -XPOST 'localhost:9200/test/_export/copy?map=\{"test":"testcopy"\}'
+
+ Example for a remote cluster copy of the index `test`, using the parameters `cluster`, `host`, and `port`::
+
+     curl -XPOST 'localhost:9200/test/_export/copy?cluster=remote&host=127.0.0.1&port=9201'
+ This is a complete example that illustrates how to filter an index by timestamp and copy this part to
+ another index::
+
+     curl -XDELETE 'localhost:9200/test'
+     curl -XDELETE 'localhost:9200/testcopy'
+     curl -XPUT 'localhost:9200/test/' -d '
+     {
+         "mappings" : {
+             "_default_": {
+                 "_timestamp" : { "enabled" : true, "store" : true, "path" : "date" }
+             }
+         }
+     }
+     '
+     curl -XPUT 'localhost:9200/test/doc/1' -d '
+     {
+         "date" : "2014-01-01T00:00:00",
+         "sentence" : "Hi!",
+         "value" : 1
+     }
+     '
+     curl -XPUT 'localhost:9200/test/doc/2' -d '
+     {
+         "date" : "2014-01-02T00:00:00",
+         "sentence" : "Hello World!",
+         "value" : 2
+     }
+     '
+     curl -XPUT 'localhost:9200/test/doc/3' -d '
+     {
+         "date" : "2014-01-03T00:00:00",
+         "sentence" : "Welcome!",
+         "value" : 3
+     }
+     '
+     curl 'localhost:9200/test/_refresh'
+     curl -XPOST 'localhost:9200/test/_export/copy?map=\{"test":"testcopy"\}' -d '
+     {
+         "fields" : [ "_timestamp", "_source" ],
+         "query" : {
+             "filtered" : {
+                 "query" : {
+                     "match_all" : {
+                     }
+                 },
+                 "filter" : {
+                     "range": {
+                         "_timestamp" : {
+                             "from" : "2014-01-02"
+                         }
+                     }
+                 }
+             }
+         }
+     }
+     '
+     curl '0:9200/test/_search?fields=_timestamp&pretty'
+     # wait for bulk flush interval
+     sleep 10
+     curl '0:9200/testcopy/_search?fields=_timestamp&pretty'
- Choosing a different location
- -----------------------------
+ Import
+ ------

- With the ``target`` parameter, you can choose a path and alternative name for your tar archive. Example::
+ You can import the file with::

-     curl -XPOST 'localhost:9200/_export?target=/big/space/archive.tar.gz'
+     curl -XPOST 'localhost:9200/test/test/_import'

- Compression
- -----------
+ Knapsack does not delete or overwrite data by default.
+ But you can use the parameter `createIndex` with the value `false` to allow indexing into indexes that already exist.
- You can select a ``.tar.gz``, ``.tar.bz2``, or ``.tar.xz`` suffix for the corresponding compression algorithm. Example::
+ When importing, you can map your indexes or index/types to your favorite ones::

-     curl -XPOST 'localhost:9200/_export?target=/my/archive.tar.bz2'
+     curl -XPOST 'localhost:9200/test/_import?map=\{"test":"testcopy"\}'
Modifying settings and mappings
-------------------------------
@@ -176,15 +267,61 @@ The result is::
    }
    }
+ Transferring archives to Amazon S3
+ ----------------------------------
+
+ By using the special plugin releases that include the Amazon AWS S3 API, you can optionally transfer archives
+ to S3 or fetch one before importing. You can use the endpoints `_export/s3` and `_import/s3` for that.
+
+ Export example::
+
+     curl -XPOST 'localhost:9200/test/_export/s3?uri=s3://accesskey:secretkey@awshostname&bucketName=mybucket&key=mykey'
+
+ Import example::
+
+     curl -XPOST 'localhost:9200/test/_import/s3?uri=s3://accesskey:secretkey@awshostname&bucketName=mybucket&key=mykey'
+
+ Note, the file name used for downloading from S3 is `mybucket/mykey`, and the directory will be created
+ if it does not exist.
+
+ Check the state of a running import/export
+ ------------------------------------------
+
+ While exports or imports are running, you can check the state with::
+
+     curl -XGET 'localhost:9200/_export/state'
+
+ or::
+
+     curl -XGET 'localhost:9200/_import/state'
Caution
=======

- Knapsack is very simple and works without locking or index snapshots.
- So it is up to you to organize the safe export and import.
- If the index changes while Knapsack is exporting, you may lose data in the export.
- Do not run Knapsack in parallel on the same export.
+ Knapsack is very simple and works without locks or snapshots. This means that if Elasticsearch
+ writes to the part of your data being exported while the export runs, you may lose data in the export.
+ So it is up to you to organize safe exports and imports with this plugin.
+
+ If you want a snapshot/restore feature, please use the standard snapshot/restore in the upcoming
+ Elasticsearch 1.0 release.
+ Credits
+ =======
+
+ Knapsack contains derived work from Apache Commons Compress
+ http://commons.apache.org/proper/commons-compress/
+
+ The code in this component has many origins:
+ the bzip2, tar and zip support came from Avalon's Excalibur, but originally
+ from Ant, as far as life in Apache goes. The tar package is originally Tim Endres'
+ public domain package. The bzip2 package is based on the work done by Keiron Liddle as
+ well as Julian Seward's libbzip2. It has migrated via
+ Ant -> Avalon-Excalibur -> Commons-IO -> Commons-Compress.
+ The cpio package has been contributed by Michael Kuss and the jRPM project.
+
+ Thanks to `nicktgr15 <https://github.com/nicktgr15>`_ for extending Knapsack to support Amazon S3.
License
=======