Replies: 3 comments 2 replies
Hey @g9yuayon! Yes, if the timestamp wasn't accurately parsed, it can definitely affect your compression ratio. I was able to successfully compress your example log using:

```
sbin/compress.sh --timestamp-key '\@timestamp' test.jsonl
```

To start with, can you give that a shot and see if things improve? We can then move on to seeing whether the compression of some other fields can be improved.
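If it's easier to drive that from a script, here's a minimal sketch of the same invocation (the script path and file name are taken from the command above; adjust them for your install):

```python
import subprocess

# Run the package's compression script with the escaped timestamp key.
# The backslash before '@' is passed through literally, matching the
# quoted '\@timestamp' in the shell command above.
result = subprocess.run(
    ["sbin/compress.sh", "--timestamp-key", r"\@timestamp", "test.jsonl"],
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    print("compression failed:", result.stderr)
else:
    print(result.stdout)
```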
Thanks, @kirkrodrigues! Escaping fixed the timestamp parsing and I reran the command. The compression ratio increased to 25X.
Hey @g9yuayon,

Below are a few general things you could try out quickly that might help with compression ratio. We might also be able to give you more specific advice if you could tell us more about your use case (e.g., heavy search, long-term storage, scale, etc.).
The package creates fairly small archives by default, which helps make search more parallelizable at the cost of compression ratio. You can tweak how much data ends up in each archive by modifying `target_archive_size` and `target_segment_size` under `archive_output` in the package's config. The following parameter combinations are probably worth trying out:

```yaml
archive_output:
  target_archive_size: 2147483648   # 2 GiB
  target_segment_size: 1073741824   # 1 GiB
```

```yaml
archive_output:
  target_archive_size: 1073741824   # 1 GiB
  target_segment_size: 536870912    # 512 MiB
```

```yaml
archive_output:
  target_archive_size: 536870912    # 512 MiB
  target_segment_size: 268435456    # 256 MiB
```
There is a bit of a tradeoff here: if archives become too large, search can become slower for certain types of queries, so if search speed is important for your use case, you may not want to increase these parameters by too much. Larger archives also lead to higher memory usage during compression. Note that, in general, compression ratio tends to improve as these sizes increase, but only up to a point.
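If you'd like to script trying these combinations, a rough sketch of editing the config with PyYAML follows. The config path here is an assumption (point it at wherever your package's config file actually lives), and the sizes are the middle combination from above:

```python
import yaml  # PyYAML

CONFIG_PATH = "etc/clp-config.yml"  # assumed location; adjust for your install

# Middle combination from above: 1 GiB archives, 512 MiB segments.
new_sizes = {
    "target_archive_size": 1073741824,  # 1 GiB
    "target_segment_size": 536870912,   # 512 MiB
}

with open(CONFIG_PATH) as f:
    config = yaml.safe_load(f) or {}

# Merge the new sizes into the archive_output section, creating it if absent.
config.setdefault("archive_output", {}).update(new_sizes)

# Note: rewriting with safe_dump drops any comments in the YAML file.
with open(CONFIG_PATH, "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

print("archive_output is now:", config["archive_output"])
```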
By default, we use ZStd as our second-stage compressor, with a compression level of 3. You can change the compression level by modifying `compression_level` under `archive_output`:

```yaml
archive_output:
  compression_level: 4
```

Increasing the compression level can lead to significant increases in compression ratio, at the cost of reduced compression speed.

There are a few more things we could potentially do to help you achieve higher compression ratios if the tweaks above aren't enough (e.g., offering the ability to parse and encode more than one column as a timestamp, offering LZMA as a second-stage compressor, exposing features currently not available in the package like array-structurization, etc.). We could also help you take a look at what's limiting your compression ratio more directly if you could send us some sample logs (here or in a private channel).
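To get a rough feel for the compression-level tradeoff before changing the package config, you could run a quick standalone experiment on a sample of your logs with the zstandard Python bindings. This compresses the raw text rather than CLP's encoded columns, so the absolute ratios won't match the package's, but it shows how ratio and speed move as the level changes (the levels and file name below are just examples):

```python
import time
import zstandard as zstd

# A representative sample of the logs (example path).
with open("test.jsonl", "rb") as f:
    data = f.read()

for level in (3, 4, 7, 12):  # 3 is the package's default mentioned above
    cctx = zstd.ZstdCompressor(level=level)
    start = time.perf_counter()
    compressed = cctx.compress(data)
    elapsed = time.perf_counter() - start
    print(f"level {level:2d}: ratio {len(data) / len(compressed):5.2f}x in {elapsed:.2f}s")
```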
I ran CLP's compress command on multi-line JSON files and got a compression ratio of about 23X.

23X is decent, but far from the claimed 92X compression ratio. Can I get some suggestions on how to increase it? What I know about the JSON files so far:
{ "host.hostname": "xxx.xxx.xxx", "type": "json", "host.mac": "YYYY", ... a lot more key-value pairs "message": "I251121 01:39:01.516291 561950819 3@pebble/event.go:999 ⋮ [n162,s162,pebble] 3011722 [JOB 544048] compacting(default) L0 [1074306] (2.8MB) Score=1.08 + L2 [1074266 1074268 1074269] (7.8MB) Score=0.99; OverlappingRatio: Single 2.81, Multi 0.00", "tags": [ "_no_valid_original_timestamp" ], "environment": "production", "event.ingested": "2025-11-21T01:39:01.916Z", "@timestamp": "2025-11-21T01:39:01.616Z", "event.created": "2025-11-21T01:39:01.616Z", }Not sure if timestamp matters to the compression ratio. CLP failed to parse the the
@timestampvalue as it reported that all the logs are in the time range of January 1, 1970 - January 1, 1970.Beta Was this translation helpful? Give feedback.
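For reference, January 1, 1970 is the Unix epoch, which is what tends to show up when no timestamp was actually extracted. A quick way to confirm that the `@timestamp` values themselves are valid ISO-8601 (i.e., that the problem was the key passed to CLP rather than the data) is a sketch like the following, assuming line-delimited JSON and an example file name:

```python
import json
from datetime import datetime

bad = 0
total = 0
with open("test.jsonl") as f:  # example file name
    for line in f:
        if not line.strip():
            continue
        total += 1
        record = json.loads(line)
        ts = record.get("@timestamp", "")
        try:
            # fromisoformat() on older Pythons can't handle a trailing 'Z'.
            datetime.fromisoformat(ts.replace("Z", "+00:00"))
        except ValueError:
            bad += 1

print(f"{bad} of {total} records have a missing or unparseable @timestamp")
```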