Commit 4663c50

Add final documentation touches (#4)
* Add final documentation touches
* Update CI

Adds benchmarks.
Limit default decompression concurrency.
Clean up in commandline code.
1 parent f6fb18d commit 4663c50

17 files changed: 893 additions, 667 deletions

.github/workflows/go.yml

Lines changed: 6 additions & 3 deletions
@@ -12,7 +12,7 @@ jobs:
   build:
     strategy:
       matrix:
-        go-version: [1.22.x, 1.23.x]
+        go-version: [1.22.x, 1.23.x, 1.24.x]
         os: [ubuntu-latest, macos-latest, windows-latest]
     env:
       CGO_ENABLED: 0
@@ -41,6 +41,9 @@ jobs:
     - name: Test No-unsafe, noasm
       run: go test -tags="nounsafe,noasm" ./...

+    - name: Test purego
+      run: go test -tags="purego" ./...
+
     - name: Test Race 1 CPU
       env:
         CGO_ENABLED: 1
@@ -79,7 +82,7 @@ jobs:
     - name: Set up Go
       uses: actions/setup-go@v5.3.0
       with:
-        go-version: 1.23.x
+        go-version: 1.24.x

     - name: Checkout code
       uses: actions/checkout@v4
@@ -123,7 +126,7 @@ jobs:
     - name: Set up Go
       uses: actions/setup-go@v5.3.0
       with:
-        go-version: 1.23.x
+        go-version: 1.24.x

     - name: Checkout code
       uses: actions/checkout@v4

.github/workflows/release.yml

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ jobs:
         name: Set up Go
         uses: actions/setup-go@5a083d0e9a84784eb32078397cf5459adecb4c40 # v5.2.0
         with:
-          go-version: 1.23.x
+          go-version: 1.24.x
       -
         name: Run GoReleaser
         uses: goreleaser/goreleaser-action@9ed2f89a662bf1735a48bc8557fd212fa902bebf # v6.1.0

README.md

Lines changed: 186 additions & 15 deletions
@@ -227,7 +227,8 @@ the `WriteTo` functionality.
227227
```
228228

229229
The `DecompressConcurrent` has similar functionality to `WriteTo`, but allows specifying the concurrency.
230-
By default `WriteTo` uses `runtime.NumCPU()` concurrent decompressors.
230+
By default `WriteTo` uses `runtime.NumCPU()` or at most 8 concurrent decompressors.
231+
Besides offering higher throughput using `DecompressConcurrent` will also make input reads async when used.
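
The default described above (the CPU count, but never more than 8) can be sketched with the standard library alone. This is only an illustration of the rule as stated in the README text, not the package's actual code; `defaultConcurrency` and `maxDefaultConcurrency` are hypothetical names introduced here.

```go
package main

import (
	"fmt"
	"runtime"
)

// maxDefaultConcurrency is a hypothetical constant mirroring the
// "at most 8" cap mentioned in the README text.
const maxDefaultConcurrency = 8

// defaultConcurrency returns numCPU capped to maxDefaultConcurrency,
// and never less than 1.
func defaultConcurrency(numCPU int) int {
	if numCPU < 1 {
		return 1
	}
	if numCPU > maxDefaultConcurrency {
		return maxDefaultConcurrency
	}
	return numCPU
}

func main() {
	// On a 16-core machine this yields 8; on a 4-core machine, 4.
	fmt.Println("default decompressors:", defaultConcurrency(runtime.NumCPU()))
}
```

Callers who want more (or fewer) decompressors than this default would pass an explicit value to `DecompressConcurrent` instead.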
231232

232233
For memory-sensitive systems, the maximum block size can be set below 8MB. For this use the `ReaderMaxBlockSize(int)`
233234
option.
@@ -309,6 +310,14 @@ The following build tags can be used to control which speed improvements are use
309310

310311
Using assembly/non-assembly versions will often produce slightly different output.
311312

313+
We will support 2 releases prior to current Go release version.
314+
315+
This package has been extensively fuzz tested to ensure that no data input can cause
316+
crashes or excessive memory usage.
317+
318+
When doing fuzz testing, use `-tags=nounsafe`. Non-assembly functions will also be tested,
319+
but for completeness also test with `-tags=purego`.
320+
312321
# Performance
313322

314323
## BLOCKS
@@ -324,90 +333,106 @@ As a benchmark this set has an over-emphasis on text files.
324333
Blocks are compressed/decompress using 16 concurrent threads on an AMD Ryzen 9 3950X 16-Core Processor.
325334
Click below to see some sample benchmarks compared to Snappy and LZ4:
326335

336+
### Protobuf Sample
337+
327338
<details>
328-
<summary>Protobuf (118,588 bytes input)</summary>
339+
<summary>Click To See Data + Charts (118,588 bytes input)</summary>
329340

330341
| Compressor | Size | Comp MB/s | Decomp MB/s | Reduction % |
331-
|--------------|--------|-----------|-------------|-------------|
342+
|--------------|--------|----------:|-------------|-------------|
332343
| MinLZ 1 | 17,613 | 27,837 | 116,762 | 85.15% |
333344
| MinLZ 1 (Go) | 17,479 | 22,036 | 61,652 | 85.26% |
334345
| MinLZ 2 | 16,345 | 12,797 | 103,100 | 86.22% |
335346
| MinLZ 2 (Go) | 16,345 | 9,732 | 52,964 | 86.22% |
336347
| MinLZ 3 | 14,766 | 210 | 126,385 | 87.55% |
337348
| MinLZ 3 (Go) | 14,766 | | 68,411 | 87.55% |
338349
| Snappy | 23,335 | 24,052 | 61,002 | 80.32% |
339-
| Snappy (Go) | 23,335 | | 35,699 | 80.32% |
350+
| Snappy (Go) | 23,335 | 10,055 | 35,699 | 80.32% |
340351
| LZ4 0 | 18,766 | 12,649 | 137,553 | 84.18% |
341352
| LZ4 0 (Go) | 18,766 | | 64,092 | 84.18% |
342353
| LZ4 9 | 15,844 | 12,649 | 139,801 | 86.64% |
343354
| LZ4 9 (Go) | 15,844 | | 66,904 | 86.64% |
344355

356+
![Compression vs Size](img/pb-block.png)
357+
345358
Source file: https://github.com/google/snappy/blob/main/testdata/geo.protodata
346359

347360
</details>
348361

362+
### HTML Sample
363+
349364
<details>
350-
<summary>HTML (102,400 bytes input)</summary>
365+
<summary>Click To See Data + Charts (102,400 bytes input)</summary>
351366

352367
| Compressor | Size | Comp MB/s | Decomp MB/s | Reduction % |
353-
|--------------|--------|-----------|-------------|-------------|
368+
|--------------|--------|----------:|-------------|-------------|
354369
| MinLZ 1 | 20,184 | 17,558 | 82,292 | 80.29% |
355370
| MinLZ 1 (Go) | 19,849 | 15,035 | 32,327 | 80.62% |
356371
| MinLZ 2 | 17,831 | 9,260 | 58,432 | 82.59% |
357372
| MinLZ 2 (Go) | 17,831 | 7,524 | 25,728 | 82.59% |
358373
| MinLZ 3 | 16,025 | 180 | 80,445 | 84.35% |
359374
| MinLZ 3 (Go) | 16,025 | | 33,382 | 84.35% |
360375
| Snappy | 22,843 | 17,469 | 44,765 | 77.69% |
361-
| Snappy (Go) | 22,843 | | 21,082 | 77.69% |
376+
| Snappy (Go) | 22,843 | 8,161 | 21,082 | 77.69% |
362377
| LZ4 0 | 21,216 | 9,452 | 101,490 | 79.28% |
363378
| LZ4 0 (Go) | 21,216 | | 40,674 | 79.28% |
364379
| LZ4 9 | 17,139 | 1,407 | 95,706 | 83.26% |
365380
| LZ4 9 (Go) | 17,139 | | 39,709 | 83.26% |
366381

382+
![Compression vs Size](img/html-block.png)
383+
367384
Source file: https://github.com/google/snappy/blob/main/testdata/html
368385

369386
</details>
370387

388+
### URL List Sample
389+
371390
<details>
372-
<summary>URLs. (702,087 bytes input)</summary>
391+
<summary>Click To See Data + Charts (702,087 bytes input)</summary>
373392

374393
| Compressor | Size | Comp MB/s | Decomp MB/s | Reduction % |
375-
|--------------|---------|-----------|-------------|-------------|
394+
|--------------|---------|----------:|-------------|-------------|
376395
| MinLZ 1 | 268,803 | 9,774 | 30,961 | 61.71% |
377396
| MinLZ 1 (Go) | 260,937 | 7,935 | 17,362 | 62.83% |
378397
| MinLZ 2 | 230,280 | 5,197 | 26,871 | 67.20% |
379398
| MinLZ 2 (Go) | 230,280 | 4,280 | 13,926 | 67.20% |
380399
| MinLZ 3 | 207,303 | 226 | 28,716 | 70.47% |
381400
| MinLZ 3 (Go) | 207,303 | | 15,256 | 70.47% |
382401
| Snappy | 335,492 | 9,398 | 24,207 | 52.22% |
383-
| Snappy (Go) | 335,492 | | 12,359 | 52.22% |
402+
| Snappy (Go) | 335,492 | 4,683 | 12,359 | 52.22% |
384403
| LZ4 0 | 299,342 | 4,462 | 51,220 | 57.36% |
385404
| LZ4 0 (Go) | 299,342 | | 23,242 | 57.36% |
386405
| LZ4 9 | 252,182 | 638 | 45,295 | 64.08% |
387406
| LZ4 9 (Go) | 252,182 | | 16,240 | 64.08% |
388407

408+
![Compression vs Size](img/urls-block.png)
409+
389410
Source file: https://github.com/google/snappy/blob/main/testdata/urls.10K
390411

391412
</details>
392413

414+
### Serialized GEO data Sample
415+
393416
<details>
394-
<summary>Serialized binary. (184,320 bytes input)</summary>
417+
<summary>(184,320 bytes input)</summary>
395418

396419
| Compressor | Size | Comp MB/s | Decomp MB/s | Reduction % |
397-
|--------------|--------|-----------|-------------|-------------|
420+
|--------------|--------|----------:|-------------|-------------|
398421
| MinLZ 1 | 63,595 | 8,319 | 26,170 | 65.50% |
399422
| MinLZ 1 (Go) | 62,087 | 7,601 | 12,118 | 66.32% |
400423
| MinLZ 2 | 54,688 | 5,932 | 24,688 | 70.33% |
401424
| MinLZ 2 (Go) | 52,752 | 4,690 | 10,566 | 71.38% |
402425
| MinLZ 3 | 46,002 | 230 | 28,083 | 75.04% |
403426
| MinLZ 3 (Go) | 46,002 | | 12,877 | 75.04% |
404427
| Snappy | 69,526 | 10,198 | 19,754 | 62.28% |
405-
| Snappy (Go) | 69,526 | | 8,712 | 62.28% |
428+
| Snappy (Go) | 69,526 | 5,031 | 8,712 | 62.28% |
406429
| LZ4 0 | 66,506 | 5,355 | 45,305 | 63.92% |
407430
| LZ4 0 (Go) | 66,506 | | 15,757 | 63.92% |
408431
| LZ4 9 | 50,439 | 88 | 52,877 | 72.64% |
409432
| LZ4 9 (Go) | 50,439 | | 18,171 | 72.64% |
410433

434+
![Compression vs Size](img/geo-block.png)
435+
411436
Source file: https://github.com/google/snappy/blob/main/testdata/kppkn.gtb
412437

413438
</details>
@@ -417,22 +442,160 @@ In overall terms, we typically observe that:
417442
* The fastest mode typically beats LZ4 both in speed and output size.
418443
* The fastest mode is typically equal to Snappy in speed, but significantly smaller.
419444
* The "balanced" mode typically beats the best possible LZ4 compression, but much faster.
445+
* Without assembler MinLZ is mostly the fastest option for compression.
420446
* LZ4 is decompression speed king.
421447
* Snappy decompression is usually slowest — especially without assembly.
422448

423449
We encourage you to do your own testing with realistic blocks.
424450

425-
You can use `λ mz c -block -bench=10 -verify -cpu=16 -1 file.ext` with our commandline tool.
451+
You can use `λ mz c -block -bench=10 -verify -cpu=16 -1 file.ext` with our commandline tool to test speed of block encoding/decoding.
426452

427453
## STREAMS
428454

429455
For fair stream comparisons, we run each encoder at its maximum block size
430-
while maintaining independent blocks where it is an option.
456+
or max 4MB, while maintaining independent blocks where it is an option.
431457
We use the concurrency offered by the package.
432458

433459
This means there may be further speed/size tradeoffs possible for each,
434460
so experiment with fine tuning for your needs.
435461

462+
Blocks are compressed/decompress using 16 core AMD Ryzen 9 3950X 16-Core Processor.
463+
464+
### JSON Stream
465+
466+
<details>
467+
<summary>Click To See Data + Charts</summary>
468+
469+
Input Size: 6,273,951,764 bytes
470+
471+
| Compressor | Speed MiB/s | Size | Reduction | Dec MiB/s |
472+
|-------------|------------:|--------------:|----------:|----------:|
473+
| MinLZ 1 | 14,921 | 974,656,419 | 84.47% | 3,204 |
474+
| MinLZ 2 | 8,877 | 901,171,279 | 85.64% | 3,028 |
475+
| MinLZ 3 | 576 | 742,067,802 | 88.17% | 3,835 |
476+
| S2 Default | 15,501 | 1,041,700,255 | 83.40% | 2,378 |
477+
| S2 Better | 9,334 | 944,872,699 | 84.94% | 2,300 |
478+
| S2 Best | 732 | 826,384,742 | 86.83% | 2,572 |
479+
| LZ4 Fastest | 5,860 | 1,274,297,625 | 79.69% | 2,680 |
480+
| LZ4 Best | 1,772 | 1,091,826,460 | 82.60% | 2,694 |
481+
| Snappy | 951 | 1,525,176,492 | 75.69% | 1,828 |
482+
| Gzip L5 | 236 | 938,015,731 | 85.05% | 557 |
483+
484+
![Compression vs Size](img/json-v1-comp.png)
485+
![Decompression Speed](img/json-v1-decomp.png)
486+
487+
Source file: https://files.klauspost.com/compress/github-june-2days-2019.json.zst
488+
489+
</details>
490+
491+
### CSV Stream
492+
493+
<details>
494+
<summary>Click To See Data + Charts</summary>
495+
496+
Input Size: 3,325,605,752 bytes
497+
498+
| Compressor | Speed MiB/s | Size | Reduction |
499+
|------------|-------------|---------------|-----------|
500+
| MinLZ 1 | 9,193 | 937,136,278 | 72.07% |
501+
| MinLZ 2 | 6,158 | 775,823,904 | 77.13% |
502+
| MinLZ 3 | 338 | 657,162,410 | 80.66% |
503+
| S2 Default | 10,679 | 1,093,516,949 | 67.12% |
504+
| S2 Better | 6,394 | 884,711,436 | 73.40% |
505+
| S2 Best | 400 | 773,678,211 | 76.74% |
506+
| LZ4 Fast | 4,835 | 1,066,961,737 | 67.92% |
507+
| LZ4 Best | 732 | 903,598,068 | 72.83% |
508+
| Snappy | 553 | 1,316,042,016 | 60.43% |
509+
| Gzip L5 | 128 | 767,340,514 | 76.93% |
510+
511+
![Compression vs Size](img/csv-v1-comp.png)
512+
513+
Source file: https://files.klauspost.com/compress/nyc-taxi-data-10M.csv.zst
514+
515+
</details>
516+
517+
### Log data
518+
519+
<details>
520+
<summary>Click To See Data + Charts</summary>
521+
522+
Input Size: 2,622,574,440 bytes
523+
524+
| Compressor | Speed MiB/s | Size | Reduction |
525+
|------------|-------------|-------------|-----------|
526+
| MinLZ 1 | 17,014 | 194,361,157 | 92.59% |
527+
| MinLZ 2 | 12,696 | 174,819,425 | 93.33% |
528+
| MinLZ 3 | 1,351 | 139,449,942 | 94.68% |
529+
| S2 Default | 17,131 | 230,521,260 | 91.21% |
530+
| S2 Better | 12,632 | 217,884,566 | 91.69% |
531+
| S2 Best | 1,687 | 185,357,903 | 92.93% |
532+
| LZ4 Fast | 6,115 | 216,323,995 | 91.75% |
533+
| LZ4 Best | 2,704 | 169,447,971 | 93.54% |
534+
| Snappy | 1,987 | 290,116,961 | 88.94% |
535+
| Gzip L5 | 498 | 142,119,985 | 94.58% |
536+
537+
![Compression vs Size](img/logs-v1-comp.png)
538+
539+
Source file: https://files.klauspost.com/compress/apache.log.zst
540+
541+
</details>
542+
543+
### Serialized Data
544+
545+
<details>
546+
<summary>Click To See Data + Charts</summary>
547+
548+
Input Size: 1,862,623,243 bytes
549+
550+
| Compressor | Speed MiB/s | Size | Reduction |
551+
|------------|-------------|-------------|-----------|
552+
| MinLZ 1 | 10,701 | 604,315,773 | 67.56% |
553+
| MinLZ 2 | 5,712 | 517,472,464 | 72.22% |
554+
| MinLZ 3 | 250 | 480,707,192 | 74.19% |
555+
| S2 Default | 12,167 | 623,832,101 | 66.51% |
556+
| S2 Better | 5,712 | 568,441,654 | 69.48% |
557+
| S2 Best | 324 | 553,965,705 | 70.26% |
558+
| LZ4 Fast | 5,090 | 618,174,538 | 66.81% |
559+
| LZ4 Best | 617 | 552,015,243 | 70.36% |
560+
| Snappy | 929 | 589,837,541 | 68.33% |
561+
| Gzip L5 | 166 | 434,950,800 | 76.65% |
562+
563+
![Compression vs Size](img/msgp-v1-comp.png)
564+
565+
Source file: https://files.klauspost.com/compress/github-ranks-backup.bin.zst
566+
567+
</details>
568+
569+
### Backup (Mixed) Data
570+
571+
<details>
572+
<summary>Click To See Data + Charts</summary>
573+
574+
Input Size: 10,065,157,632 bytes
575+
576+
| Compressor | Speed MiB/s | Size | Reduction |
577+
|-------------|-------------|---------------|-----------|
578+
| MinLZ 1 | 9,356 | 5,859,748,636 | 41.78% |
579+
| MinLZ 2 | 5,321 | 5,256,474,340 | 47.78% |
580+
| MinLZ 3 | 259 | 4,855,930,368 | 51.76% |
581+
| S2 Default | 10,083 | 5,915,541,066 | 41.23% |
582+
| S2 Better | 5,731 | 5,455,008,813 | 45.80% |
583+
| S2 Best | 319 | 5,192,490,222 | 48.41% |
584+
| LZ4 Fastest | 5,065 | 5,850,848,099 | 41.87% |
585+
| LZ4 Best | 287 | 5,348,127,708 | 46.86% |
586+
| Snappy | 732 | 6,056,946,612 | 39.82% |
587+
| Gzip L5 | 171 | 4,916,436,115 | 51.15% |
588+
589+
![Compression vs Size](img/10gb-v1-comp.png)
590+
591+
Source file: https://mattmahoney.net/dc/10gb.html
592+
593+
</details>
594+
595+
Our conclusion is that the new compression algorithm provides a good compression increase,
596+
while retaining the ability to saturate pretty much any IO either with compression or
597+
decompression given a moderate amount of CPU cores.
598+
436599

437600
## Why is concurrent block and stream speed so different?
438601

@@ -484,6 +647,9 @@ Speed indications are base 10.
484647

485648
### Compressing
486649

650+
<details>
651+
<summary>Click To Compression Help</summary>
652+
487653
```
488654
Usage: mz c [options] <input>
489655
@@ -534,9 +700,13 @@ Example:
534700
λ mz c apache.log
535701
Compressing apache.log -> apache.log.mz 2622574440 -> 170960982 [6.52%]; 4155.2MB/s
536702
```
703+
</details>
537704

538705
## Decompressing
539706

707+
<details>
708+
<summary>Click To Decompression Help</summary>
709+
540710
```
541711
Usage: mz d [options] <input>
542712
@@ -583,6 +753,7 @@ Example:
583753
λ mz d apache.log.mz
584754
Decompressing apache.log.mz -> apache.log 170960982 -> 2622574440 [1534.02%]; 2660.2MB/s
585755
```
756+
</details>
586757

587758
Tail, Offset and Limit can be made to forward to the next newline by adding `+nl`.
588759
