[POC] Compression for sync streams #329
Draft
Background on compression options we're investigating in PowerSync: #330
This enables gzip and zstd as transport compression options for sync streams. Since data sync is often performed on metered mobile connections or slow WiFi connections, this could reduce data costs and increase performance for those cases.
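As a rough illustration only (not the code in this PR), server-side selection between the two encodings could look like the sketch below. It assumes the server prefers zstd and falls back to gzip, and otherwise sends the stream uncompressed. `pickEncoding` and the `Encoding` type are hypothetical names introduced for this example.

```typescript
// Hypothetical helper for illustration; not the PR's actual implementation.
type Encoding = 'zstd' | 'gzip' | 'identity';

// Assumed server preference order: zstd first, gzip as the fallback.
const PREFERRED: Encoding[] = ['zstd', 'gzip'];

function pickEncoding(acceptEncoding: string | undefined): Encoding {
  // No header: send the stream uncompressed.
  if (!acceptEncoding) return 'identity';

  // Parse "gzip, br;q=0.8, zstd" into a map of coding name -> q-value.
  const accepted = new Map<string, number>();
  for (const part of acceptEncoding.split(',')) {
    const [name, ...params] = part.trim().toLowerCase().split(';');
    let q = 1; // q defaults to 1 when omitted
    for (const p of params) {
      const [k, v] = p.trim().split('=');
      if (k === 'q') q = Number(v);
    }
    if (name) accepted.set(name.trim(), Number.isNaN(q) ? 0 : q);
  }

  // Pick the first preferred coding the client accepts with q > 0.
  // "*" covers codings that are not listed explicitly.
  for (const enc of PREFERRED) {
    const q = accepted.get(enc) ?? accepted.get('*') ?? 0;
    if (q > 0) return enc;
  }
  return 'identity';
}
```

With Chrome's header, `pickEncoding('gzip, deflate, br, zstd')` selects `zstd`; a client that only sends `gzip, deflate, br` gets `gzip`.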
Compression is negotiated using the standard `Accept-Encoding` request header and `Content-Encoding` response header. As an example, Chrome would send `accept-encoding: gzip, deflate, br, zstd`, and we'd respond with `content-encoding: zstd`.
`zstd` is the best option, and is supported by default in Chrome 123+ and Firefox 126+. `gzip` is provided as a fallback option, and is supported in Safari and many native clients.
Since browsers send the `accept-encoding` header by default, this will have an immediate effect for the Web SDK. For other SDKs we'd have to test individually, and see whether we need more client-side changes to enable compression.
Performance
These are not scientific benchmarks, but they give an initial idea of expected performance. Since decompression is performed by platform-provided native code in each SDK, this adds minimal overhead and requires minimal client-side changes (we're not manually decompressing using JavaScript, for example).
Synthetic test dataset: 1.6 million rows of data, 384MiB uncompressed download size, 117MiB data size (the rest is metadata).
zstd compression gives 72.4MiB download size (19% of the original size); gzip gives 76.2MiB (20% of the original size).
In all cases (uncompressed, zstd, gzip), it took around 13s to download the data with curl, and 52s with the diagnostics app. So on a desktop machine, the primary difference is in bandwidth usage, rather than performance.
When downloading the data with curl, the nodejs process averaged around 90% CPU usage for uncompressed data, and 120% when using zstd or gzip. Note that it uses more than 1 core since compression uses a separate thread.
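For reference, the size percentages above follow directly from the reported download sizes. The snippet below only redoes that arithmetic; the constant and function names are made up for this example.

```typescript
// Figures taken from the test run described above (all in MiB).
const UNCOMPRESSED_MIB = 384;
const COMPRESSED_MIB = { zstd: 72.4, gzip: 76.2 };

// Percentage of `whole` that `part` represents, rounded to one decimal place.
function percentOf(part: number, whole: number): number {
  return Math.round((part / whole) * 1000) / 10;
}

const zstdPct = percentOf(COMPRESSED_MIB.zstd, UNCOMPRESSED_MIB); // 18.9
const gzipPct = percentOf(COMPRESSED_MIB.gzip, UNCOMPRESSED_MIB); // 19.8

console.log(`zstd: ${zstdPct}% of the uncompressed download`);
console.log(`gzip: ${gzipPct}% of the uncompressed download`);
```

Rounded to whole numbers these are the 19% and 20% quoted above, so on this dataset either encoding removes roughly four fifths of the bytes on the wire.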
Native clients
OkHttp supports gzip by default - likely affects both React-Native and Kotlin.
NodeJS SDK:
TODO
`permessage-deflate` for websockets