Status
This currently describes the high-level options for compression. Detailed investigation into data-level compression is still required.
Background
PowerSync currently syncs uncompressed data. By using compression such as gzip or Zstandard, we can reduce client bandwidth usage by 60-90% in some cases, speeding up initial or bulk sync.
Zstandard is a leading compression format with good compression ratios and performance, and it can use pre-trained dictionaries. Gzip does not offer quite the same performance, but is more widely supported.
We have two main options for compression:
Transport-level compression.
Data-level compression.
Transport-level compression
Transport-level compression is architecturally simple: if the client supports it, the service transparently compresses the stream, and the client decompresses it again.
For http streams, we can use the standard Accept-Encoding request header and Content-Encoding response header. This applies to both json- and bson-based streams. POC here: #329
Many clients, including browsers, would use this out-of-the-box with no client-side changes required. For example, Chrome and Firefox already support this with both zstd and gzip, and Safari supports gzip.
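As an illustrative sketch of the negotiation (assuming a Node-based service; getSyncStream and the payload shape below are hypothetical stand-ins, not the actual service code):

```ts
import * as http from 'node:http';
import * as zlib from 'node:zlib';
import { Readable } from 'node:stream';

// Hypothetical stand-in for the real source of newline-delimited sync lines.
function getSyncStream(req: http.IncomingMessage): Readable {
  return Readable.from(['{"checkpoint":{}}\n', '{"data":{}}\n']);
}

http.createServer((req, res) => {
  const accepted = req.headers['accept-encoding'] ?? '';
  const syncStream = getSyncStream(req);

  if (typeof accepted === 'string' && accepted.includes('gzip')) {
    res.writeHead(200, {
      'Content-Type': 'application/x-ndjson',
      'Content-Encoding': 'gzip',
    });
    // Z_SYNC_FLUSH pushes each sync line to the client immediately
    // instead of letting the compressor buffer it.
    const gzip = zlib.createGzip({ flush: zlib.constants.Z_SYNC_FLUSH });
    syncStream.pipe(gzip).pipe(res);
  } else {
    res.writeHead(200, { 'Content-Type': 'application/x-ndjson' });
    syncStream.pipe(res);
  }
}).listen(8080);
```

The same structure would apply to zstd; the only change is the negotiated encoding name and the compressor used.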
For websocket connections, the only built-in browser support is permessage-deflate.
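For example, with a server built on the ws package (an assumption; the threshold value is an arbitrary choice), the extension can be enabled server-side and browsers negotiate it automatically:

```ts
import { WebSocketServer } from 'ws';

// Sketch: enable permessage-deflate on the server. No client-side
// changes are needed; supporting browsers negotiate the extension.
const wss = new WebSocketServer({
  port: 8080,
  perMessageDeflate: {
    threshold: 1024, // skip compression for frames under 1 KiB
  },
});

wss.on('connection', (socket) => {
  // Frames above the threshold are deflated transparently.
  // Placeholder payload, not the actual sync protocol message.
  socket.send(JSON.stringify({ checkpoint: {} }));
});
```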
Data-level compression
On the service, we can generate one or more zstd dictionaries for each bucket. We compress the data of each operation (currently a json blob) with this dictionary, and sync the compressed data to the client.
The client would download the relevant dictionaries, and store the compressed data in the local ps_oplog table. In the apply_local step, the client decompresses the data before copying to the local tables.
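A minimal sketch of this flow, assuming a zstd binding with dictionary support (the zstd object and its signatures below are hypothetical, not an existing Node API):

```ts
// Hypothetical zstd binding with dictionary support: Node has no built-in
// zstd dictionary API, so this would come from a native or WASM build of
// libzstd. Names and signatures here are illustrative only.
declare const zstd: {
  trainDictionary(samples: Buffer[], maxSizeBytes: number): Buffer;
  compress(data: Buffer, dictionary: Buffer): Buffer;
  decompress(data: Buffer, dictionary: Buffer): Buffer;
};

// Service side: train one dictionary per bucket from a sample of its
// operation data, then compress each operation's json blob with it.
function buildBucketDictionary(sampleOps: Buffer[]): Buffer {
  return zstd.trainDictionary(sampleOps, 64 * 1024);
}

function compressOp(jsonBlob: string, dictionary: Buffer): Buffer {
  return zstd.compress(Buffer.from(jsonBlob, 'utf8'), dictionary);
}

// Client side: ps_oplog stores the compressed blob; apply_local
// decompresses just before copying rows into the local tables.
function decompressOp(compressed: Buffer, dictionary: Buffer): string {
  return zstd.decompress(compressed, dictionary).toString('utf8');
}
```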
Combining both
Compressing at both the transport level and the data level adds processing overhead without additional gains, since the same data is compressed twice. However, we could:
Use transport-level compression for the stream itself.
Move data out to separate files (proposal pending), and use data compression in those files.
For the data files, we could store metadata (table names, row ids, operation types, etc) and data (json blobs) separately, allowing us to independently compress the metadata and data:
metadata
blob_1
blob_2
...
blob_n
Another advantage of this type of format is that a client could potentially skip many blobs if it doesn't need the entire range. An example of such a file format is Parquet, although we'll likely need something simpler and more purpose-built.
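Sketched as TypeScript types (field names are illustrative, not a settled format), the metadata section could carry an offset index so individual blobs can be located and decoded without touching the rest:

```ts
// Illustrative layout for the proposed data file; not a settled format.
// The metadata section compresses well as one unit, while each blob is
// dictionary-compressed on its own so it can be decoded independently.
interface OpMetadata {
  table: string;          // table name
  rowId: string;          // row id
  op: 'PUT' | 'REMOVE';   // operation type
  blobOffset: number;     // byte offset into the blob section
  blobLength: number;     // compressed length of the blob in bytes
}

interface DataFile {
  metadata: OpMetadata[]; // compressed as a single block
  blobs: Uint8Array;      // concatenated, individually compressed blobs
}
```

Recording offsets up front is what makes the skipping described above cheap: a client decodes the small metadata block, then reads only the blobs it needs.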