Status
This currently describes the high-level options for compression. Detailed investigation into data-level compression is still required.
Background
PowerSync currently syncs uncompressed data. By using compression such as gzip or Zstandard, we can reduce client bandwidth usage by 60-90% in some cases, speeding up initial or bulk sync.
Zstandard is a leading compression format with good compression ratios and performance, and it can use pre-trained dictionaries. Gzip does not offer quite the same performance, but is more widely supported.
We have two main options for compression:
Transport-level compression.
Data-level compression.
Transport-level compression
Transport-level compression is architecturally simple: if the client supports it, the service transparently compresses the stream, and the client decompresses it again.
For http streams, we can use the standard Accept-Encoding request header and Content-Encoding response header. This applies to both json- and bson-based streams. POC here: #329
Many clients, including browsers, would use this out-of-the-box with no client-side changes required. For example, Chrome and Firefox already support this with both zstd and gzip, and Safari supports gzip.
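As an illustrative sketch of the negotiation (assuming a Node-based service; getSyncStream and the payload shape below are hypothetical stand-ins, not the actual service code):

```ts
import * as http from 'node:http';
import * as zlib from 'node:zlib';
import { Readable } from 'node:stream';

// Hypothetical stand-in for the real source of newline-delimited sync lines.
function getSyncStream(req: http.IncomingMessage): Readable {
  return Readable.from(['{"checkpoint":{}}\n', '{"data":{}}\n']);
}

http.createServer((req, res) => {
  const accepted = req.headers['accept-encoding'] ?? '';
  const syncStream = getSyncStream(req);

  if (typeof accepted === 'string' && accepted.includes('gzip')) {
    res.writeHead(200, {
      'Content-Type': 'application/x-ndjson',
      'Content-Encoding': 'gzip',
    });
    // Z_SYNC_FLUSH pushes each sync line to the client immediately
    // instead of letting the compressor buffer it.
    const gzip = zlib.createGzip({ flush: zlib.constants.Z_SYNC_FLUSH });
    syncStream.pipe(gzip).pipe(res);
  } else {
    res.writeHead(200, { 'Content-Type': 'application/x-ndjson' });
    syncStream.pipe(res);
  }
}).listen(8080);
```

The same structure would apply to zstd; the only change is the negotiated encoding name and the compressor used.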
For websocket connections, the only built-in browser support is permessage-deflate.
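For example, with a server built on the ws package (an assumption; the threshold value is an arbitrary choice), the extension can be enabled server-side and browsers negotiate it automatically:

```ts
import { WebSocketServer } from 'ws';

// Sketch: enable permessage-deflate on the server. No client-side
// changes are needed; supporting browsers negotiate the extension.
const wss = new WebSocketServer({
  port: 8080,
  perMessageDeflate: {
    threshold: 1024, // skip compression for frames under 1 KiB
  },
});

wss.on('connection', (socket) => {
  // Frames above the threshold are deflated transparently.
  // Placeholder payload, not the actual sync protocol message.
  socket.send(JSON.stringify({ checkpoint: {} }));
});
```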
Data-level compression
On the service, we can generate one or more zstd dictionaries for each bucket. We compress the data of each operation (currently a json blob) with this dictionary, and sync the compressed data to the client.
The client would download the relevant dictionaries, and store the compressed data in the local ps_oplog table. In the apply_local step, the client decompresses the data before copying to the local tables.
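A minimal sketch of this flow, assuming a zstd binding with dictionary support (the zstd object and its signatures below are hypothetical, not an existing Node API):

```ts
// Hypothetical zstd binding with dictionary support: Node has no built-in
// zstd dictionary API, so this would come from a native or WASM build of
// libzstd. Names and signatures here are illustrative only.
declare const zstd: {
  trainDictionary(samples: Buffer[], maxSizeBytes: number): Buffer;
  compress(data: Buffer, dictionary: Buffer): Buffer;
  decompress(data: Buffer, dictionary: Buffer): Buffer;
};

// Service side: train one dictionary per bucket from a sample of its
// operation data, then compress each operation's json blob with it.
function buildBucketDictionary(sampleOps: Buffer[]): Buffer {
  return zstd.trainDictionary(sampleOps, 64 * 1024);
}

function compressOp(jsonBlob: string, dictionary: Buffer): Buffer {
  return zstd.compress(Buffer.from(jsonBlob, 'utf8'), dictionary);
}

// Client side: ps_oplog stores the compressed blob; apply_local
// decompresses just before copying rows into the local tables.
function decompressOp(compressed: Buffer, dictionary: Buffer): string {
  return zstd.decompress(compressed, dictionary).toString('utf8');
}
```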
Combining both
Compressing at both the transport level and the data level adds processing overhead without additional gains, since the same data is compressed twice. However, we could:
Use transport-level compression for the stream itself.
Move data out to separate files (proposal pending), and use data compression in those files.
For the data files, we could store metadata (table names, row ids, operation types, etc) and data (json blobs) separately, allowing us to independently compress the metadata and data:
metadata
blob_1
blob_2
...
blob_n
Another advantage of this type of format is that a client could potentially skip many blobs if it doesn't need the entire range. An example of such a file format is Parquet, although we'll likely need something simpler and more purpose-built.
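Sketched as TypeScript types (field names are illustrative, not a settled format), the metadata section could carry an offset index so individual blobs can be located and decoded without touching the rest:

```ts
// Illustrative layout for the proposed data file; not a settled format.
// The metadata section compresses well as one unit, while each blob is
// dictionary-compressed on its own so it can be decoded independently.
interface OpMetadata {
  table: string;          // table name
  rowId: string;          // row id
  op: 'PUT' | 'REMOVE';   // operation type
  blobOffset: number;     // byte offset into the blob section
  blobLength: number;     // compressed length of the blob in bytes
}

interface DataFile {
  metadata: OpMetadata[]; // compressed as a single block
  blobs: Uint8Array;      // concatenated, individually compressed blobs
}
```

Recording offsets up front is what makes the skipping described above cheap: a client decodes the small metadata block, then reads only the blobs it needs.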