Allow compressed batch sizes up to 32767 #9230
Conversation
It should be allowed everywhere in the code already. We have a GUC that can reduce it below the current default of 1000, and this PR also allows the GUC to go up. The GUC is timescaledb.compression_batch_size_limit.
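For anyone who wants to experiment, a minimal sketch of how this could be used (assuming a hypothetical hypertable `metrics` with compression enabled, and that the GUC can be set at the session level):

```sql
-- Raise the per-batch row limit above the default of 1000;
-- 32767 is the new maximum this PR allows.
SET timescaledb.compression_batch_size_limit = 32767;

-- Compress the chunks of a hypothetical hypertable so its batches
-- are built with the larger limit.
SELECT compress_chunk(c) FROM show_chunks('metrics') AS c;
```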
|
@melihmutlu, @dbeck: please review this pull request.
|
Did you test compressing something with 32k and then querying it with 1k?
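For context, the round trip being asked about could be sketched roughly like this (hypothetical hypertable `metrics` with a hypothetical `value` column, and assuming the GUC only affects how batches are built at compression time):

```sql
-- Compress with the new maximum batch size...
SET timescaledb.compression_batch_size_limit = 32767;
SELECT compress_chunk(c) FROM show_chunks('metrics') AS c;

-- ...then drop the limit back to the old default and read the data,
-- so the query path has to handle batches larger than the current setting.
SET timescaledb.compression_batch_size_limit = 1000;
SELECT count(*), max(value) FROM metrics;
```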
|
I think we should do comprehensive testing across all algorithms and a good number of data types. The algorithm tests should also include fallbacks, like dictionary to array and uuid to dictionary.
|
Benchmarked with a default batch size of 32767, just for fun: https://grafana.dev-us-east-1.ops.dev.timescale.com/d/fasYic_4z/compare-benchmark-runs?orgId=1&var-run1=5309&var-run2=5310&var-postgres=16&var-branch=All&var-threshold=0.02&var-use_historical_thresholds=true&var-threshold_expression=2.0%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=true
Lots of regressions, mostly due to DML slowdowns and worse selectivity of the compressed metadata filters. Some queries with aggregation are up to 40% faster. Interestingly, there are also some big improvements in some last-point queries.
|
@dbeck @svenklemm I added more tests, and had to fix a couple of places where we actually didn't support the bigger batches.
I'm not going to advertise this in the changelog for now, until we get a better understanding of what the implications are. But this GUC will be useful to experiment with various batch sizes.
Disable-check: force-changelog-file