sstable: fix estimation of empty sstable writer#5633
annrpom wants to merge 1 commit into cockroachdb:master
This patch adds an estimation of the properties and filter blocks during the calculation of the estimated size of an sstable being written.
@jbowens how do you feel about this one?
sumeerbhola
left a comment
@sumeerbhola reviewed 11 of 11 files at r1.
Reviewable status: all files reviewed (commit messages unreviewed), 2 unresolved discussions (waiting on @annrpom)
bloom/bloom.go line 167 at r1 (raw file):
// FilterSize returns the size in bytes of a bloom filter with the given number
// of keys and bits per key. This can be used to estimate filter size before
// building it.
nit: the code comment only mentions one return value.
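To illustrate the nit, here is a minimal sketch of a size estimator whose doc comment describes both return values. The function `estimateFilterSize` is hypothetical and assumes a plain bit-array bloom filter; it is not Pebble's `FilterSize` and ignores any layout-specific padding or trailer bytes.

```go
package main

import "fmt"

// estimateFilterSize is a hypothetical helper, not Pebble's FilterSize.
// It returns two values for a plain bit-array bloom filter holding numKeys
// keys at bitsPerKey bits each: the estimated size in bytes (rounded up to
// a whole byte) and the total number of bits.
func estimateFilterSize(numKeys, bitsPerKey uint64) (sizeBytes, numBits uint64) {
	numBits = numKeys * bitsPerKey
	sizeBytes = (numBits + 7) / 8 // round up to whole bytes
	return sizeBytes, numBits
}

func main() {
	size, bits := estimateFilterSize(1000, 10)
	fmt.Printf("bits=%d bytes=%d\n", bits, size)
}
```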
sstable/properties.go line 209 at r1 (raw file):
// overhead (e.g., varint lengths, column headers). The key and value sizes
// are the dominant factors, so this is typically close to the actual size.
func (p *Properties) EstimatedSize(tblFormat TableFormat) uint64 {
Isn't this estimation being called after adding each key-value pair during a compaction/flush? If yes, it needs to be very cheap, and accumulateProps doesn't seem cheap. One possibility would be to do this every 1000 keys or so, but we would need to measure the performance impact, in which case it is best to leave it out of this PR and do it separately.
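The "every 1000 keys or so" idea could look roughly like the sketch below. Everything here is hypothetical (the `cachedEstimator` type, its fields, and the stand-in `expensive` function are not Pebble code); it only shows how a costly estimate can be refreshed periodically while the per-key path stays cheap.

```go
package main

import "fmt"

// cachedEstimator is a hypothetical wrapper that recomputes an expensive
// size estimate only once every interval keys and serves the cached value
// in between.
type cachedEstimator struct {
	interval  uint64        // refresh the estimate every interval keys
	keys      uint64        // number of keys added so far
	cached    uint64        // most recently computed estimate
	refreshes int           // how many times expensive was invoked
	expensive func() uint64 // stand-in for a costly properties-size estimate
}

// addKey records one key and refreshes the cached estimate on the interval.
func (e *cachedEstimator) addKey() {
	e.keys++
	if e.keys%e.interval == 0 {
		e.cached = e.expensive()
		e.refreshes++
	}
}

// estimate is cheap: it returns the cached value without recomputing.
func (e *cachedEstimator) estimate() uint64 { return e.cached }

func main() {
	e := &cachedEstimator{interval: 1000}
	// Made-up cost model purely for illustration.
	e.expensive = func() uint64 { return 100 + e.keys/10 }
	for i := 0; i < 2500; i++ {
		e.addKey()
	}
	fmt.Println(e.estimate(), e.refreshes)
}
```

The trade-off is staleness: between refreshes the estimate lags by up to `interval` keys, which is why the performance and accuracy impact would need to be measured.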
I don't remember -- are the properties up to date while writing the sstable, or are some of them made up-to-date when finishing the sstable? If they are up-to-date, is the increase in size mainly because of the increasing size of the varints?
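On the varint question: a value's encoded length grows in steps as the value crosses powers of 128, which the following sketch demonstrates using the standard library's `encoding/binary`. The helper `uvarintLen` is a name made up for illustration, and whether the properties in question are actually stored as varints in a given table format is an assumption taken from the comment above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// uvarintLen returns the number of bytes needed to encode v as an
// unsigned varint, by encoding it into a scratch buffer.
func uvarintLen(v uint64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutUvarint(buf[:], v)
}

func main() {
	// Each boundary below is where the encoding gains one more byte.
	for _, v := range []uint64{127, 128, 16383, 16384} {
		fmt.Printf("%d -> %d byte(s)\n", v, uvarintLen(v))
	}
}
```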
Can you provide some background? I don't immediately see the value in predicting the filter and prop blocks, which should be small compared to the rest of the file. I think it's fine if the target file size doesn't cover them. BTW, I am working on an adaptive bloom filter, which might make it harder to estimate the final size.
Yeah, that's reasonable; I also didn't realize (at the time of putting up this PR) that we would call the size estimate for every key during a compaction. Instead of what we currently do for [...] I'll leave this PR here for now, though (I'd also be down to close it), since this change isn't imminently necessary; I just wanted to try taking a stab at a TODO I saw in passing.