
sstable: push filter check inside cache#5740

Draft
RaduBerinde wants to merge 1 commit into cockroachdb:master from RaduBerinde:may-contain-inside-lock

Conversation

Member

@RaduBerinde RaduBerinde commented Jan 21, 2026

Previously, every bloom filter check in SeekPrefixGE required:

  1. Getting the filter block from cache (atomic refcount increment)
  2. Calling MayContain on the filter data (a very fast operation)
  3. Releasing the block (atomic refcount decrement)
  4. Updating hit counter (atomic increment)

For a simple bloom filter check (a few hash operations), the atomic
refcount operations represent significant overhead (especially when
there are a small number of very hot filters in the upper LSM levels).

Add a new cache.Handle.TableFilterMayContain method that performs
the filter check while holding the cache's read lock, without
incrementing/decrementing the refcount. The value is safe to access
while holding the read lock since eviction requires the write lock.

When the filter block is not in cache, the code falls back to the
existing readFilterBlock path which handles reading from disk and
populating the cache.

We no longer record these accesses as cache hits for statistics
purposes. We still record misses, and since we now separate statistics
by block type (and level), we'll still know what percentage of overall
misses are due to filter blocks.
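The mechanism described above can be sketched with a toy cache shard. All names below (`shard`, `filterPolicy`, the map-based cache) are illustrative stand-ins, not pebble's actual types; the point is the locking pattern: the filter probe runs under the shard's read lock, so no refcount traffic is needed, and eviction cannot free the block because it requires the write lock.

```go
package main

import (
	"fmt"
	"sync"
)

// filterPolicy stands in for the bloom filter decoder; the real
// MayContain hashes the key against the filter bits. Stubbed here.
type filterPolicy struct{}

func (filterPolicy) MayContain(filter []byte, key []byte) bool {
	return len(filter) > 0 // stub: real code probes bloom filter bits
}

// shard is a minimal stand-in for a cache shard: an RWMutex guarding a
// map of cached blocks. Eviction takes the write lock, so a value
// reached under the read lock cannot be freed concurrently.
type shard struct {
	mu     sync.RWMutex
	blocks map[uint64][]byte
}

// TableFilterMayContain checks the bloom filter under the shard's read
// lock, avoiding the refcount increment/decrement that a Get/Release
// pair would perform. It returns found=false when the block is not
// cached, in which case the caller falls back to the slow path that
// reads the block from disk and populates the cache.
func (s *shard) TableFilterMayContain(offset uint64, fp filterPolicy, key []byte) (found, mayContain bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	buf, ok := s.blocks[offset]
	if !ok {
		return false, false
	}
	return true, fp.MayContain(buf, key)
}

func main() {
	s := &shard{blocks: map[uint64][]byte{42: {0xff}}}
	found, may := s.TableFilterMayContain(42, filterPolicy{}, []byte("k"))
	fmt.Println(found, may) // cached: both true
	found, _ = s.TableFilterMayContain(7, filterPolicy{}, []byte("k"))
	fmt.Println(found) // not cached: false, caller takes slow path
}
```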

@RaduBerinde RaduBerinde requested a review from a team as a code owner January 21, 2026 17:23
Member

cockroach-teamcity commented Jan 21, 2026

This change is Reviewable

@RaduBerinde RaduBerinde force-pushed the may-contain-inside-lock branch from a5f2620 to 9a25ee7 on January 21, 2026 17:26
Collaborator

@petermattis petermattis left a comment


@petermattis made 1 comment.
Reviewable status: 0 of 5 files reviewed, 1 unresolved discussion (waiting on @RaduBerinde and @sumeerbhola).


internal/cache/clockpro.go line 209 at r1 (raw file):

				e.referenced.Store(true)
			}
			return true, filter.MayContain(v.buf[dataOffset:], filterKey)

It's unfortunate that we're pushing knowledge of this MayContain interface into the cache. Is there a structure where we pass in a closure instead? Something like "fn func(data []byte)". The closure would have to capture the TableFilterDecoder and filterKey variables, which would be problematic allocations. Perhaps there could be a small struct that contains the state needed by the closure, and we could use a sync.Pool for that struct.
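The pooled-state idea above might look roughly like the following loose sketch. All names (`filterProbe`, `withCachedBlock`, `mayContain`) are hypothetical; the sketch only shows the shape of pooling the callback's state so the per-call allocations are limited, not pebble's actual API. Note the closure still captures one pooled pointer, which is part of why this direction was debated.

```go
package main

import (
	"fmt"
	"sync"
)

// filterProbe bundles the state the cache callback needs (the key, and
// a slot for the result), so the probe state itself is pooled rather
// than freshly allocated on every filter check.
type filterProbe struct {
	key    []byte
	result bool // only meaningful when the block was found in cache
}

var probePool = sync.Pool{
	New: func() any { return &filterProbe{} },
}

// withCachedBlock stands in for a cache API that runs fn on the cached
// bytes; in the real cache this would happen under the shard's read
// lock. It returns false on a cache miss without invoking fn.
func withCachedBlock(block []byte, fn func(data []byte)) bool {
	if block == nil {
		return false
	}
	fn(block)
	return true
}

func mayContain(block, key []byte) (found, ok bool) {
	p := probePool.Get().(*filterProbe)
	defer probePool.Put(p)
	p.key = key
	found = withCachedBlock(block, func(data []byte) {
		p.result = len(data) > 0 // stub for the bloom probe on p.key
	})
	return found, p.result // result is ignored by callers when !found
}

func main() {
	fmt.Println(mayContain([]byte{0x01}, []byte("k")))
}
```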

Member Author

internal/cache/clockpro.go line 209 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

It's unfortunate that we're pushing knowledge of this MayContain interface into the cache. Is there a structure where we pass in a closure instead? Something like "fn func(data []byte)". The closure would have to capture the TableFilterDecoder and filterKey variables, which would be problematic allocations. Perhaps there could be a small struct that contains the state needed by the closure, and we could use a sync.Pool for that struct.

I wouldn't want a pool operation for this. I'll try to think of something (perhaps cache the callback in the Reader). But I don't want to encourage arbitrary function execution inside the lock either, so I'm not sure it's much better in terms of API. Another option would be an Acquire/Release type API, though that's prone to be abused too.

Member Author

@RaduBerinde RaduBerinde left a comment


@RaduBerinde made 1 comment.
Reviewable status: 0 of 5 files reviewed, 1 unresolved discussion (waiting on @petermattis and @sumeerbhola).


internal/cache/clockpro.go line 209 at r1 (raw file):

Previously, RaduBerinde wrote…

I wouldn't want a pool operation for this. I'll try to think of something (perhaps cache the callback in the Reader). But I don't want to encourage arbitrary function execution inside the lock either, so I'm not sure it's much better in terms of API. Another option would be an Acquire/Release type API, though that's prone to be abused too.

I will move this to draft and come back to it later. I think it can be done nicely with something along the lines of

func QuickAccess[Arg any, Res any](c *Cache, fileNum base.DiskFileNum, offset uint64, fn func(Arg, []byte) Res) (_ Res, found bool)

We can pass whatever args we need to the function via Arg without having to capture them.
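A minimal sketch of that shape, with a toy one-shard cache (the map keyed by offset only, dropping the fileNum dimension for brevity; all names are illustrative, not pebble's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a toy stand-in for a block cache shard: an RWMutex over a
// map of cached blocks, keyed here by offset alone.
type cache struct {
	mu     sync.RWMutex
	blocks map[uint64][]byte
}

// QuickAccess runs fn on the cached block while holding the read lock,
// passing arg through explicitly so fn can be a plain top-level
// function with no captured variables (and hence no closure
// allocation). On a cache miss it returns found=false and the caller
// takes the slow path.
func QuickAccess[Arg any, Res any](c *cache, offset uint64, arg Arg, fn func(Arg, []byte) Res) (res Res, found bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	buf, ok := c.blocks[offset]
	if !ok {
		return res, false
	}
	return fn(arg, buf), true
}

// mayContain is a stub for the bloom filter probe; it takes the key as
// the explicit Arg rather than capturing it.
func mayContain(key []byte, data []byte) bool {
	return len(data) > 0 // stub: real code hashes key into filter bits
}

func main() {
	c := &cache{blocks: map[uint64][]byte{9: {0xaa}}}
	res, found := QuickAccess(c, 9, []byte("k"), mayContain)
	fmt.Println(res, found) // cached: probe runs under the read lock
}
```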

I found some other things along the way that I will work on at the same time:

  • filter metrics add more per-access atomic ops and are not particularly useful as they are (I'd rather have sharded counters separated by level)
  • TableFilterDecoder could be a struct that has a mayContain func field, since the decoder is a stateless singleton
  • we could initialize some state in the block metadata once instead of "decoding" stuff from the filter each time (though it's minor)
  • we could use xsync.RBMutex instead of RWMutex (which shards the reader counter)

This is from a benchmark I just wrote: it's a bit pessimistic as 8 workers all access the same filter, but it's pretty crazy how small the probe part actually is:
[benchmark CPU profile image]

@RaduBerinde RaduBerinde marked this pull request as draft January 22, 2026 02:52
Collaborator

@petermattis petermattis left a comment


@petermattis made 1 comment.
Reviewable status: 0 of 5 files reviewed, 1 unresolved discussion (waiting on @RaduBerinde and @sumeerbhola).


internal/cache/clockpro.go line 209 at r1 (raw file):

Previously, RaduBerinde wrote…

I will move this to draft and come back to it later. I think it can be done nicely with something along the lines of

func QuickAccess[Arg any, Res any](c *Cache, fileNum base.DiskFileNum, offset uint64, fn func(Arg, []byte) Res) (_ Res, found bool)

We can pass whatever args we need to the function via Arg without having to capture them.

I found some other things along the way that I will work on at the same time:

  • filter metrics add more per-access atomic ops and are not particularly useful as they are (I'd rather have sharded counters separated by level)
  • TableFilterDecoder could be a struct that has a mayContain func field, since the decoder is a stateless singleton
  • we could initialize some state in the block metadata once instead of "decoding" stuff from the filter each time (though it's minor)
  • we could use xsync.RBMutex instead of RWMutex (which shards the reader counter)

This is from a benchmark I just wrote: it's a bit pessimistic as 8 workers all access the same filter, but it's pretty crazy how small the probe part actually is:
[benchmark CPU profile image]

Ack that using a sync.Pool may be problematic (too slow) for this usage. QuickAccess looks good.

Last year you speculated that xsync.RBMutex might be useful for contended blocks like the filter block or index block: #5029.

Member Author

internal/cache/clockpro.go line 209 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Ack that using a sync.Pool may be problematic (too slow) for this usage. QuickAccess looks good.

Last year you speculated that xsync.RBMutex might be useful for contended blocks like the filter block or index block: #5029.

Yeah, I did run some experiments with it a while back and I didn't notice an obvious gain (I think it was TPCC). It might bring in a more noticeable benefit once we remove the other overhead.

Collaborator

@sumeerbhola sumeerbhola left a comment


@sumeerbhola reviewed 5 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @petermattis).


internal/cache/clockpro.go line 209 at r1 (raw file):

Previously, RaduBerinde wrote…

Yeah, I did run some experiments with it a while back and I didn't notice an obvious gain (I think it was TPCC). It might bring in a more noticeable benefit once we remove the other overhead.

This profile was very revealing in terms of the cost of all the "not real work".
The generic function approach sounds good.
