
RFC: Key-Value Separation for Mini-LSM #169

Closed
ben1009 wants to merge 98 commits into skyzh:main from ben1009:rfc/key-value-separation

ben1009 commented Mar 8, 2026

Summary

This RFC proposes implementing WiscKey-style key-value separation for Mini-LSM, inspired by production systems like BadgerDB, RocksDB's BlobDB, and Titan.

Motivation

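The current Mini-LSM architecture stores keys and values together in SSTable blocks, which leads to:

  • High write amplification: every compaction that rewrites a key also rewrites its value (10x for large values)
  • Inefficient range scans: iterators must read through large values even when only keys are needed
  • Block cache pollution: large values occupy cached blocks that could otherwise hold many keys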

Key Benefits

  • 5-10x reduction in write amplification for large-value workloads
  • Faster range scans (no need to scan through large values)
  • Better cache efficiency (keys separated from values)
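A back-of-the-envelope sketch of where the write-amplification savings come from. The 16-byte pointer size is from this RFC; the key size, value size, and the number of times the LSM tree writes an entry (flush plus compaction rewrites) are hypothetical inputs, and the model ignores the WAL and any bytes that vLog garbage collection later moves.

```rust
/// Returns (inline_bytes, separated_bytes) written for one entry.
/// `writes` is a hypothetical count: the flush to L0 plus each compaction rewrite.
fn lsm_write_bytes(key_len: u64, value_len: u64, ptr_len: u64, writes: u64) -> (u64, u64) {
    // Without separation, every LSM write copies both key and value.
    let inline = writes * (key_len + value_len);
    // With separation, the value is appended to the vLog once;
    // the LSM tree only ever writes key + pointer.
    let separated = value_len + writes * (key_len + ptr_len);
    (inline, separated)
}
```

For example, with 16-byte keys, 4 KiB values, 16-byte pointers, and `writes = 6`, this model gives 24,672 bytes inline versus 4,288 bytes separated, roughly 5.8x less. The gap shrinks as values get smaller, which is why separation is usually gated by a size threshold.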

Design Highlights

  • ValuePointer: 16-byte reference to an entry in a value log (vLog) file
  • ValueLog: Dedicated files for large values, selected by a configurable size threshold
  • Garbage Collection: Automatic space reclamation during compaction
  • Backward Compatible: Separation can be disabled, and existing data is left unchanged
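A minimal sketch of a 16-byte `ValuePointer`, assuming a hypothetical layout of a 4-byte file id, an 8-byte offset, and a 4-byte length (4 + 8 + 4 = 16). The RFC states only the total size; the field names, widths, and big-endian encoding here are assumptions.

```rust
use std::convert::TryInto;

/// Encoded size stated in the RFC.
pub const VALUE_POINTER_LEN: usize = 16;

/// Hypothetical field layout; only the 16-byte total comes from the RFC.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ValuePointer {
    /// Which vLog file holds the value.
    pub file_id: u32,
    /// Byte offset of the entry within that file.
    pub offset: u64,
    /// Length of the entry in bytes.
    pub len: u32,
}

impl ValuePointer {
    /// Serializes the fields big-endian so the pointer can be stored
    /// in place of the value inside an SSTable block.
    pub fn encode(&self) -> [u8; VALUE_POINTER_LEN] {
        let mut buf = [0u8; VALUE_POINTER_LEN];
        buf[0..4].copy_from_slice(&self.file_id.to_be_bytes());
        buf[4..12].copy_from_slice(&self.offset.to_be_bytes());
        buf[12..16].copy_from_slice(&self.len.to_be_bytes());
        buf
    }

    /// Parses a pointer; returns None if `buf` is not exactly 16 bytes.
    pub fn decode(buf: &[u8]) -> Option<Self> {
        if buf.len() != VALUE_POINTER_LEN {
            return None;
        }
        Some(ValuePointer {
            file_id: u32::from_be_bytes(buf[0..4].try_into().ok()?),
            offset: u64::from_be_bytes(buf[4..12].try_into().ok()?),
            len: u32::from_be_bytes(buf[12..16].try_into().ok()?),
        })
    }
}
```

A fixed-size encoding keeps SSTable entries for separated values small and uniform; a reader still needs some way (for example, a tag byte) to tell an inline value from an encoded pointer.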

Implementation Plan

4-phase approach over ~4 weeks:

  1. Core Infrastructure (ValuePointer, vLog format, read/write)
  2. SSTable Integration
  3. Garbage Collection
  4. Testing & Optimization
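For phase 3, one possible way to drive garbage collection from compaction is sketched below, under the assumption that compaction can report each value it discards (an overwritten or deleted key version no snapshot needs). The tracker counts garbage bytes per vLog file and picks files whose garbage ratio crosses a threshold. `GcTracker`, `FileStats`, and the method names are hypothetical, not part of Mini-LSM.

```rust
use std::collections::HashMap;

/// Hypothetical per-vLog-file byte counters.
#[derive(Debug, Default)]
struct FileStats {
    total_bytes: u64,
    garbage_bytes: u64,
}

#[derive(Debug, Default)]
struct GcTracker {
    files: HashMap<u32, FileStats>,
}

impl GcTracker {
    /// Record a value appended to vLog file `file_id`.
    fn on_append(&mut self, file_id: u32, len: u32) {
        self.files.entry(file_id).or_default().total_bytes += len as u64;
    }

    /// Record a value in `file_id` that compaction dropped from the LSM tree.
    fn on_discard(&mut self, file_id: u32, len: u32) {
        self.files.entry(file_id).or_default().garbage_bytes += len as u64;
    }

    /// File ids whose garbage ratio is at least `threshold`, sorted ascending.
    fn candidates(&self, threshold: f64) -> Vec<u32> {
        let mut ids: Vec<u32> = self
            .files
            .iter()
            .filter(|(_, s)| {
                s.total_bytes > 0 && s.garbage_bytes as f64 / s.total_bytes as f64 >= threshold
            })
            .map(|(id, _)| *id)
            .collect();
        ids.sort_unstable();
        ids
    }
}
```

A GC pass over a candidate file could then re-append the entries the LSM tree still points to into a new vLog file, update those pointers, and delete the old file once no reader references it.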

See the full RFC in for detailed design, API changes, and references.


Related Work:

cc: @skyzh

ben1009 and others added 30 commits February 6, 2024 12:11
* chore: typos

* merge starter code
* refactor
* w1d4

* w1d4 refactor
* w1d5 fix flaky test
* w1d6
* chore: typos & refine comments (#65)

* typo in  week2-01-compaction.md

* chroe: typos & add comments

* chore: more typos

* Update week2-01-compaction.md

---------

Co-authored-by: Alex Chi Z <iskyzh@gmail.com>

* Fix typos in W3D5 writeup and code (#67)

* Fix minor mistake in W3D6 writeup (#69)

---------

Co-authored-by: Alex Chi Z <iskyzh@gmail.com>
Co-authored-by: Yue Yin <41224888+yyin-dev@users.noreply.github.com>
ben1009 and others added 28 commits July 16, 2024 16:04
* chore: fix lint
* chore: optimize overlap_len
* chore: add comment about blockcache

* chore: fix ci
Bumps [crossbeam-channel](https://github.com/crossbeam-rs/crossbeam) from 0.5.14 to 0.5.15.
- [Release notes](https://github.com/crossbeam-rs/crossbeam/releases)
- [Changelog](https://github.com/crossbeam-rs/crossbeam/blob/master/CHANGELOG.md)
- [Commits](crossbeam-rs/crossbeam@crossbeam-channel-0.5.14...crossbeam-channel-0.5.15)

---
updated-dependencies:
- dependency-name: crossbeam-channel
  dependency-version: 0.5.15
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
…l-0.5.15

chore(deps): bump crossbeam-channel from 0.5.14 to 0.5.15
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Reverts all changes to mini-lsm-starter from PR #69 to keep starter code clean for course participants.
Added explicit lifetime annotation to KeySlice return type
deps: sync with upstream skyzh/mini-lsm and restore starter templates
deps: bump bytes from 1.10.1 to 1.11.1
This RFC proposes implementing WiscKey-style key-value separation for Mini-LSM,
which stores large values in dedicated Value Log (vLog) files while keeping keys
and value pointers in the LSM tree.

Key benefits:
- 5-10x reduction in write amplification for large-value workloads
- Faster range scans (no need to scan through large values)
- Better block cache efficiency

The RFC includes:
- Detailed architecture design with ValuePointer, ValueLog, and GC components
- File format specifications for vLog entries
- 4-phase implementation plan
- Testing strategy and compatibility considerations
- References to WiscKey, BadgerDB, RocksDB BlobDB, and Titan
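A minimal sketch of a vLog entry format and the threshold-based write path, assuming a hypothetical layout of `key_len (u32 BE) | value_len (u32 BE) | key | value`. The vLog file is simulated with an in-memory `Vec<u8>`, and a real format would likely also carry a checksum. Passing `threshold = None` keeps every value inline, mirroring the "can be disabled" goal. All names are illustrative.

```rust
use std::convert::TryInto;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ValuePointer {
    file_id: u32,
    offset: u64,
    len: u32,
}

/// What the LSM tree would store for a key (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum StoredValue {
    Inline(Vec<u8>),
    Pointer(ValuePointer),
}

struct ValueLog {
    file_id: u32,
    buf: Vec<u8>, // stands in for an append-only file
}

impl ValueLog {
    fn new(file_id: u32) -> Self {
        ValueLog { file_id, buf: Vec::new() }
    }

    /// Appends one entry and returns a pointer covering all of its bytes.
    fn append(&mut self, key: &[u8], value: &[u8]) -> ValuePointer {
        let offset = self.buf.len() as u64;
        self.buf.extend_from_slice(&(key.len() as u32).to_be_bytes());
        self.buf.extend_from_slice(&(value.len() as u32).to_be_bytes());
        self.buf.extend_from_slice(key);
        self.buf.extend_from_slice(value);
        let len = (self.buf.len() as u64 - offset) as u32;
        ValuePointer { file_id: self.file_id, offset, len }
    }

    /// Returns (key, value) at `ptr`, or None if the pointer is out of range.
    fn read(&self, ptr: ValuePointer) -> Option<(&[u8], &[u8])> {
        if ptr.file_id != self.file_id {
            return None;
        }
        let start = ptr.offset as usize;
        let entry = self.buf.get(start..start + ptr.len as usize)?;
        let key_len = u32::from_be_bytes(entry.get(0..4)?.try_into().ok()?) as usize;
        let value_len = u32::from_be_bytes(entry.get(4..8)?.try_into().ok()?) as usize;
        let key = entry.get(8..8 + key_len)?;
        let value = entry.get(8 + key_len..8 + key_len + value_len)?;
        Some((key, value))
    }
}

/// Values at or above `threshold` go to the vLog; `None` disables separation.
fn store(vlog: &mut ValueLog, threshold: Option<usize>, key: &[u8], value: &[u8]) -> StoredValue {
    match threshold {
        Some(t) if value.len() >= t => StoredValue::Pointer(vlog.append(key, value)),
        _ => StoredValue::Inline(value.to_vec()),
    }
}
```

Storing the key alongside the value in the vLog, as WiscKey does, lets garbage collection check whether the LSM tree still points at an entry without consulting any other index.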

ben1009 commented Mar 8, 2026

Closing: this was opened against the wrong repository. The correct PR is at ben1009#71

ben1009 closed this Mar 8, 2026