return BlobFetcherConfig{
    CheckInterval: 10 * time.Second,
    RetryInterval: 1 * time.Minute,
    MaxRetries:    10,
I wonder if there should be a cap here. What happens at cap? A node gives up on syncing?
If the cap is reached, the blob request is deleted from disk, meaning that blob won't exist on this node. The node always continues syncing regardless, and other blob requests are processed independently. If/when this happens, we record a `blob_fetcher_requests_expired_total{reason="max_retries"}` metric so operators can detect whether it happens frequently. Alternatively, we can keep this uncapped and instead double `retryInterval` on each failure (exponential backoff); eventually the request will pass the DA window and be cleaned up.
bf.ctx, bf.cancel = context.WithCancel(ctx)
go bf.run()
unlike Stop, this is not idempotent. Should it be?
Probably not. I kept it simple since `depositCatchupFetcher` isn't idempotent either and is called in the same `Service.Start()` method (https://github.com/berachain/beacon-kit/blob/blobreactor/beacon/blockchain/service.go#L113-L116). Happy to add `sync.Once` to both if we think it's worth it.
abi87
left a comment
a few questions but overall LGTM
issue: #2665
depends on cometbft blob PR: berachain/cometbft#26
Context
Currently, blobs are stored as CometBFT transactions, so a blob is never cleaned up and wastes space even though blobs have a finite lifespan.
This PR addresses this by implementing a complete p2p blob distribution system using the CometBFT Reactor API, which fetches missing blobs from other peers. It introduces a `BlobConsensusEnableHeight` configuration parameter that defines when the chain transitions from storing blobs as transactions to using P2P distribution, which requires a hard fork.

Components

- `BlobReactor` (in `da/blobreactor/`) that handles p2p blob requests/responses with timeout handling and makes sure received blobs pass verification (guarding against potentially byzantine peers).
- `BlobFetcher` background service (in `beacon/blockchain/`) that manages a persistent, filesystem-based retry queue. When a block arrives with missing blobs, the request is queued and the background worker attempts to fetch from peers at regular intervals with a max retry count. The queue is crash-safe with atomic writes, includes request deduplication, and automatically cleans up requests that exceed retry limits or fall outside the DA availability window.

Consensus flow (after blob enable height is reached)

- `PrepareProposal`: Blobs included in the block are now returned in `PrepareProposalResponse` in a separate `Blob` field instead of as CometBFT transactions. This prevents blobs from being persisted in CometBFT's block store.
- `ProcessProposal`: Blobs are now included in the `ProcessProposalRequest` in a separate `Blob` field and are validated synchronously. Proposals are rejected if blobs are missing or invalid. Validated blobs are cached with the state for `FinalizeBlock`.
- `FinalizeBlock`: If the node is in consensus mode, blobs come from the `ProcessProposal` cache and are processed immediately. Otherwise, if the node is syncing (catching up) and comes upon a block that should have blobs, it makes an async fetch request to `BlobFetcher`. `FinalizeBlock` returns immediately without blocking, allowing the chain to continue syncing while blobs are fetched in the background.

TODO:
PrepareProposal: Blobs included in the block are now returned inPrepareProposalResponsein a separateBlobfield instead of as CometBFT transactions. This prevents blobs from being persisted in CometBFTs block store.ProcessProposal: Blobs are now included in theProcessProposalRequestin a separateBlobfield and are validated synchronously. Proposals are rejected if blobs are missing or invalid. Validated blobs are cached with the state for FinalizeBlock.FinalizeBlock: If node is in consensus mode, then blobs come from theProcessProposalcache and are processed immediately. Otherwise, if the node is syncing (catching up) and comes upon a block that should have blobs the node will make async fetch request toBlobFetcher.FinalizeBlockreturns immediately without blocking, allowing the chain to continue syncing while blobs are fetched in the background.TODO: