core, eth, miner: add metrics to track block creation and write pipeline latency #2090

Merged
pratikspatil024 merged 4 commits into v2.6.2-candidate from psp-metrics-for-stress-test
Feb 28, 2026

@pratikspatil024
Member

Description

During stress testing, we observed span rotations caused by 10+ second delays between block sealing and broadcasting. The delay occurs in WriteBlockAndSetHead (called in resultLoop), which includes witness encoding, batch disk writes, and state commits, but none of these operations had metrics.

This PR adds timing metrics across the critical path in writeBlockWithState so we can immediately identify the bottleneck next time:

  • chain/witness/encode - witness RLP encoding
  • chain/witness/dbwrite - witness batch insertion
  • chain/witness/collection - trie node collection into witness during IntermediateRoot
  • chain/batch/write - blockBatch.Write() flush to disk (DB compaction stalls surface here)
  • chain/state/commit - CommitWithUpdate (pathdb diff layer flushes surface here)
  • worker/writeBlockAndSetHead - total WriteBlockAndSetHead duration
  • eth/seal2broadcast - latency from write completion to broadcast start

Changes

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)
  • Changes only for a subset of nodes

Breaking changes

Please complete this section if any breaking changes have been made, otherwise delete it

Nodes audience

In case this PR includes changes that must be applied only to a subset of nodes, please specify how you handled it (e.g. by adding a flag with a default value...)

Checklist

  • I have added at least 2 reviewers or the whole pos-v1 team
  • I have added sufficient documentation in code
  • I will be resolving comments - if any - by pushing each fix in a separate commit and linking the commit hash in the comment reply
  • Created a task in Jira and informed the team for implementation in Erigon client (if applicable)
  • Includes RPC methods changes, and the Notion documentation has been updated

Cross repository changes

  • This PR requires changes to heimdall
    • If so, link the PR here:
  • This PR requires changes to matic-cli
    • If so, link the PR here:

Testing

  • I have added unit tests
  • I have added tests to CI
  • I have tested this code manually on local environment
  • I have tested this code manually on remote devnet using express-cli
  • I have tested this code manually on amoy
  • I have created new e2e tests into express-cli

Manual tests

Please complete this section with the steps you performed if you ran manual tests for this functionality, otherwise delete it

Additional comments

Please post additional comments in this section if you have them, otherwise delete it

@codecov

codecov bot commented Feb 27, 2026

Codecov Report

❌ Patch coverage is 70.45455% with 13 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (v2.6.2-candidate@5c4029c). Learn more about missing BASE report.

Files with missing lines | Patch % | Lines
core/blockchain.go | 56.66% | 9 Missing and 4 partials ⚠️

❌ Your patch status has failed because the patch coverage (70.45%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@                 Coverage Diff                 @@
##             v2.6.2-candidate    #2090   +/-   ##
===================================================
  Coverage                    ?   49.89%           
===================================================
  Files                       ?      877           
  Lines                       ?   151448           
  Branches                    ?        0           
===================================================
  Hits                        ?    75565           
  Misses                      ?    70813           
  Partials                    ?     5070           
Files with missing lines | Coverage Δ
core/state/statedb.go | 66.74% <100.00%> (ø)
eth/handler.go | 64.48% <100.00%> (ø)
miner/worker.go | 67.56% <100.00%> (ø)
core/blockchain.go | 62.05% <56.66%> (ø)

Member

@manav2401 manav2401 left a comment

LGTM except a small nit and a lint issue.

@pratikspatil024
Member Author

@claude - review this PR

@claude

claude bot commented Feb 27, 2026

Claude finished @pratikspatil024's task in 1m 21s


PR #2090 Review

Summary: This PR adds timing metrics to track the critical path between block sealing and broadcasting, which will help diagnose the 10+ second delays observed during stress testing that caused span rotations.


Overall Assessment: ✅ LGTM

This is a well-structured PR that adds observability to an important part of the block production pipeline. The metrics are placed at the right points in the code and follow existing patterns in the codebase.


Detailed Review

1. Metrics Definition (core/blockchain.go:130-139)

The new metrics are well-documented with clear comments explaining what each tracks:

  • chain/witness/encode - RLP encoding time
  • chain/witness/dbwrite - DB batch insertion time
  • chain/witness/collection - Trie node collection during IntermediateRoot
  • chain/batch/write - Batch flush to disk (DB compaction stalls)
  • chain/state/commit - CommitWithUpdate (pathdb diff layer flushes)

Using Timer type is appropriate here since these measure latency/duration.

2. Witness Timing in writeBlockWithState (core/blockchain.go:2268-2295)

The timing instrumentation correctly wraps:

  • EncodeRLP for witness encoding
  • WriteWitness for DB writes
  • blockBatch.Write() for disk flush
  • CommitWithUpdate for state commit

One minor note: The log.Debug("Writing witness", ...) call now falls inside the interval measured from writeStart, which is fine since it's just a debug log.

3. WitnessCollection Timing (core/state/statedb.go:1334-1378, 1443-1450)

The timing correctly captures both:

  • The destructed objects loop that collects storage trie witnesses
  • The final account trie witness collection via s.trie.Witness()

The accumulation pattern (s.WitnessCollection += time.Since(witStart)) is consistent with other duration fields like AccountCommits, StorageCommits, etc.

4. NewMinedBlockEvent Enhancement (core/events.go:33-37)

Adding SealedAt time.Time to track when WriteBlockAndSetHead completed is a clean way to measure event delivery latency.

5. Seal-to-Broadcast Latency (eth/handler.go:70-74, 794-802)

The eth/seal2broadcast metric correctly:

  • Only updates when SealedAt is non-zero (guarded by !ev.SealedAt.IsZero())
  • Captures time from seal completion to broadcast start
  • Reuses now variable to avoid multiple time.Now() calls

6. WriteBlockAndSetHead Timer (miner/worker.go:108-111, 972-974)

The worker/writeBlockAndSetHead timer correctly wraps the entire WriteBlockAndSetHead call, complementing the internal sub-metrics.

7. Tests

Comprehensive unit tests added for each new metric:

  • TestWriteBlockMetrics - Verifies batch write and state commit timers
  • TestWitnessCollectionTiming - Verifies witness collection tracking with/without witness
  • TestSealToBroadcastTimer - Verifies seal-to-broadcast metric, including zero SealedAt case
  • TestWriteBlockAndSetHeadTimer - Verifies the worker timer during mining

Minor Suggestions (Non-blocking)

  1. Naming consistency: The metric eth/seal2broadcast uses a different naming style than chain/* and worker/*. Consider eth/seal/to/broadcast or worker/seal2broadcast for consistency, though this is a minor point.

  2. Test timing in TestWriteBlockAndSetHeadTimer: The 3-second sleep in miner/worker_test.go:1734 is relatively long. This should be acceptable for CI but could be flaky in resource-constrained environments.


Codecov Note

The missing coverage in core/blockchain.go (Codecov reports 9 missing lines and 4 partials) appears to include the witnessEncodeTimer.Update() call at line ~2275, which requires a non-nil witness to be executed. This is acceptable as the test path doesn't exercise witness encoding.


Verdict: The PR is ready to merge. The metrics are well-placed, properly documented, and the tests provide good coverage. This will provide valuable observability into block production latency.

witnessBytes := witBuf.Bytes()

writeStart := time.Now()
log.Debug("Writing witness", "block", block.NumberU64(), "hash", block.Hash(), "header", statedb.Witness().Header())
Contributor

One idea is that we can add warning logs if the time of any operation is greater than a threshold, e.g. 100ms.

Member Author

Done here. Added for witness encode, witness write, batch flush and state commit

@sonarqubecloud

Contributor

@cffls cffls left a comment

LGTM, thanks!

@pratikspatil024 pratikspatil024 merged commit ed2f01d into v2.6.2-candidate Feb 28, 2026
9 of 12 checks passed
@pratikspatil024 pratikspatil024 deleted the psp-metrics-for-stress-test branch February 28, 2026 11:16