Skip to content

Instrument tracing spans for block processing and import #7816

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 39 commits into from
Aug 8, 2025

Conversation

jimmygchen
Copy link
Member

@jimmygchen jimmygchen commented Jul 31, 2025

Issue Addressed

#7815

  • removes all existing spans, so some span fields that appear in logs like service_name may be lost.
  • instruments a few key code paths in the beacon node, starting from root spans named below:
  • Gossip block and blobs
    • process_gossip_data_column_sidecar
    • process_gossip_blob
    • process_gossip_block
  • Rpc block and blobs
    • process_rpc_block
    • process_rpc_blobs
    • process_rpc_custody_columns
  • Rpc blocks (range and backfill)
    • process_chain_segment
  • PendingComponents lifecycle
    • pending_components

To test locally:

Some captured traces can be found here: https://hackmd.io/@jimmygchen/r1sLOxPPeg

Removing the old spans seem to have reduced the memory usage quite a lot - i think we were using them on long running tasks and too excessively:
image

@jimmygchen jimmygchen added the work-in-progress PR is a work-in-progress label Jul 31, 2025
@jimmygchen
Copy link
Member Author

jimmygchen commented Aug 1, 2025

Just reading the doc on spans, I think some of the async functions are instrumented incorrectly, will revisit this on Monday!
https://docs.rs/tracing/latest/tracing/span/struct.Span.html

I've checked all enter() and entered usage and I believe they're good now.

Now seeing good traces in Grafana:
image

@pawanjay176
Copy link
Member

This is brilliant. I'm gonna play around with this later today

Copy link
Member

@eserilev eserilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Just a few minor nits from me, LGTM

# Conflicts:
#	beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
#	beacon_node/beacon_chain/src/fetch_blobs/mod.rs
#	beacon_node/lighthouse_network/src/service/mod.rs
Copy link

mergify bot commented Aug 7, 2025

Some required checks have failed. Could you please take a look @jimmygchen? 🙏

@mergify mergify bot added waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels Aug 7, 2025
@jimmygchen jimmygchen added ready-for-review The code is ready for review and removed waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Aug 8, 2025
@jimmygchen
Copy link
Member Author

@eserilev thanks for the review 🙏
I've got some more changes & refactor on another branch, but will merge this one and do a separate PR since this one is getting too big and already reviewed!

@jimmygchen jimmygchen added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Aug 8, 2025
mergify bot added a commit that referenced this pull request Aug 8, 2025
Copy link

mergify bot commented Aug 8, 2025

This pull request has been removed from the queue for the following reason: checks failed.

The merge conditions cannot be satisfied due to failing checks:

You can check the last failing draft PR here: #7842.

You may have to fix your CI before adding the pull request to the queue again.
If you update this pull request, to fix the CI, it will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

mergify bot added a commit that referenced this pull request Aug 8, 2025
Copy link

mergify bot commented Aug 8, 2025

This pull request has been removed from the queue for the following reason: checks failed.

The merge conditions cannot be satisfied due to failing checks:

You can check the last failing draft PR here: #7843.

You may have to fix your CI before adding the pull request to the queue again.
If you update this pull request, to fix the CI, it will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

@mergify mergify bot merged commit 40c2fd5 into sigp:unstable Aug 8, 2025
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge This PR is ready to merge. tracing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants