Use scoped rayon pool for backfill chain segment processing #7924
base: unstable
I've done a bunch of testing on backfill with the global rayon pool vs scoped rayon pool usage. The biggest difference is that KZG verification takes more than double the time with a scoped rayon pool (using a max of ~25% of CPU threads) vs the global rayon pool. Additionally, I saw lower CPU usage on average in the scoped pool case. My node did not have any issues following the chain in either case during backfill.

So I can't argue that this change is absolutely necessary. But it is a safe and relatively simple optimization to make. It can potentially help protect nodes from having issues during backfill sync, so maybe that's a good enough reason to include this in our 8.0.0 release candidate.

I haven't seen any evidence that something like #7789 is required. It seems like the OS scheduler is good enough at figuring things out with the current scoped rayon pool usage. If we were to expand our scoped rayon pool usage to other work events, #7789 could potentially become more relevant.

In a future iteration (or in this PR) we could make the low-priority rayon pool configurable via a CLI flag to give users additional control over the speed of backfill. This is probably unnecessary at the moment, but could become useful if we expand scoped rayon thread pool usage.

Another TODO could be to introduce a high-priority rayon pool and always avoid scheduling rayon tasks on the global thread pool. It remains to be seen whether that would be useful considering our current rayon usage.
Some required checks have failed. Could you please take a look @jimmygchen? 🙏
Issue Addressed
Part of #7866
rayon to speed up batch KZG verification #7921

In the above PR, we enabled rayon for batch KZG verification in chain segment processing. However, using the global rayon thread pool for backfill is likely to create resource contention with higher-priority beacon processor work.
Proposed Changes
This PR introduces a dedicated low-priority rayon thread pool, LOW_PRIORITY_RAYON_POOL, and uses it for processing backfill chain segments. This prevents backfill KZG verification from using the global rayon thread pool and competing with high-priority beacon processor tasks for CPU resources.
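The idea of capping low-priority work at a fraction of the available threads can be sketched as follows. This is an illustration only, not the code in this PR: the PR uses a rayon pool, while the sketch swaps in `std::thread::scope` so it runs without external crates. The names `low_priority_threads`, `verify_item`, and `verify_batch_capped` are hypothetical, the 25% cap is taken from the testing comment on this PR, and `verify_item` is a stand-in for real KZG verification.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical sizing rule: roughly 25% of the available threads,
/// but never fewer than one.
fn low_priority_threads(available: usize) -> usize {
    (available / 4).max(1)
}

/// Stand-in for per-item verification work (NOT real KZG verification).
fn verify_item(item: &u64) -> bool {
    item % 2 == 0
}

/// Verify every item using at most `max_threads` scoped worker threads.
/// Returns true only if all items pass.
fn verify_batch_capped(items: &[u64], max_threads: usize) -> bool {
    if items.is_empty() {
        return true;
    }
    let workers = max_threads.max(1);
    // Ceiling division, so the slice splits into at most `workers` chunks.
    let chunk_len = items.len().div_ceil(workers);
    thread::scope(|scope| {
        // One worker per chunk; the scope guarantees all workers finish
        // before `items` goes out of scope.
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().all(verify_item)))
            .collect();
        // Join every worker and combine the results.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker panicked"))
            .fold(true, |all_ok, ok| all_ok && ok)
    })
}

fn main() {
    let available = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    let cap = low_priority_threads(available);
    let items: Vec<u64> = (0..1_000).map(|i| i * 2).collect();
    println!("cap = {cap}, all valid = {}", verify_batch_capped(&items, cap));
}
```

With rayon, the comparable approach would be to build a pool once with `rayon::ThreadPoolBuilder::new().num_threads(n).build()` and run the parallel work inside `ThreadPool::install`, which keeps that work off the global pool.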
However, this PR by itself doesn't prevent CPU oversubscription because other tasks could still fill up the global rayon thread pool, and having an extra thread pool could make things worse. To address this we need the beacon processor to coordinate total CPU allocation across all tasks, which is covered in: