persist domain cache across domain_rotx #17362
Conversation
Main problem of D_LRU:
I agree. We do set a lower (1/10th) limit for the code cache. Maybe this cache can exist alongside a much smaller "value-storing code cache". I don't know; this needs more benchmarking. Code domain reads are the most time-consuming.
SharedLRU should be nice here. Agreed that we would need some RPC latency/throughput measurements with this. I can check the rpctest benches. They compare geth vs erigon, but maybe I can just compare geth vs erigon1 and geth vs erigon2.
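A minimal sketch of the split discussed above (a general value cache plus a much smaller code cache, roughly the "1/10th limit" idea), assuming `hashicorp/golang-lru/v2`; the type and field names are illustrative, not what this PR actually uses:

```go
package domains

import lru "github.com/hashicorp/golang-lru/v2"

// sharedCaches is an illustrative container: one LRU for regular domain
// values and a much smaller one dedicated to code.
type sharedCaches struct {
	values *lru.Cache[string, []byte]
	code   *lru.Cache[string, []byte]
}

func newSharedCaches(valueEntries int) (*sharedCaches, error) {
	values, err := lru.New[string, []byte](valueEntries)
	if err != nil {
		return nil, err
	}
	// Code cache capped at roughly 1/10th of the value cache.
	code, err := lru.New[string, []byte](valueEntries / 10)
	if err != nil {
		return nil, err
	}
	return &sharedCaches{values: values, code: code}, nil
}
```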
@sudeepdino008 FYI: the QA team also has an RPC throughput monitoring suite: https://monitoring.erigon.io/d/ddqiwbfvrgwlcd/erigonqa?orgId=1&from=now-30d&to=now&timezone=browser
Also:
Force-pushed from cb70ebd to cc5a420
The throughput actually gets better for QPS=10k (and remains roughly the same for lower QPS). For QPS=10k, in this branch vs. in main:
Will review tomorrow
In the full report I don't see 100s in main.
Oh, my bad; I had a bad flag in rpcdaemon - https://gist.github.com/sudeepdino008/525df078c2566765e08c6f6a94d65378. They both show similar perf for 10k and 100k QPS.
@sudeepdino008 please disable HTTP compression in rpcdaemon - maybe it's the bottleneck now.
Added results with compression disabled; perf seems similar again.
I'm not sure you have this cache in the right place. From the caller of the shared domains' perspective, I think you want to hit the cache before you attempt to do a DB and then a file lookup.
I also think we want to be able to gather metrics at a domain level on overall cache size and hit rate. Maybe we want different behavior for exec vs rpc. We actually don't know yet, which is why we need Grafana-based metrics.
Bear in mind that from an exec-time perspective we want to measure puts & gets from the application perspective. Although macro-level cache stats are interesting, they are not necessarily the most useful.
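A minimal sketch of that read ordering with per-reader hit/miss counters; every name below (`domainReader`, `dbLookup`, `fileLookup`) is hypothetical and only illustrates cache-before-DB-before-files:

```go
package domains

import "sync/atomic"

// cacheStats holds the per-domain counters that could feed Grafana-based
// metrics via whatever metrics library the process already uses.
type cacheStats struct {
	hits   atomic.Uint64
	misses atomic.Uint64
}

type domainReader struct {
	cache map[string][]byte // stand-in for the shared LRU
	stats cacheStats
}

func newDomainReader() *domainReader {
	return &domainReader{cache: make(map[string][]byte)}
}

// dbLookup and fileLookup are stubs standing in for the real MDBX and
// snapshot-file reads.
func (r *domainReader) dbLookup(key []byte) (val []byte, found bool, err error) { return nil, false, nil }
func (r *domainReader) fileLookup(key []byte) (val []byte, err error)           { return nil, nil }

// GetLatest consults the cache before the DB, and the DB before the files,
// recording a hit or miss for this reader.
func (r *domainReader) GetLatest(key []byte) ([]byte, error) {
	if v, ok := r.cache[string(key)]; ok {
		r.stats.hits.Add(1)
		return v, nil
	}
	r.stats.misses.Add(1)

	v, found, err := r.dbLookup(key)
	if err != nil {
		return nil, err
	}
	if !found {
		if v, err = r.fileLookup(key); err != nil {
			return nil, err
		}
	}
	r.cache[string(key)] = v
	return v, nil
}
```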
Also, I think if we are going down this road we should be using https://pkg.go.dev/weak - it will help with memory management, as it means the GC can be used to clean the cache.
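A minimal sketch of that idea, assuming Go 1.24+ (`weak.Make`/`Pointer.Value` plus `runtime.AddCleanup`); the cache type and its names are hypothetical, not part of this PR:

```go
package domains

import (
	"runtime"
	"sync"
	"weak"
)

// weakCache keeps only weak references, so the GC is free to reclaim cached
// values; entries whose values were collected are pruned lazily in Get and
// via runtime.AddCleanup.
type weakCache[K comparable, V any] struct {
	mu sync.Mutex
	m  map[K]weak.Pointer[V]
}

func newWeakCache[K comparable, V any]() *weakCache[K, V] {
	return &weakCache[K, V]{m: make(map[K]weak.Pointer[V])}
}

func (c *weakCache[K, V]) Get(k K) (*V, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if wp, ok := c.m[k]; ok {
		if v := wp.Value(); v != nil { // nil means the GC already reclaimed it
			return v, true
		}
		delete(c.m, k)
	}
	return nil, false
}

func (c *weakCache[K, V]) Put(k K, v *V) {
	c.mu.Lock()
	c.m[k] = weak.Make(v)
	c.mu.Unlock()
	// Remove the map entry once the value is collected. (Simplified: a re-Put
	// of the same key can race with an old cleanup; a real implementation
	// would guard against that.)
	runtime.AddCleanup(v, func(key K) {
		c.mu.Lock()
		delete(c.m, key)
		c.mu.Unlock()
	}, k)
}
```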
Such users work with biz-logic and real applications: Exec, RPC, P2P, etc. All of these things need different caches (different queries), and all of these caches will suffer from invalidation. A cache on immutable files is a very different animal - it doesn't need invalidation (dropping the cache when new files become available). Caches on top of MDBX usually mean starvation, because MDBX's PageCache is already an LRU. So, in my head:
Several levels of caching is its own evil for benchmark reproducibility, but I'm interested in what doors open for us after moving 95% of the data to immutable files.
Application-specific metrics: I like the idea, but the only way to do this seems to be via running different processes. Otherwise, in a single process running both RPC and exec, for example, if we have a low-level cache like a file cache, then one app can impact the (common, low-level) cache contents and the metrics aren't clean at that point.
Does it mean "perf of the 2 D_LRU implementations is similar" or "the bottleneck is completely elsewhere in both cases"?
This doesn't need several processes - it just needs the metrics incorporated in the right place in the app, which I'm doing for execution; we should be doing the same metrics for the RPC layer. I think the point here is we need to see how each layer is impacted by the other. I personally don't see a great deal of value in running individual benchmarks: they are useful for comparative purposes in development but don't give a good picture of the overall workload.

I think your current benchmarking is a bit unrealistic. stage_exec does not really operate on its own, and what we really need to do is test the cache behaviour under realistic load conditions. The typical numbers I see when testing the DB are 8-12 microseconds, which is significantly more than you are seeing. What I'm saying is that I don't agree with your assertion that the DB is fast; this is only the case on a well-resourced machine. I really think we should be aiming for stable behaviour on loaded machines, which means avoiding all page-accessing layers - which is both DB + files. I agree that the files are generally worse than the DB, but both can be bad by an order of magnitude compared to memory access.
I'm OK with a multi-layer cache approach, but then I think we need to provide access to the lower-layer cache for status and metrics purposes - otherwise we either duplicate cached data or under-use the high-level cache. I think the overall problem we have at the moment is that we tend to put all our optimization in one place - the DB layer. That is too simplistic an approach; we need to optimize in both directions.
I was just eyeballing. Let me do a detailed post... RPC Throughput/Latency:
Mgas/sec:
Thanks for the detailed analysis.
The cache is tied to `domain.visibleFiles`; the same cache is shared across all rotx. When `visibleFiles` change (i.e. after `recalcVisibleFiles` is done), an ongoing `domainRotx` will continue holding the older cache, which is the correct behavior.
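A sketch of that lifecycle under stated assumptions: the names below (`visibleFilesSnapshot`, `domainRotx`, `BeginRotx`) are illustrative stand-ins, not Erigon's actual types. The cache hangs off the visible-files snapshot, so every rotx opened from the same snapshot shares one cache, while a snapshot swap leaves ongoing rotx holding the old snapshot and therefore the old cache.

```go
package domains

import "sync"

type visibleFilesSnapshot struct {
	files []string // the immutable files visible at this point
	cache sync.Map // shared by every rotx opened from this snapshot
}

type domain struct {
	mu      sync.RWMutex
	current *visibleFilesSnapshot
}

type domainRotx struct {
	snap *visibleFilesSnapshot // pinned at BeginRotx time
}

// BeginRotx pins the current snapshot (and its cache) for the reader.
func (d *domain) BeginRotx() *domainRotx {
	d.mu.RLock()
	defer d.mu.RUnlock()
	return &domainRotx{snap: d.current}
}

// recalcVisibleFiles installs a new snapshot with a fresh, empty cache;
// rotx opened earlier keep using the previous snapshot's cache.
func (d *domain) recalcVisibleFiles(newFiles []string) {
	d.mu.Lock()
	d.current = &visibleFilesSnapshot{files: newFiles}
	d.mu.Unlock()
}
```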