@jackkleeman jackkleeman commented Oct 24, 2025

Currently the default resources in the helm chart are really small, and more importantly, they are out of line with the default memory size.

My suggestion is that we increase the resources to match the default memory size of 6G (i.e. 8G memory request/limit and, at a 1:4 CPU-to-memory request ratio, 2 CPU), and that we also start specifying the memory size more explicitly so users can see that they need to override it.
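As a rough illustration of the sizing arithmetic above, here is a minimal sketch. It assumes the RocksDB memory size should be about 75% of the container memory limit (the ratio mentioned in the chart comment) and treats "6G" as 6 GiB; the function name and parameters are hypothetical and not part of the chart.

```python
GIB = 1024 ** 3


def suggested_resources(rocksdb_bytes, rocksdb_fraction=0.75, gib_per_cpu=4):
    """Return (memory_limit_bytes, cpu_request) for a given RocksDB memory size.

    rocksdb_fraction: assumed share of the container limit given to RocksDB.
    gib_per_cpu: assumed GiB of memory per requested CPU (the 1:4 ratio).
    """
    memory_limit = round(rocksdb_bytes / rocksdb_fraction)
    cpu_request = memory_limit / (gib_per_cpu * GIB)
    return memory_limit, cpu_request


memory_limit, cpu_request = suggested_resources(6 * GIB)
print(memory_limit // GIB, cpu_request)  # prints: 8 2.0
```

With the 6 GiB default this reproduces the proposed 8 GiB memory request/limit and 2 CPU.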

This will be a breaking change for users that are not specifying their resources, and needs to be noted in the release notes for that reason. It is breaking in the sense that if the cluster no longer has enough capacity for the larger pod, the replacement pod will not schedule; however, this would be noticed immediately. In other cases it will still schedule, and performance will be improved, so there is only a cost impact, which I think is ok (the user can always specify the resources they want if needed).

However, it is NOT a breaking change for those who are specifying their resources but aren't specifying the RocksDB memory limit, which I think is probably very common. In those cases, the RocksDB memory size will stay 6G, but we will have the release notes and a logline encouraging them to consider changing it if it doesn't match their overridden resources.

Alternatives considered:

  • We could add the memory size env to match the current default resources, and not change those resources. This would potentially reduce the memory from 6G for any users that have not explicitly overridden this env, which will likely be a 'silent' performance degradation.
  • We could just document that these should be kept in line, and hope users notice this in release notes etc. The logline added in "Warn if cgroup memory limit is misaligned with rocksdb memory limit" (#3925) might help, but this still feels like quite the footgun.
  • We could set some flag that instructs Restate to automatically pick the memory limits to be 75% of the process limit. I'm inclined to avoid this kind of magic for the time being.
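To make the misalignment check mentioned in the alternatives more concrete, here is a hedged sketch of what such a warning could look like. It is not the logic from #3925: the function names, the 75% target, and the 10-point tolerance are all assumptions for illustration. The only external detail relied on is that cgroup v2 exposes the limit in `/sys/fs/cgroup/memory.max`, which contains either a byte count or the literal `max`.

```python
from pathlib import Path

GIB = 1024 ** 3


def read_cgroup_memory_limit(path="/sys/fs/cgroup/memory.max"):
    """Return the cgroup v2 memory limit in bytes, or None if unlimited or unreadable."""
    try:
        raw = Path(path).read_text().strip()
        return None if raw == "max" else int(raw)
    except (OSError, ValueError):
        return None


def misalignment_warning(cgroup_limit, rocksdb_bytes, target=0.75, tolerance=0.10):
    """Return a warning string if RocksDB memory is far from `target` of the limit.

    `target` and `tolerance` are illustrative assumptions, not Restate defaults.
    """
    if cgroup_limit is None:
        return None  # no limit known, nothing to compare against
    ratio = rocksdb_bytes / cgroup_limit
    if abs(ratio - target) <= tolerance:
        return None
    return (
        f"RocksDB memory is {ratio:.0%} of the cgroup memory limit; "
        f"expected roughly {target:.0%}"
    )


# A 2 GiB container left on the 6 GiB RocksDB default triggers the warning.
print(misalignment_warning(2 * GIB, 6 * GIB))
```

A user who overrides the pod resources to 2 GiB but leaves the 6 GiB default in place is the case this is meant to catch; the proposed 8 GiB / 6 GiB defaults produce no warning.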

github-actions bot commented Oct 24, 2025

Test Results

  7 files  ±0    7 suites  ±0   2m 34s ⏱️ +2s
 47 tests ±0   47 ✅ ±0  0 💤 ±0  0 ❌ ±0 
200 runs  ±0  200 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit f58b5d3. ± Comparison against base commit 5079b7e.

♻️ This comment has been updated with latest results.

@tillrohrmann tillrohrmann left a comment

Thanks for updating our helm charts with the new default memory limits @jackkleeman. The changes look good to me. +1 for including it in the release notes as this change can be problematic for some users. +1 for merging.

Comment on lines +42 to +45
- name: RESTATE_ROCKSDB_TOTAL_MEMORY_SIZE
  # This value should be around 75% of the container memory limit, which defaults to 8Gi below.
  # If provisioning restate with a different memory limit, make sure to update this value
  value: 6Gi
The one alternative I was thinking of is whether we could use the K8s downward API to automatically derive the value from the configured memory limit (specific to the helm chart). But it would also add some form of automagic (which should probably only kick in if the RocksDB total memory hasn't been set explicitly).

Contributor Author

Not easily... we can indeed get the memory request or limit as an env var, but it won't necessarily be a byte count; it would be something like '2Gi', which needs parsing and then multiplying by 0.75 or whatever. IMO if we do this we should do it inside Restate, with an option to set memory to some ratio of the process memory limit (which we try to get from the cgroup).
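For a sense of what "parsing and then multiplying by 0.75" would involve, here is a minimal sketch. It takes the comment's premise that the value arrives as a string such as '2Gi', and handles only plain numbers with the Kubernetes binary (Ki, Mi, Gi, ...) and decimal (k, M, G, ...) suffixes; milli and exponent forms are deliberately rejected. The function names and the 0.75 default are hypothetical, not anything Restate or the chart provides.

```python
import re
from decimal import Decimal

# Multipliers for the quantity suffixes supported by this sketch.
_SUFFIXES = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
}
_QUANTITY = re.compile(r"^(\d+(?:\.\d+)?)([A-Za-z]*)$")


def parse_memory_quantity(text):
    """Convert a quantity string such as '2Gi' or '500M' to a byte count."""
    match = _QUANTITY.match(text.strip())
    if match is None or match.group(2) not in _SUFFIXES:
        raise ValueError(f"unsupported memory quantity: {text!r}")
    number, suffix = match.groups()
    return int(Decimal(number) * _SUFFIXES[suffix])


def rocksdb_memory_for_limit(limit, fraction=Decimal("0.75")):
    """Return `fraction` of the given memory limit, in bytes (rounded down)."""
    return int(parse_memory_quantity(limit) * fraction)


print(rocksdb_memory_for_limit("8Gi"))  # prints: 6442450944 (6 GiB)
```

Even this small version needs a suffix table and error handling, which illustrates why doing it in chart templating is awkward compared with reading a byte count from the cgroup inside the process.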

Contributor Author

But I am largely against this kind of magic; best to solve this with clear documentation. IMO, a docs page like 'these are the 4 parameters you should pick when building a new cluster' would be really helpful.

@jackkleeman jackkleeman merged commit c0bfc0f into main Oct 27, 2025
26 checks passed
@jackkleeman jackkleeman deleted the jk/ktupvuowpvsl branch October 27, 2025 15:34
@github-actions github-actions bot locked and limited conversation to collaborators Oct 27, 2025