
Conversation

@athei (Member) commented Jul 18, 2025

This PR changes the contract code limit from roughly 100KiB to exactly 1MiB. It also raises the call stack depth from 5 to 25.

Those limits were in place because of memory constraints within the runtime. We work around them in the following ways:

  1. Removing the 4x safety margin for allocations which is no longer needed due to the new allocator.
  2. Limiting the size of the compilation cache to a fixed size (a rough sketch follows after this list).
  3. Resetting the compilation cache and flat map every time we call into a new contract.
  4. Limiting the size of calldata and return data to 128KiB (only capped by tx size and RAM before). While this is a breaking change nobody will be affected since Geth effectively limits the call data to 128KiB.
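
As a rough illustration of points 2 and 3, a byte-capped cache that is wiped whenever execution enters a new contract could look like the sketch below. The names (`CompilationCache`, `insert`, `reset`) and the structure are hypothetical, not the pallet's actual implementation.

```rust
use std::collections::HashMap;

/// Hypothetical byte-capped compilation cache; illustrative only.
struct CompilationCache {
    entries: HashMap<[u8; 32], Vec<u8>>, // code hash -> compiled artifact
    used_bytes: usize,
    capacity_bytes: usize,
}

impl CompilationCache {
    fn new(capacity_bytes: usize) -> Self {
        Self { entries: HashMap::new(), used_bytes: 0, capacity_bytes }
    }

    /// Cache a compiled artifact only while it fits under the fixed cap;
    /// otherwise the caller falls back to recompiling on demand.
    fn insert(&mut self, code_hash: [u8; 32], artifact: Vec<u8>) -> bool {
        if self.entries.contains_key(&code_hash) {
            return true; // already cached, nothing to account for
        }
        if self.used_bytes + artifact.len() > self.capacity_bytes {
            return false; // over budget: recompile on demand instead
        }
        self.used_bytes += artifact.len();
        self.entries.insert(code_hash, artifact);
        true
    }

    /// Wipe everything when calling into a new contract, so the cached
    /// size never accumulates across nested contract calls.
    fn reset(&mut self) {
        self.entries.clear();
        self.used_bytes = 0;
    }
}
```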

1MiB contracts

This is large enough that all known contracts no longer fail due to size limits.

The new limit is also much simpler to understand since it does not depend on the number of instructions. Just these two constraints:

```
PVM_BLOB.len() < 1 MiB
PVM_BLOB.len() + (rw/ro/stack) < 1 MiB + 512 KiB
```

This means:

  1. A contract is guaranteed to have at least 512KiB of memory available.
  2. A contract that is smaller in code can use more memory.
  3. The limit is exactly 1MiB unless a user manually increases the memory usage of a contract beyond 512KiB.
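
A minimal sketch of these two checks, assuming hypothetical constant names (`BLOB_BYTES`, `EXTRA_MEMORY_BYTES`) and a static-memory figure coming from PolkaVM's memory estimation; this is not the pallet's exact code:

```rust
/// Hypothetical constants mirroring the limits described above.
const BLOB_BYTES: u32 = 1024 * 1024;         // 1 MiB code blob limit
const EXTRA_MEMORY_BYTES: u32 = 512 * 1024;  // 512 KiB guaranteed on top

/// `static_memory` stands for the rw/ro/stack sections as estimated by PolkaVM.
fn enforce_limits(blob_len: u32, static_memory: u32) -> Result<(), &'static str> {
    // PVM_BLOB.len() < 1 MiB
    if blob_len > BLOB_BYTES {
        return Err("code blob exceeds 1 MiB");
    }
    // PVM_BLOB.len() + (rw/ro/stack) < 1 MiB + 512 KiB
    if blob_len.saturating_add(static_memory) > BLOB_BYTES + EXTRA_MEMORY_BYTES {
        return Err("code plus static memory exceeds 1.5 MiB");
    }
    Ok(())
}
```

This also illustrates point 2: a smaller blob leaves more of the shared 1.5 MiB budget for rw/ro/stack memory.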

Call stack depth 5 -> 25

The limit of 5 was problematic because there are use cases which require deeper stacks. With the increase to 25, there should be no benign use case left that fails to work.

Please note that even with the low limit of 25, contracts are not vulnerable to stack depth exhaustion attacks: we trap the caller's context when the depth limit is reached. This is different from Ethereum, where this error can be handled and failure to do so leaves the contract vulnerable.

@paritytech-workflow-stopper

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/16373373170
Failed job name: build-rustdoc

@athei (Member Author) commented Jul 20, 2025

/cmd bench --runtime dev --pallet pallet_revive


Command "bench --runtime dev --pallet pallet_revive" has started 🚀 See logs here

@athei athei requested review from xermicus, pgherveou and koute and removed request for xermicus July 20, 2025 07:44

Command "bench --runtime dev --pallet pallet_revive" has finished ✅ See logs here

Subweight results:
| File | Extrinsic | Old | New | Change [%] |
|------|-----------|-----|-----|------------|
| substrate/frame/revive/src/weights.rs | instr | 1.20ms | 1.64ms | +36.95 |
| substrate/frame/revive/src/weights.rs | identity | 117.21us | 158.45us | +35.18 |
| substrate/frame/revive/src/weights.rs | seal_call_data_copy | 119.70us | 158.59us | +32.49 |
| substrate/frame/revive/src/weights.rs | seal_ref_time_left | 227.00ns | 289.00ns | +27.31 |
| substrate/frame/revive/src/weights.rs | seal_block_number | 246.00ns | 306.00ns | +24.39 |
| substrate/frame/revive/src/weights.rs | seal_copy_to_contract | 210.35us | 252.04us | +19.82 |
| substrate/frame/revive/src/weights.rs | seal_return | 53.10us | 63.06us | +18.75 |
| substrate/frame/revive/src/weights.rs | seal_return_data_size | 253.00ns | 297.00ns | +17.39 |
| substrate/frame/revive/src/weights.rs | seal_value_transferred | 273.00ns | 320.00ns | +17.22 |
| substrate/frame/revive/src/weights.rs | seal_call_data_size | 248.00ns | 288.00ns | +16.13 |
| substrate/frame/revive/src/weights.rs | seal_call_data_load | 250.00ns | 285.00ns | +14.00 |
| substrate/frame/revive/src/weights.rs | seal_own_code_hash | 279.00ns | 318.00ns | +13.98 |
| substrate/frame/revive/src/weights.rs | seal_address | 290.00ns | 329.00ns | +13.45 |
| substrate/frame/revive/src/weights.rs | seal_minimum_balance | 264.00ns | 298.00ns | +12.88 |
| substrate/frame/revive/src/weights.rs | seal_base_fee | 256.00ns | 284.00ns | +10.94 |
| substrate/frame/revive/src/weights.rs | instantiate_with_code | 4.11ms | 4.55ms | +10.61 |
| substrate/frame/revive/src/weights.rs | call_with_code_per_byte | 641.99us | 693.26us | +7.99 |
| substrate/frame/revive/src/weights.rs | seal_caller | 330.00ns | 355.00ns | +7.58 |
| substrate/frame/revive/src/weights.rs | seal_take_transient_storage | 2.66us | 2.85us | +7.50 |
| substrate/frame/revive/src/weights.rs | seal_call_precompile | 256.36us | 274.85us | +7.21 |
| substrate/frame/revive/src/weights.rs | seal_caller_is_origin | 331.00ns | 352.00ns | +6.34 |
| substrate/frame/revive/src/weights.rs | seal_contains_transient_storage | 1.86us | 1.98us | +6.27 |
| substrate/frame/revive/src/weights.rs | seal_set_transient_storage | 2.59us | 2.74us | +5.88 |
| substrate/frame/revive/src/weights.rs | seal_gas_price | 258.00ns | 273.00ns | +5.81 |
| substrate/frame/revive/src/weights.rs | seal_clear_transient_storage | 2.43us | 2.57us | +5.74 |
| substrate/frame/revive/src/weights.rs | seal_weight_left | 668.00ns | 702.00ns | +5.09 |
| substrate/frame/revive/src/weights.rs | rollback_transient_storage | 1.15us | 1.21us | +5.04 |
| substrate/frame/revive/src/weights.rs | seal_get_transient_storage | 2.11us | 2.22us | +5.03 |
| substrate/frame/revive/src/weights.rs | seal_balance | 5.28us | 4.68us | -11.22 |
Command output:

✅ Successful benchmarks of runtimes/pallets:
-- dev: ['pallet_revive']

```rust
/// Maximum size of events (including topics) and storage values.
pub const PAYLOAD_BYTES: u32 = 416;

/// The maximum size for calldata and return data.
```
Member:

IIRC this value matches the current practical limit on Ethereum. Might point that out here?

@athei (Member Author):

Good point. I added it.

@athei (Member Author) commented Aug 1, 2025

Resolved merge conflicts and updated to the latest version of paritytech/polkavm#316.

We will make use of PolkaVM's memory estimation now instead of baking assumptions into the pallet.

Next up: I will try to attack this by creating a contract that tries to maximize memory usage and calling it recursively. This cannot be written as a unit test since unit tests run inside the native runtime.

@athei (Member Author) commented Aug 1, 2025

/cmd fmt

@athei athei changed the title WIP: pallet-revive: Raise contract size limit to one megabyte and raise call depth to 25 pallet-revive: Raise contract size limit to one megabyte and raise call depth to 25 Aug 5, 2025
@iulianbarbu (Contributor) commented:

Hey @athei! I have a few planning questions related to this work.

  1. My tests have shown that the host/local allocators' memory usage for the https://github.com/paritytech/memory_exhaustion contract is around ~55MB peak requested space and ~60MB physical memory used. You can see more results at the end of the document here: https://hackmd.io/IY1K3WSjSM63u4jqZy-nng. I consider them promising towards this PR checkpoint:

> Either wait for #8992 to be merged or be sure that we don't need a large safety margin for the current one

When it comes to local allocator usage, we got ~55MB peak requested space and ~56MB worth of physical memory used, so slightly better than the host allocator. However, the host allocator is doing a good job too. I would not consider either of them in the danger zone for this heavy contract; I would say things look good either way, and the above checkbox can be checked if you feel this testing scenario suffices for now.

  2. We discussed this in the past and, as you said, it is not OK to enable a target on debug for node logs, since all other targets will be evaluated against debug and things slow down. Moreover, we end up in an indefinite logging spree from host_allocator allocations/deallocations. I can't explain why, but I let an exercise run for more than 24 hours and the extrinsic associated with the contract call from the memory_exhaustion contract did not end. I have results for such an exercise in the linked document. I thought the runtime should get an OOM, but the node did not panic or get stuck, and allocations were continuously logged during the exercise unless it was stopped. The peak requested usage of the entire allocation/deallocation sequence, based on the simulation for both allocators, was >300MB, which is more than the runtime memory, set to 128MB.

  3. When it comes to the local allocator testing, there is one more thing I want to do: run an AHP full node with the runtime based on the local allocator to check for unexpected errors (as a smoke test against an existing production network, with the local allocator). If the host allocator can be used for current contract deployment & execution (given the limits in this PR), I think there is no significant motivation to merge the local allocator for now, unless there is planned smart contracts work where the local allocator still makes sense (and it is found out, based on the simulation, that the host allocator is inefficient and causes problems).

LMK your thoughts.

@athei (Member Author) commented Aug 9, 2025

> When it comes to local allocator usage, we got ~55MB peak requested space and ~56MB worth of physical memory used, so slightly better than the host allocator. However, the host allocator is doing a good job too. I would not consider either of them in the danger zone for this heavy contract; I would say things look good either way, and the above checkbox can be checked if you feel this testing scenario suffices for now.

I agree. The wasted space on padding is < 3MiB. This was what I was looking for. We can go ahead with this PR without the new allocator. The 56MiB also lines up with the worst-case calculations we do in `pallet_revive::integrity_check`: we assume we can consume half of the runtime memory (128MB, so a 64MB budget), and the observed ~56MiB stays below that.

> 2. I can't explain why, but I let an exercise run for more than 24 hours and the extrinsic associated with the contract call from the memory_exhaustion contract did not end.

If you are logging from the allocator you might just have built an infinite loop, because the logging also allocates.

> 2. We discussed this in the past and, as you said, it is not OK to enable a target on debug for node logs, since all other targets will be evaluated against debug and things slow down.

You should do all your experiments with the default log level. This is because the logging does allocate, so the result is not representative of the actual memory allocations. AFAIK critical nodes are run without any logging just to be safe.

A bit more context: when you log from the runtime, the filtering is done on the host. The runtime just requests the maximum log level independent of the target (it uses whatever the host uses). This means turning on debug for any target will make the runtime emit all the debug logs for all targets. Even though the host will discard most of them, you still cause a shit ton of logs (and hence allocations) inside the runtime. This is especially bad for contracts since the PolkaVM debug logs are very verbose.
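
To make that shape concrete, here is a rough illustration using the standard `log` crate types and a hypothetical function name; this is not the actual sp-tracing/host interface:

```rust
use log::{Level, LevelFilter};

// On the runtime side only the *maximum* level is known; there is no
// per-target filter, so every debug record in every target is materialized
// (allocating for its message) before the host discards most of them.
fn runtime_side_enabled(record_level: Level, host_max_level: LevelFilter) -> bool {
    // Deliberately no target check: target-based filtering happens on the
    // host, after the runtime has already paid for building the record.
    record_level.to_level_filter() <= host_max_level
}

fn main() {
    // Enabling `debug` for a single target raises the maximum level to
    // Debug, so the runtime now emits debug records for *all* targets.
    assert!(runtime_side_enabled(Level::Debug, LevelFilter::Debug));
    assert!(!runtime_side_enabled(Level::Debug, LevelFilter::Info));
}
```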

@athei (Member Author) commented Aug 11, 2025

/cmd prdoc --audience runtime_dev --bump patch

@athei athei added the T7-smart_contracts This PR/Issue is related to smart contracts. label Aug 11, 2025
@athei (Member Author) commented Aug 11, 2025

/cmd bench --runtime dev --pallet pallet_revive


Command "bench --runtime dev --pallet pallet_revive" has started 🚀 See logs here


Command "bench --runtime dev --pallet pallet_revive" has finished ✅ See logs here

Subweight results:
| File | Extrinsic | Old | New | Change [%] |
|------|-----------|-----|-----|------------|
| substrate/frame/revive/src/weights.rs | seal_now | 259.00ns | 337.00ns | +30.12 |
| substrate/frame/revive/src/weights.rs | seal_block_number | 281.00ns | 358.00ns | +27.40 |
| substrate/frame/revive/src/weights.rs | blake2f | 28.45us | 36.03us | +26.65 |
| substrate/frame/revive/src/weights.rs | seal_origin | 305.00ns | 386.00ns | +26.56 |
| substrate/frame/revive/src/weights.rs | seal_base_fee | 283.00ns | 347.00ns | +22.61 |
| substrate/frame/revive/src/weights.rs | seal_ref_time_left | 269.00ns | 325.00ns | +20.82 |
| substrate/frame/revive/src/weights.rs | rollback_transient_storage | 1.16us | 1.35us | +16.84 |
| substrate/frame/revive/src/weights.rs | seal_caller | 338.00ns | 390.00ns | +15.38 |
| substrate/frame/revive/src/weights.rs | seal_balance | 12.07us | 13.88us | +15.00 |
| substrate/frame/revive/src/weights.rs | seal_contains_transient_storage | 1.94us | 2.21us | +13.93 |
| substrate/frame/revive/src/weights.rs | get_transient_storage_full | 1.72us | 1.95us | +13.76 |
| substrate/frame/revive/src/weights.rs | seal_value_transferred | 291.00ns | 331.00ns | +13.75 |
| substrate/frame/revive/src/weights.rs | seal_clear_transient_storage | 2.42us | 2.74us | +13.18 |
| substrate/frame/revive/src/weights.rs | seal_caller_is_root | 289.00ns | 327.00ns | +13.15 |
| substrate/frame/revive/src/weights.rs | seal_take_transient_storage | 2.66us | 3.00us | +13.13 |
| substrate/frame/revive/src/weights.rs | seal_gas_limit | 454.00ns | 509.00ns | +12.11 |
| substrate/frame/revive/src/weights.rs | seal_set_transient_storage | 2.61us | 2.90us | +11.00 |
| substrate/frame/revive/src/weights.rs | get_transient_storage_empty | 1.55us | 1.72us | +10.89 |
| substrate/frame/revive/src/weights.rs | seal_get_transient_storage | 2.24us | 2.47us | +10.57 |
| substrate/frame/revive/src/weights.rs | seal_gas_price | 288.00ns | 317.00ns | +10.07 |
| substrate/frame/revive/src/weights.rs | seal_return_data_size | 298.00ns | 326.00ns | +9.40 |
| substrate/frame/revive/src/weights.rs | seal_own_code_hash | 308.00ns | 336.00ns | +9.09 |
| substrate/frame/revive/src/weights.rs | seal_call_data_size | 287.00ns | 313.00ns | +9.06 |
| substrate/frame/revive/src/weights.rs | set_transient_storage_full | 1.95us | 2.13us | +8.96 |
| substrate/frame/revive/src/weights.rs | seal_minimum_balance | 314.00ns | 339.00ns | +7.96 |
| substrate/frame/revive/src/weights.rs | seal_address | 318.00ns | 343.00ns | +7.86 |
| substrate/frame/revive/src/weights.rs | set_transient_storage_empty | 1.58us | 1.69us | +7.23 |
| substrate/frame/revive/src/weights.rs | seal_block_author | 45.67us | 48.51us | +6.22 |
| substrate/frame/revive/src/weights.rs | seal_deposit_event | 5.53us | 5.82us | +5.17 |
| substrate/frame/revive/src/weights.rs | ripemd_160 | 4.13ms | 3.91ms | -5.30 |
| substrate/frame/revive/src/weights.rs | bn128_add | 16.97us | 15.94us | -6.05 |
| substrate/frame/revive/src/weights.rs | seal_call_precompile | 278.51us | 261.11us | -6.25 |
| substrate/frame/revive/src/weights.rs | hash_blake2_256 | 1.64ms | 1.48ms | -9.69 |
| substrate/frame/revive/src/weights.rs | seal_copy_to_contract | 249.15us | 213.32us | -14.38 |
| substrate/frame/revive/src/weights.rs | seal_return | 62.99us | 53.18us | -15.58 |
| substrate/frame/revive/src/weights.rs | identity | 155.02us | 118.31us | -23.68 |
| substrate/frame/revive/src/weights.rs | seal_call_data_copy | 157.52us | 118.25us | -24.93 |
| substrate/frame/revive/src/weights.rs | instr | 1.53ms | 1.11ms | -27.80 |
Command output:

✅ Successful benchmarks of runtimes/pallets:
-- dev: ['pallet_revive']

@pgherveou (Contributor) left a comment


lgtm just one question

```diff
-fn call_with_code_per_byte(
-    c: Linear<0, { limits::code::STATIC_MEMORY_BYTES / limits::code::BYTES_PER_INSTRUCTION }>,
-) -> Result<(), BenchmarkError> {
+fn call_with_code_per_byte(c: Linear<0, { 100 * 1024 }>) -> Result<(), BenchmarkError> {
```
Contributor:

why `100 * 1024` and not `BLOB_BYTES`?

@athei (Member Author) commented Aug 14, 2025

This gets really slow, and since it scales linearly we don't need to test with the biggest blob; it would just slow down running the tests.

@athei athei added this pull request to the merge queue Aug 14, 2025
Merged via the queue into master with commit ba0f5b0 Aug 14, 2025
239 of 241 checks passed
@athei athei deleted the at/sizes branch August 14, 2025 09:11
athei added a commit that referenced this pull request Aug 14, 2025
…ll depth to 25 (#9267)

This PR changes the contract code limit from roughly 100KiB to exactly
1MiB. It also raises the call stack depth from 5 to 25.

Those limits were in place because of memory constraints within the
runtime. We work around them in the following ways:
1) Removing the 4x safety margin for allocations which is no longer
needed due to the new allocator.
2) Limiting the size of the compilation cache to a fixed size.
3) Resetting the compilation cache and flat map every time we call into
a new contract.
4) Limiting the size of calldata and return data to 128KiB (only capped
by tx size and RAM before). While this is a breaking change nobody will
be affected since Geth effectively limits the call data to 128KiB.

This is large enough that all known contracts no longer fail due to
size limits.

The new limit is also much simpler to understand since it does not
depend on the number of instructions. Just these two constraints:
```
PVM_BLOB.len() < 1 MiB
PVM_BLOB.len() + (rw/ro/stack) < 1MiB + 512KiB
```

This means:
1) A contract is guaranteed to have at least 512KiB of memory available.
2) A contract that is smaller in code can use more memory.
3) The limit is exactly 1MiB unless a user manually increases the memory
usage of a contract beyond 512KiB.

The limit of `5` was problematic because there are use cases which
require deeper stacks. With the increase to `25`, there should be no
benign use case left that fails to work.

Please note that even with the low limit of `25`, contracts are not
vulnerable to stack depth exhaustion attacks: we trap the caller's
context when the depth limit is reached. This is different from
Ethereum, where this error can be handled and failure to do so leaves
the contract vulnerable.

---------

Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@athei athei mentioned this pull request Aug 14, 2025
athei added a commit that referenced this pull request Aug 15, 2025
- #9112
- #9101
- #9416
- #9357
- #9441
- #9267

Those are all the changes we want to get onto the next Kusama release.
The new gas mapping and EVM backend will not make it.

---------

Co-authored-by: PG Herveou <[email protected]>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Francisco Aguirre <[email protected]>
Co-authored-by: Oliver Tale-Yazdi <[email protected]>
Co-authored-by: sekiseki <[email protected]>
Co-authored-by: Michael Müller <[email protected]>