Skip to content

reth static files consolidation causes execution backlog and op-node payload queue overflow #759

@Arisawa-v

Description

@Arisawa-v

Hi, we’re running OP Stack with reth as execution client (dockerized), and are consistently hitting the following op-node error:

Dropping payload from payload queue because the payload queue is too large

This happens simultaneously on two machines (one full SSD, one SSD+HDD), so it does not appear to be disk-type specific.

L1 RPC is shared with other nodes and confirmed healthy.


Setup

  • OP Stack
  • Execution: reth (docker)
  • op-node and reth in separate containers on same host
  • Metrics exposed from reth on host port 7301

Observed behavior

During the incident:

  • CPU is fully saturated
  • op-node continuously drops payloads
  • Execution falls behind derive
  • Both machines enter this state at roughly the same time (similar chain height)

reth metrics show heavy static files activity:

reth_static_files_jar_provider_calls_total{segment="receipts",operation="append"} 16397259
reth_static_files_jar_provider_calls_total{segment="transactions",operation="append"} 16397259
reth_static_files_jar_provider_calls_total{segment="headers",operation="append"} 76684

reth_static_files_jar_provider_calls_total{segment="transactions",operation="commit-writer"} 22106
reth_static_files_jar_provider_calls_total{segment="receipts",operation="commit-writer"} 22108
reth_static_files_jar_provider_calls_total{segment="headers",operation="commit-writer"} 22098

reth_consensus_engine_beacon_pipeline_runs 4

This strongly suggests reth is in a static files ingestion / consolidation phase (receipts / tx / headers + commit-writer flush).

During this phase, execution ingest throughput drops significantly, while op-node continues deriving payloads, eventually overflowing the payload queue and dropping payloads.


Conclusion

This appears to be an interaction between:

  • reth static files consolidation / pipeline stages
  • OP Stack’s continuous payload derivation

When reth enters static files stages, it temporarily cannot keep up with real-time execution payloads, but op-node has no feedback mechanism and continues pushing, leading to queue overflow.

This reproduces even on full SSD systems, so it seems inherent to the pipeline behavior rather than raw disk speed.


Questions

  1. Is this expected behavior when static files are enabled?
  2. How to solve the problem?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions