feat: gap detection & rewind #38
base: main
Conversation
Pull Request Overview
This PR implements a comprehensive gap detection and rewind system for the Photon indexer to ensure sequence consistency across all state update types. The system automatically detects missing sequence numbers in V1/V2 trees, nullifications, and batch events, then triggers a rewind to re-fetch missing blocks.
Key changes include:
- Gap detection system that tracks sequence numbers across all state update types
- Rewind controller that repositions block streams when gaps are detected
- Tree filtering capability to process only blocks relevant to specific Merkle trees
- Unified sequence tracking for both State and Address trees
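For orientation, a minimal sketch of the core check this overview describes: compare the last persisted sequence for a tree against the sequences observed in a new batch and report any missing ranges. The names and shapes here are illustrative, not taken from the PR's diff.

fn find_gaps(last_seen: u64, mut observed: Vec<u64>) -> Vec<(u64, u64)> {
    observed.sort_unstable();
    observed.dedup();
    let mut gaps = Vec::new();
    let mut prev = last_seen;
    for seq in observed {
        if seq > prev + 1 {
            // Missing sequences (prev + 1)..=(seq - 1) would trigger a rewind.
            gaps.push((prev + 1, seq - 1));
        }
        prev = prev.max(seq);
    }
    gaps
}

fn main() {
    // With last_seen = 5 and observed [6, 7, 10], sequences 8 and 9 are missing.
    assert_eq!(find_gaps(5, vec![6, 7, 10]), vec![(8, 9)]);
}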
Reviewed Changes
Copilot reviewed 30 out of 32 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/ingester/detect_gaps.rs | Implements comprehensive gap detection across all StateUpdate field types |
| src/ingester/rewind_controller.rs | Creates rewind system to handle gap-triggered stream repositioning |
| src/ingester/fetchers/*.rs | Adds rewind command handling to gRPC and poller block fetchers |
| src/main.rs | Adds --tree CLI argument for filtering the indexer to specific trees |
| tests/integration_tests/*.rs | Adds comprehensive gap detection validation and gap-filling test utilities |
| analyze_snapshot.rs | New tool for analyzing snapshot files and tree distribution |
Comments suppressed due to low confidence (1)
src/ingester/detect_gaps.rs:48
The field name `_tree_type_string` with a leading underscore suggests it is unused, but it is part of a public struct. Either remove it if truly unused or rename it without the underscore prefix.
// pub tree_type_string: Option<String>, // Tree type string (for indexed tree updates)
src/ingester/mod.rs (Outdated)
log::info!(
    "amt sequence state {:?}",
    SEQUENCE_STATE
        .lock()
        .unwrap()
        .get("amt1Ayt45jfbdw5YSo7iz6WZxUmnZsQTYXy82hVwyC2")
);
log::info!(
    "smt sequence state {:?}",
    SEQUENCE_STATE
        .lock()
        .unwrap()
        .get("smt1NamzXdq4AMqS2fS2F1i5KTYPZRhoHgWx38d8WsT")
);
These debug log statements with hardcoded pubkeys appear to be temporary debugging code that should be removed before production deployment.
- log::info!(
-     "amt sequence state {:?}",
-     SEQUENCE_STATE
-         .lock()
-         .unwrap()
-         .get("amt1Ayt45jfbdw5YSo7iz6WZxUmnZsQTYXy82hVwyC2")
- );
- log::info!(
-     "smt sequence state {:?}",
-     SEQUENCE_STATE
-         .lock()
-         .unwrap()
-         .get("smt1NamzXdq4AMqS2fS2F1i5KTYPZRhoHgWx38d8WsT")
- );
+ if cfg!(debug_assertions) {
+     log::debug!(
+         "amt sequence state {:?}",
+         SEQUENCE_STATE
+             .lock()
+             .unwrap()
+             .get(&get_amt_pubkey())
+     );
+     log::debug!(
+         "smt sequence state {:?}",
+         SEQUENCE_STATE
+             .lock()
+             .unwrap()
+             .get(&get_smt_pubkey())
+     );
+ }
@@ -76,7 +77,7 @@ pub struct AddressQueueUpdate {
 impl From<NewAddress> for AddressQueueUpdate {
     fn from(new_address: NewAddress) -> Self {
         AddressQueueUpdate {
-            tree: SerializablePubkey::from(new_address.mt_pubkey),
+            tree: SerializablePubkey::from(new_address.tree_pubkey),
The field access changed from `mt_pubkey` to `tree_pubkey`, but there's no indication in the diff that the `NewAddress` struct was updated to match. This could cause compilation errors if the field doesn't exist.
Walkthrough
Adds per-tree filtering, sequence gap detection, and a rewind subsystem; threads optional rewind receivers through block fetchers; records transaction signatures with tree updates; introduces snapshot analysis tooling; updates Cargo targets/deps; and expands integration tests and snapshot utilities.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant CLI as CLI / main
participant Stream as BlockStream (fetchers)
participant Poller as Poller / RPC Poller
participant Indexer as Ingester / Indexer
participant Gap as RewindController
CLI->>Stream: build BlockStreamConfig(rewind_receiver?)
Stream->>Poller: start poller / grpc stream (with optional rewind_receiver)
Poller->>Poller: receive blocks
loop per-block-batch
Poller->>Indexer: emit Vec<BlockInfo>
Indexer->>Indexer: parse_transaction(..., tree_filter)
Indexer->>Indexer: derive state updates & detect gaps
alt gaps detected
Indexer->>Gap: request_rewind_for_gaps(...)
Gap->>Stream: send RewindCommand via channel
Poller->>Poller: clear caches, set start slot, restart
end
end
sequenceDiagram
autonumber
participant Analyzer as analyze_snapshot tool
participant FS as Snapshot Dir
participant Parser as Ingester parser
Analyzer->>FS: load snapshot blocks
Analyzer->>Parser: parse transactions (optional tree_filter)
Parser->>Analyzer: emit per-tree counts & signatures
Analyzer->>Analyzer: aggregate and print summary
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Actionable comments posted: 15
♻️ Duplicate comments (7)
tests/integration_tests/zeroeth_element_fix_test.rs (2)
119-119: Consider using a more explicit test signature. Using `Default::default()` creates an all-zero signature, which may not be appropriate for test data clarity. Consider:
- signature: Default::default(),
+ signature: [1u8; 64], // Mock signature for testing purposes
215-215: Same signature concern as above. The same issue with using `Default::default()` for the signature field applies here; consider a more explicit test signature for clarity.

src/api/method/get_multiple_compressed_accounts.rs (1)
151-155: Good error handling improvement, but unrelated to this PR. Replacing the panic with proper error handling improves API robustness; however, the change appears unrelated to the gap detection and rewind functionality that is the main focus of this PR.
tests/integration_tests/mock_tests.rs (1)
1643-1643: Consider removing the explicit assignment. Following up on the previous review comment, the `let _ =` assignment can be removed since the return value of `insert()` doesn't need to be captured in this test context.
- let _ = indexed_leaf_updates.insert(
+ indexed_leaf_updates.insert(

src/ingester/mod.rs (1)
216-229: Remove debug log statements with hardcoded pubkeys. These statements appear to be temporary debugging code and should be removed before production deployment.
src/ingester/detect_gaps.rs (2)
107-110: Debug print statements should use the `tracing` crate.
- println!(
-     "DEBUG: Updating batch_address_queue_indexes for tree: {}, sequence: {}",
-     tree_str, max_entry.sequence
- );
+ tracing::debug!(
+     "Updating batch_address_queue_indexes for tree: {}, sequence: {}",
+     tree_str, max_entry.sequence
+ );
219-223: Error logging should use `tracing::error!`.
- println!("ERROR: AddressV1 tree {} found in batch_new_addresses - this should not happen!", tree_str);
- println!(
-     "  queue_index: {}, slot: {}, signature: {}",
-     address.queue_index, slot, signature
- );
+ tracing::error!(
+     tree = %tree_str,
+     queue_index = address.queue_index,
+     slot = slot,
+     signature = %signature,
+     "AddressV1 tree found in batch_new_addresses - this should not happen!"
+ );
🧹 Nitpick comments (16)
Cargo.toml (1)
89-93: Consider the implications of Git dependencies. The change from version specifications to Git dependencies for the light-protocol libraries points to a specific commit. While this may be necessary for accessing unreleased features, it introduces build stability risks.
Consider:
- Documentation: add comments explaining why these specific Git dependencies are needed
- Migration plan: plan to migrate back to published versions once the required features are released
- Version pinning: the commit hash provides good reproducibility, which is positive
# Required for [specific feature/fix] - migrate to published version when available
light-batched-merkle-tree = { version = "0.3.0", git = "https://github.com/lightprotocol/light-protocol", rev = "341aae4dfc89a27913e6ff1af65572d626b0cc19" }
src/ingester/parser/tree_info.rs (1)
15-24: Improve documentation for sequence types. The inline comments could be more descriptive about what these sequence values represent and their purpose in gap detection.
Consider expanding the comments to explain:
- What each sequence entry represents
- Why StateV2 has three different sequence types
- The relationship between input queue, batch event, and output queue sequences
 #[derive(Debug, Clone)]
 pub enum TreeTypeSeq {
     StateV1(SequenceEntry),
-    // Output queue (leaf index), Input queue index, Batch event seq with context
+    // StateV2 tracks three sequence types for comprehensive gap detection:
+    // - Output queue: tracks leaf index sequences
+    // - Input queue: tracks input processing sequences
+    // - Batch event: tracks batch processing sequences with additional context
     StateV2(StateV2SeqWithContext),
-    // event seq with complete context
+    // AddressV1: tracks address tree event sequences with complete context
     AddressV1(SequenceEntry),
-    // Input queue index, Batch event seq with context
+    // AddressV2: tracks both input queue and batch event sequences separately
     AddressV2(SequenceEntry, SequenceEntry), // (input_queue_entry, batch_event_entry)
 }

src/ingester/indexer/mod.rs (1)
115-124: Fragile error detection using string matching. The error handling relies on checking whether the error message contains "Gap detection triggered rewind", which is fragile and could break if the message changes.
Consider using a dedicated error variant for rewind errors instead of string matching:
- if e.to_string().contains("Gap detection triggered rewind") {
+ if matches!(e, IngesterError::GapDetectionRewind { .. }) {
      // Gap detected, rewind triggered - the slot stream should handle repositioning
      log::info!("Gap detection triggered rewind");
      continue;
This would require adding a new variant to the `IngesterError` enum for more type-safe error handling.
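As a hedged sketch, the suggested variant could look like the following, assuming the enum uses the `thiserror` derive; the variant name mirrors the `matches!` example above and is an assumption, not code from the PR.

use thiserror::Error;

#[derive(Error, Debug)]
pub enum IngesterError {
    // Hypothetical variant enabling typed rewind detection instead of
    // string matching; the slot gives callers the rewind target.
    #[error("Gap detection triggered rewind to slot {to_slot}")]
    GapDetectionRewind { to_slot: u64 },
    #[error("{0}")]
    CustomError(String),
}

src/main.rs (1)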
241-245: Duplicated tree filter parsing logic. The tree pubkey parsing logic is duplicated in two places; consider extracting it into a helper function to follow DRY principles.
+fn parse_tree_filter(tree_str: Option<&String>) -> Option<Pubkey> {
+    tree_str.map(|s| {
+        s.parse::<Pubkey>()
+            .expect("Invalid tree pubkey format")
+    })
+}
 // Then replace both occurrences with:
-let tree_filter = args.tree.as_ref().map(|tree_str| {
-    tree_str
-        .parse::<Pubkey>()
-        .expect("Invalid tree pubkey format")
-});
+let tree_filter = parse_tree_filter(args.tree.as_ref());
Also applies to: 302-306
tests/integration_tests/snapshot_test_utils.rs (2)
16-197: Consider breaking down the large function into smaller, focused functions. The `create_test_snapshot_from_compression_transactions` function is quite long (~180 lines) and handles multiple responsibilities, which makes it harder to test and maintain.
Consider extracting logical segments into separate functions:
async fn fetch_compression_slots(
    client: &RpcClient,
    target_slot: u64,
) -> Result<Vec<u64>> {
    let (signatures, signature_to_slot_map) =
        fetch_compression_signatures_until_slot(client, target_slot).await?;
    // ... extract unique slots logic
}

async fn fetch_blocks_for_slots(
    client: &RpcClient,
    slots: &[u64],
) -> Result<Vec<BlockInfo>> {
    // ... block fetching logic
}

async fn create_snapshot_from_blocks(
    blocks: Vec<BlockInfo>,
    snapshot_dir_path: &str,
) -> Result<()> {
    // ... snapshot creation logic
}
94-101: Document the error handling strategy for failed block fetches. The current implementation logs errors but continues processing when blocks fail to fetch or parse, which could result in incomplete snapshots.
Consider collecting errors and reporting them at the end, or making the error handling configurable:
let mut fetch_errors = Vec::new();

// In the loop:
Err(e) => {
    let error_msg = format!("Failed to fetch block at slot {}: {}", slot, e);
    eprintln!("{}", error_msg);
    fetch_errors.push(error_msg);
}

// After the loop:
if !fetch_errors.is_empty() && blocks.is_empty() {
    return Err(anyhow::anyhow!(
        "Failed to fetch any blocks. Errors: {:?}",
        fetch_errors
    ));
}
let mut fetch_errors = Vec::new(); // In the loop: Err(e) => { let error_msg = format!("Failed to fetch block at slot {}: {}", slot, e); eprintln!("{}", error_msg); fetch_errors.push(error_msg); } // After the loop: if !fetch_errors.is_empty() && blocks.is_empty() { return Err(anyhow::anyhow!( "Failed to fetch any blocks. Errors: {:?}", fetch_errors )); }tests/integration_tests/test_v1_address_tree_sequence_consistency.rs (1)
42-42: Fix inconsistent field naming for `tree_type_string`. The field is declared as `_tree_type_string` with an underscore prefix, but in Rust the underscore typically marks unused fields. Either remove the underscore or mark the field with `#[allow(dead_code)]` if it is truly unused.
- pub _tree_type_string: Option<String>, // Tree type string (for indexed tree updates)
+ pub tree_type_string: Option<String>, // Tree type string (for indexed tree updates)
And update the usage:
- _tree_type_string: tree_type_string.clone(),
+ tree_type_string: tree_type_string.clone(),
Also applies to: 297-297
src/ingester/detect_gaps.rs (7)
12-15: Consider alternatives to global mutable state for better testability and maintainability. The global `SEQUENCE_STATE` makes the code harder to test and reason about. Consider passing state through function parameters or using a state management pattern that doesn't rely on global mutability.
17-24: Replace magic numbers with named constants. The function uses magic numbers (1, 2, 3, 0) without clear meaning; consider an enum or named constants for better readability.
+const BATCH_APPEND_TYPE_ID: u8 = 1;
+const BATCH_NULLIFY_TYPE_ID: u8 = 2;
+const BATCH_ADDRESS_APPEND_TYPE_ID: u8 = 3;
+const OTHER_EVENT_TYPE_ID: u8 = 0;
+
 fn merkle_event_to_type_id(event: &MerkleTreeEvent) -> u8 {
     match event {
-        MerkleTreeEvent::BatchAppend(_) => 1,
-        MerkleTreeEvent::BatchNullify(_) => 2,
-        MerkleTreeEvent::BatchAddressAppend(_) => 3,
-        _ => 0, // Other event types we don't care about
+        MerkleTreeEvent::BatchAppend(_) => BATCH_APPEND_TYPE_ID,
+        MerkleTreeEvent::BatchNullify(_) => BATCH_NULLIFY_TYPE_ID,
+        MerkleTreeEvent::BatchAddressAppend(_) => BATCH_ADDRESS_APPEND_TYPE_ID,
+        _ => OTHER_EVENT_TYPE_ID,
     }
 }
48-48: Remove commented-out code. Dead code should be removed rather than left as a comment.
 // Tree/context metadata
 pub tree_pubkey: Option<Pubkey>, // Tree pubkey (unified for all tree operations)
-// pub tree_type_string: Option<String>, // Tree type string (for indexed tree updates)
 pub field_type: StateUpdateFieldType,
211-214: Use structured logging instead of println for debugging.
- println!(
-     "DEBUG: Extracting batch_new_address for tree: {}, queue_index: {}",
-     tree_str, address.queue_index
- );
+ tracing::debug!(
+     tree = %tree_str,
+     queue_index = address.queue_index,
+     "Extracting batch_new_address"
+ );
273-276: Remove unnecessary wrapper function. The `detect_gaps_from_sequences` function is just a wrapper that adds no value; consider removing it and using `detect_all_sequence_gaps` directly.
-/// Detects gaps from a single StateUpdateSequences struct
-pub fn detect_gaps_from_sequences(sequences: &StateUpdateSequences) -> Vec<SequenceGap> {
-    detect_all_sequence_gaps(sequences)
-}
285-288: Replace debug println with structured logging.
- println!(
-     "DEBUG: Processing indexed_tree_seqs - tree: {}, tree_type_id: {}",
-     tree_pubkey, tree_type_id
- );
+ tracing::debug!(
+     tree = %tree_pubkey,
+     tree_type_id = tree_type_id,
+     "Processing indexed_tree_seqs"
+ );
384-387: Replace all println debug statements with structured logging. Multiple debug println statements should use the tracing crate for consistency.
Example for lines 384-387:
- println!(
-     "DEBUG: Using current sequence state for tree {}: {:?}",
-     tree_str, current_seq
- );
+ tracing::debug!(
+     tree = %tree_str,
+     current_seq = ?current_seq,
+     "Using current sequence state for tree"
+ );
Also applies to: 403-407, 419-422, 427-429
tests/integration_tests/test_v1_address_tree_gap_filler.rs (2)
124-124: Make the snapshot path configurable. The hardcoded path should be configurable via an environment variable or parameter for flexibility across test environments.
- let snapshot_path = "target/snapshot_local";
+ let snapshot_path = std::env::var("SNAPSHOT_PATH")
+     .unwrap_or_else(|_| "target/snapshot_local".to_string());
335-376: Simplify nested logic for checking compression activity. The nested loops and conditions make this code hard to follow; consider extracting helper functions.
fn has_compression_activity(state_update: &StateUpdate) -> bool {
    !state_update.indexed_merkle_tree_updates.is_empty()
        || !state_update.leaf_nullifications.is_empty()
        || !state_update.batch_nullify_context.is_empty()
        || !state_update.batch_new_addresses.is_empty()
        || !state_update.batch_merkle_tree_events.is_empty()
        || !state_update.out_accounts.is_empty()
}

fn extract_v1_address_updates(state_update: &StateUpdate) -> Vec<IndexedTreeLeafUpdate> {
    state_update
        .indexed_merkle_tree_updates
        .iter()
        .filter_map(|((tree_pubkey, _), leaf_update)| {
            if leaf_update.tree_type == TreeType::AddressV1 && *tree_pubkey == V1_ADDRESS_TREE {
                Some(leaf_update.clone())
            } else {
                None
            }
        })
        .collect()
}
📜 Review details
⛔ Files ignored due to path filters (1)
- Cargo.lock is excluded by `!**/*.lock`
📒 Files selected for processing (31)
- .gitignore (1 hunks)
- Cargo.toml (2 hunks)
- analyze_snapshot.rs (1 hunks)
- src/api/method/get_multiple_compressed_accounts.rs (1 hunks)
- src/api/method/get_transaction_with_compression_info.rs (2 hunks)
- src/ingester/detect_gaps.rs (1 hunks)
- src/ingester/error.rs (1 hunks)
- src/ingester/fetchers/grpc.rs (7 hunks)
- src/ingester/fetchers/mod.rs (4 hunks)
- src/ingester/fetchers/poller.rs (2 hunks)
- src/ingester/indexer/mod.rs (3 hunks)
- src/ingester/mod.rs (3 hunks)
- src/ingester/parser/merkle_tree_events_parser.rs (3 hunks)
- src/ingester/parser/mod.rs (3 hunks)
- src/ingester/parser/state_update.rs (2 hunks)
- src/ingester/parser/tree_info.rs (2 hunks)
- src/ingester/parser/tx_event_parser_v2.rs (2 hunks)
- src/ingester/persist/persisted_indexed_merkle_tree.rs (4 hunks)
- src/ingester/rewind_controller.rs (1 hunks)
- src/main.rs (7 hunks)
- src/snapshot/mod.rs (1 hunks)
- src/snapshot/snapshotter/main.rs (1 hunks)
- tests/integration_tests/e2e_tests.rs (1 hunks)
- tests/integration_tests/main.rs (1 hunks)
- tests/integration_tests/mock_tests.rs (1 hunks)
- tests/integration_tests/snapshot_test_utils.rs (1 hunks)
- tests/integration_tests/snapshot_tests.rs (2 hunks)
- tests/integration_tests/test_v1_address_tree_gap_filler.rs (1 hunks)
- tests/integration_tests/test_v1_address_tree_sequence_consistency.rs (1 hunks)
- tests/integration_tests/utils.rs (2 hunks)
- tests/integration_tests/zeroeth_element_fix_test.rs (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (5)
src/ingester/parser/state_update.rs (1)
- src/common/typedefs/serializable_pubkey.rs (6): from (87-89), from (93-95), from (99-101), from (105-107), from (111-113), from (127-129)
tests/integration_tests/utils.rs (1)
- src/ingester/parser/mod.rs (1): parse_transaction (42-175)
src/ingester/rewind_controller.rs (2)
- src/snapshot/mod.rs (1): new (249-263)
- src/ingester/parser/indexer_events.rs (1): tree_pubkey (34-39)
src/ingester/parser/mod.rs (2)
- src/ingester/fetchers/grpc.rs (1): parse_transaction (306-384)
- src/ingester/parser/state_update.rs (1): new (106-108)
src/ingester/indexer/mod.rs (2)
- src/common/mod.rs (1): fetch_current_slot_with_infinite_retry (135-147)
- src/ingester/mod.rs (1): index_block_batch_with_infinite_retries (209-251)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Run tests
🔇 Additional comments (48)
.gitignore (1)
16-16: LGTM! Sensible addition for development artifacts. Adding `*.txt` to .gitignore is appropriate for excluding text files that may be generated during debugging, logging, or testing related to the new gap detection and analysis features.

src/snapshot/snapshotter/main.rs (1)
259-259: LGTM! Correct exclusion of rewind support for the snapshotter. The snapshotter handles static snapshot creation rather than continuous indexing, so explicitly setting `rewind_receiver: None` is appropriate. This maintains consistency with the new rewind architecture while correctly excluding functionality that isn't needed for snapshot generation.

src/api/method/get_transaction_with_compression_info.rs (2)
213-213: LGTM! Consistent with the updated parse_transaction signature. The addition of `None` as the third parameter aligns with the new `tree_filter` parameter in `parse_transaction`. Since API methods should parse all transactions regardless of tree filtering, passing `None` is correct.
364-364: LGTM! Consistent parameter addition. Same as the v1 helper function: correctly passing `None` for the tree filter parameter.

src/ingester/error.rs (1)
17-18: LGTM! Useful addition for enhanced error handling. The `CustomError(String)` variant provides flexibility for handling various error conditions from the new gap detection and rewind features without requiring specific error types for every scenario.

src/snapshot/mod.rs (1)
454-459: LGTM! Good refactoring for clarity. Extracting `last_indexed_slot` into a local variable improves code readability and makes the function parameter more explicit.

tests/integration_tests/e2e_tests.rs (1)
715-715: LGTM! Correctly updated for the new function signature. The addition of `None` as the third parameter correctly adapts to the new `parse_transaction` signature, which includes an optional tree filter parameter.

tests/integration_tests/main.rs (1)
11-14: Excellent test coverage expansion for the new functionality. The three new test modules provide comprehensive coverage of the gap detection and rewind features:
- snapshot_test_utils: utilities for snapshot validation
- test_v1_address_tree_gap_filler: end-to-end gap detection and filling tests
- test_v1_address_tree_sequence_consistency: sequence consistency validation
This directly supports the main PR objectives and ensures the reliability of the new features.
tests/integration_tests/mock_tests.rs (1)
1651-1651: LGTM: Test data updated to match struct changes. The addition of the `signature` field with a default value correctly reflects the structural changes to `IndexedTreeLeafUpdate` in the production code.

src/ingester/parser/state_update.rs (2)
66-66: LGTM: The signature field enhances traceability. The new `signature` field provides the cryptographic signature for each leaf update, supporting the gap detection and rewind control system introduced in this PR.
80-80: Confirm the NewAddress struct has a tree_pubkey field. Ensure that the `NewAddress` type imported from the `light_compressed_account` crate defines a `tree_pubkey` field (replacing the former `mt_pubkey`); otherwise this change will break compilation.
- Verify in the external crate's `NewAddress` definition that
pub struct NewAddress { …, tree_pubkey: Pubkey, … }
exists.

tests/integration_tests/utils.rs (2)
446-446: LGTM: Function call updated for the new signature. The addition of `None` as the third argument correctly matches the updated `parse_transaction` signature, which now accepts an optional tree filter parameter.

465-465: LGTM: Consistent function call update. This change maintains consistency with the updated `parse_transaction` signature across the test utility functions.

Cargo.toml (1)
36-38: LGTM: New snapshot analysis binary added. The `photon-analyze-snapshot` binary target correctly supports the new snapshot analysis functionality described in the PR objectives.

src/ingester/fetchers/mod.rs (3)
21-21: LGTM: Rewind receiver field enables gap detection support. The optional `rewind_receiver` field correctly integrates rewind command handling into the block stream configuration, supporting the gap detection and rewind mechanism described in the PR objectives.
25-25: LGTM: Method signature change supports ownership transfer. Changing from `&self` to `mut self` is necessary to enable `take()` operations on the rewind receiver, allowing proper ownership transfer to the stream creation functions.
34-34: LGTM: Proper ownership transfer of the rewind receiver. The use of `self.rewind_receiver.take()` correctly transfers ownership of the receiver to the active stream (either gRPC or poller). The second `take()` call will safely return `None`, since only one stream is used at a time.
Also applies to: 43-43
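A minimal sketch of the `take()` pattern this comment describes: `Option::take` moves the receiver out and leaves `None` behind, so a second call is a safe no-op. The config struct below is illustrative, not the PR's actual type.

struct BlockStreamConfig {
    rewind_receiver: Option<std::sync::mpsc::Receiver<u64>>,
}

fn main() {
    let (_tx, rx) = std::sync::mpsc::channel::<u64>();
    let mut config = BlockStreamConfig { rewind_receiver: Some(rx) };

    let first = config.rewind_receiver.take();  // moves the receiver out
    let second = config.rewind_receiver.take(); // safely yields None

    assert!(first.is_some());
    assert!(second.is_none());
}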
src/ingester/parser/merkle_tree_events_parser.rs (9)
38-38: LGTM: Transaction signature correctly propagated to the parsing function. The change correctly passes the transaction signature to `parse_indexed_merkle_tree_update`, enabling signature-aware state updates for gap detection.
112-138: LGTM: Signature parameter integration is well implemented. The function signature update and subsequent usage in the `IndexedTreeLeafUpdate` struct (line 138) correctly associate each leaf update with its originating transaction signature. This enables precise gap detection and rewind functionality as described in the PR objectives.
134-153: LGTM: Intelligent filtering prevents duplicate account transactions. The logic correctly filters out batch input accounts that are being created in the same transaction by checking against `output_account_hashes`. This prevents duplicate `AccountTransaction` entries and improves data consistency.
155-166: LGTM: Address filtering aligns with unified sequence tracking. Filtering out addresses with `queue_index == u64::MAX` (AddressV1 trees) and using `tree_pubkey` correctly implements the unified sequence tracking approach mentioned in the PR objectives for gap detection.
121-154: LGTM: Comprehensive integration test for snapshot functionality. This test provides excellent end-to-end coverage of the snapshot creation and parsing pipeline using real compression transaction data. The use of `#[ignore]` is appropriate for tests requiring external API access, and the API key requirement is clearly documented.
48-59: LGTM: Rewind receiver properly integrated into the main polling stream. The rewind receiver parameter is correctly added and passed through to the block poller stream, enabling dynamic rewind functionality as part of the gap detection system.
122-153: LGTM: Fallback scenarios correctly disable rewind. Passing `None` for the rewind receiver in fallback scenarios (timeout, out-of-order blocks, unhealthy gRPC) is the correct approach. During recovery, disabling rewind prevents additional complexity and ensures stable operation.
84-84: LGTM: Appropriate placeholder signatures for synthetic elements. Using `Signature::from([0; 64])` as a placeholder signature for synthetic zeroeth and top elements is correct, since these elements don't originate from actual transactions.
Also applies to: 129-129
238-265: LGTM: Unified gap detection through state_tree_histories integration. The addition of address tree entries to the `state_tree_histories` table enables unified sequence tracking across all tree types, which is essential for the comprehensive gap detection system. The "on conflict do nothing" strategy appropriately handles duplicate entries.

src/ingester/indexer/mod.rs (1)
77-113: Well-structured integration of rewind controller and tree filter. The implementation correctly passes the new parameters through the indexing pipeline and provides appropriate logging for filtered indexing operations.
src/ingester/fetchers/poller.rs (1)
43-109: Well-implemented rewind mechanism for block polling. The rewind implementation correctly:
- Checks for rewind commands without blocking
- Clears the block cache to prevent stale data
- Resets slot positions appropriately
- Uses proper control flow to restart the stream
One minor suggestion for clarity:
- log::error!("Rewinding block stream to {}: {}", to_slot, reason);
+ log::info!("Rewinding block stream to slot {}: {}", to_slot, reason);
Since rewinding is an expected operation when gaps are detected, consider using `log::info` instead of `log::error`.
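For reference, a minimal sketch of the non-blocking rewind check described in the bullets above, assuming a tokio mpsc channel whose command carries a target slot and a reason; field and function names are illustrative, not the PR's code.

use tokio::sync::mpsc::{error::TryRecvError, Receiver};

pub struct RewindCommand {
    pub to_slot: u64,
    pub reason: String,
}

// Polled once per loop iteration; returns the slot to rewind to, if any.
fn check_for_rewind(rx: &mut Receiver<RewindCommand>) -> Option<u64> {
    match rx.try_recv() {
        Ok(cmd) => {
            // Rewinds are expected when gaps are detected, hence info level.
            log::info!("Rewinding block stream to slot {}: {}", cmd.to_slot, cmd.reason);
            Some(cmd.to_slot)
        }
        Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => None,
    }
}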
src/main.rs (1)
103-106: Clear documentation for the tree filtering feature. The CLI argument is well documented, with a clear explanation of its purpose.
src/ingester/parser/mod.rs (2)
47-70: Efficient early return for non-matching transactions. Good optimization to check tree involvement early and return an empty state update for irrelevant transactions.
Minor suggestion to simplify the nested loop logic:
- let mut involves_tree = false;
- for instruction_group in &tx.instruction_groups {
-     if instruction_group.outer_instruction.accounts.contains(tree) {
-         involves_tree = true;
-         break;
-     }
-     for inner_instruction in &instruction_group.inner_instructions {
-         if inner_instruction.accounts.contains(tree) {
-             involves_tree = true;
-             break;
-         }
-     }
-     if involves_tree {
-         break;
-     }
- }
-
- if !involves_tree {
+ let involves_tree = tx.instruction_groups.iter().any(|group| {
+     group.outer_instruction.accounts.contains(tree) ||
+     group.inner_instructions.iter().any(|ix| ix.accounts.contains(tree))
+ });
+
+ if !involves_tree {
183-221: Comprehensive state update filtering implementation. The `filter_state_update_by_tree` function thoroughly filters all relevant fields and correctly clears transactions when no tree-specific data remains.
55-63: Consider the implications of returning 0 as a fallback in determine_rewind_slot_from_gaps. When no valid slots are found (all are zero, or the list is empty after filtering), the function returns 0, which could trigger a rewind to the genesis block; that is probably not the intended behavior.
Consider returning an `Option<u64>` instead:
-fn determine_rewind_slot_from_gaps(gaps: &[SequenceGap]) -> u64 {
+fn determine_rewind_slot_from_gaps(gaps: &[SequenceGap]) -> Option<u64> {
     gaps.iter()
         .map(|gap| gap.before_slot)
         .filter(|&slot| slot > 0)
         .min()
-        .unwrap_or(0)
 }
Then handle the `None` case appropriately in `request_rewind_for_gaps`.
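A minimal sketch of how the caller might handle the `None` case under the suggested signature; the type and function names follow the review comment and are assumptions, not the PR's code.

pub struct SequenceGap {
    pub before_slot: u64,
}

// Hypothetical Option-returning variant, per the suggestion above.
fn determine_rewind_slot_from_gaps(gaps: &[SequenceGap]) -> Option<u64> {
    gaps.iter().map(|g| g.before_slot).filter(|&s| s > 0).min()
}

fn request_rewind_for_gaps(gaps: &[SequenceGap]) {
    match determine_rewind_slot_from_gaps(gaps) {
        Some(slot) => log::info!("Requesting rewind to slot {}", slot),
        // No usable slot: skip the rewind rather than replay from genesis.
        None => log::warn!("Gaps detected but no valid rewind slot; skipping"),
    }
}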
tests/integration_tests/test_v1_address_tree_gap_filler.rs (4)
24-119: Well-structured integration test with comprehensive gap-filling logic. The test follows a clear phase-based approach with proper error handling and fallback mechanisms. Good use of helper functions keeps the main test readable.
241-306: Well-implemented pagination logic for signature fetching. Good handling of RPC pagination with proper termination conditions and a reasonable batch size.
419-487: Good implementation of the fallback gap-filling mechanism. The function properly handles edge cases where the signature-based approach might miss blocks, with clear logging and progress tracking.
724-762: Well-implemented verification with comprehensive reporting. Good approach to group remaining gaps by field type and provide clear feedback about the gap-filling results.

src/ingester/parser/tx_event_parser_v2.rs (3)
8-8: LGTM! Necessary imports for enhanced functionality. The new imports support the filtering logic for batch input accounts and address queue updates.
Also applies to: 11-12
134-153: Excellent filtering logic to prevent duplicate account transactions. The implementation correctly:
- Uses a HashSet for efficient O(1) lookups when filtering
- Uses a HashSet for efficient O(1) lookups when filtering
- Prevents creating AccountTransaction entries for accounts being created in the same transaction
- Maintains proper type conversions and error handling
This optimization avoids unnecessary database entries and processing overhead.
155-166: LGTM! Proper filtering and field mapping for address queue updates. The logic correctly:
- Excludes AddressV1 trees using the `u64::MAX` sentinel value
- Updates the field mapping from `mt_pubkey` to `tree_pubkey`, as mentioned in the PR objectives
- Maintains a consistent filtering pattern with the batch input accounts logic
The comment clearly explains the business logic for the exclusion.
tests/integration_tests/snapshot_tests.rs (2)
10-12: LGTM! Clean imports for test utilities. The new imports support the compression snapshot testing functionality.
121-154: Well-structured integration test with good practices. The test implementation correctly:
- Uses `#[ignore]` for the network-dependent test
- Provides a clear error message for a missing API key
- Includes proper assertions and informative output
- Follows async test patterns
Minor considerations for future maintenance:
- The hardcoded slot (391843372) may become stale over time
- Consider cleanup of test snapshot directory
- Test depends on external API availability
These are typical trade-offs for integration tests using real network data.
src/ingester/fetchers/grpc.rs (4)
18-18: LGTM! Necessary imports for rewind functionality. The new imports support the rewind command receiver integration.
Also applies to: 34-34
48-48: Good design choice for the rewind receiver parameter. The optional parameter maintains backward compatibility while enabling rewind functionality when needed.
59-59: Correct integration of the rewind receiver in the main stream. The rewind receiver is properly passed to the primary RPC polling stream, enabling rewind functionality.
122-122: Appropriate design to disable rewind in fallback scenarios. Explicitly passing `None` for the rewind receiver in fallback streams is the right approach because:
- Fallback streams are temporary and already handling error conditions
- Keeps fallback logic simple and focused
- Main stream handles rewind functionality appropriately
The implementation is consistent across all fallback scenarios.
Also applies to: 140-140, 153-153
src/ingester/persist/persisted_indexed_merkle_tree.rs (3)
11-11: LGTM! Necessary import for signature support. The Signature import supports the new signature field in IndexedTreeLeafUpdate.
84-84: Appropriate use of placeholder signatures for synthetic elements. Using `Signature::from([0; 64])` with clear comments is the right approach for synthetic zeroeth and top elements that don't originate from actual transactions.
Also applies to: 129-129
238-265: Excellent implementation of unified gap detection persistence. The state_tree_histories persistence logic correctly:
- Maps all required fields from IndexedTreeLeafUpdate to the database model
- Uses appropriate conflict resolution (`do_nothing`) to prevent overwrites
- Keys conflict detection on `(tree, seq)`, which is perfect for gap detection
- Includes proper error handling and empty batch guards
- Converts the signature to byte-vector format correctly
This implementation effectively enables unified sequence tracking across state and address trees, as described in the PR objectives.
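For reference, a hedged sketch of the "(tree, seq) on conflict do nothing" insert using sea-orm's upsert builder; the entity module and column names are assumptions based on the table described above, and newer sea-orm versions report `RecordNotInserted` when every row conflicts, which is treated as success here.

use sea_orm::sea_query::OnConflict;
use sea_orm::{DatabaseConnection, DbErr, EntityTrait};

// Assumed generated entity module for the state_tree_histories table.
use crate::dao::generated::state_tree_histories;

/// Insert history rows, silently skipping rows whose (tree, seq) already exist.
async fn insert_tree_histories(
    db: &DatabaseConnection,
    models: Vec<state_tree_histories::ActiveModel>,
) -> Result<(), DbErr> {
    if models.is_empty() {
        return Ok(()); // guard against empty batches
    }
    let result = state_tree_histories::Entity::insert_many(models)
        .on_conflict(
            OnConflict::columns([
                state_tree_histories::Column::Tree,
                state_tree_histories::Column::Seq,
            ])
            .do_nothing()
            .to_owned(),
        )
        .exec(db)
        .await;
    match result {
        Ok(_) => Ok(()),
        // Nothing inserted because every row conflicted: not an error here.
        Err(DbErr::RecordNotInserted) => Ok(()),
        Err(e) => Err(e),
    }
}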
tests/integration_tests/test_v1_address_tree_sequence_consistency.rs (Outdated)
Actionable comments posted: 0
♻️ Duplicate comments (4)
src/ingester/detect_gaps.rs (4)
212-215: Replace debug print with tracing. Debug print statements should use the `tracing` crate for consistency.
- println!(
-     "DEBUG: Extracting batch_new_address for tree: {}, queue_index: {}",
-     tree_str, address.queue_index
- );
+ tracing::debug!(
+     "Extracting batch_new_address for tree: {}, queue_index: {}",
+     tree_str, address.queue_index
+ );
286-289: Replace all debug prints with tracing. Multiple debug print statements should use the `tracing` crate.
- println!(
-     "DEBUG: Processing indexed_tree_seqs - tree: {}, tree_type_id: {}",
-     tree_pubkey, tree_type_id
- );
+ tracing::debug!(
+     "Processing indexed_tree_seqs - tree: {}, tree_type_id: {}",
+     tree_pubkey, tree_type_id
+ );
And at lines 404-408:
- println!("field_type: {:?}", field_type); - println!( - "tree_pubkey: {:?}, queue_pubkey: {:?}", - tree_pubkey, queue_pubkey - ); + tracing::debug!("field_type: {:?}", field_type); + tracing::debug!( + "tree_pubkey: {:?}, queue_pubkey: {:?}", + tree_pubkey, queue_pubkey + );Also applies to: 404-408
361-573: Refactor this large function for better maintainability. This 200+ line function should be broken down into smaller, focused functions, as suggested in the past review.
Consider extracting:
- State lookup logic (lines 379-414)
- Sequence unpacking logic (lines 416-532)
- Gap detection logic (lines 534-572)
This will improve readability and make the code easier to test and maintain.
416-532: Extract the repetitive sequence unpacking logic. The match statement contains highly repetitive patterns that should be refactored.
Create helper functions to reduce duplication:
fn unpack_sequence_entry(
    tree_type_seq: &TreeTypeSeq,
    field_type: &StateUpdateFieldType,
) -> (u64, Option<SequenceEntry>) {
    match (tree_type_seq, field_type) {
        (TreeTypeSeq::AddressV1(entry), StateUpdateFieldType::IndexedTreeUpdate) => {
            tracing::debug!("IndexedTreeUpdate with AddressV1, seq: {}", entry.sequence);
            (entry.sequence, Some(entry.clone()))
        }
        (TreeTypeSeq::AddressV2(_, entry), StateUpdateFieldType::BatchMerkleTreeEventAddressAppend)
        | (TreeTypeSeq::AddressV2(_, entry), StateUpdateFieldType::BatchNewAddress) => {
            (entry.sequence, Some(entry.clone()))
        }
        // ... other cases
        _ => {
            tracing::warn!("Sequence type mismatch for field type: {:?}", field_type);
            (u64::MAX, None)
        }
    }
}
🧹 Nitpick comments (2)
src/ingester/detect_gaps.rs (2)
48-48: Remove commented code. This commented-out field appears to be leftover from refactoring and should be removed.
 pub tree_pubkey: Option<Pubkey>, // Tree pubkey (unified for all tree operations)
-// pub tree_type_string: Option<String>, // Tree type string (for indexed tree updates)
 pub field_type: StateUpdateFieldType,
293-293: Address the TODO comment. The TODO indicates missing functionality for queue pubkey handling.
Would you like me to help implement the queue pubkey logic or create an issue to track this?
📜 Review details
📒 Files selected for processing (4)
- src/ingester/detect_gaps.rs (1 hunks)
- src/ingester/mod.rs (3 hunks)
- tests/integration_tests/mock_tests.rs (1 hunks)
- tests/integration_tests/zeroeth_element_fix_test.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/integration_tests/mock_tests.rs
- tests/integration_tests/zeroeth_element_fix_test.rs
- src/ingester/mod.rs
🔇 Additional comments (3)
src/ingester/detect_gaps.rs (3)
1-15: LGTM! The imports and global state declaration are appropriate for the gap detection functionality.
17-36: LGTM! The helper function and enum definitions are well structured and serve their purpose effectively.
70-162: Well-structured state update logic. The function properly handles different tree types and correctly uses `tracing::debug` for logging.
Actionable comments posted: 4
♻️ Duplicate comments (2)
tests/integration_tests/test_v1_address_tree_sequence_consistency.rs (1)
11-18: Replace magic numbers with named constants. The function still uses hardcoded numbers (1, 2, 3) to represent event types, which reduces code readability, as noted in previous reviews.
Define constants for better clarity:
+const EVENT_TYPE_BATCH_APPEND: u8 = 1;
+const EVENT_TYPE_BATCH_NULLIFY: u8 = 2;
+const EVENT_TYPE_BATCH_ADDRESS_APPEND: u8 = 3;
+
 fn merkle_event_to_type_id(event: &MerkleTreeEvent) -> u8 {
     match event {
-        MerkleTreeEvent::BatchAppend(_) => 1,
-        MerkleTreeEvent::BatchNullify(_) => 2,
-        MerkleTreeEvent::BatchAddressAppend(_) => 3,
+        MerkleTreeEvent::BatchAppend(_) => EVENT_TYPE_BATCH_APPEND,
+        MerkleTreeEvent::BatchNullify(_) => EVENT_TYPE_BATCH_NULLIFY,
+        MerkleTreeEvent::BatchAddressAppend(_) => EVENT_TYPE_BATCH_ADDRESS_APPEND,
         _ => 0,
     }
 }

src/ingester/detect_gaps.rs (1)
344-551: Function is still too long and complex. The `detect_sequence_gaps_with_metadata` function remains overly long (200+ lines) with complex nested logic, as noted in previous reviews. The repetitive sequence unpacking logic (lines 394-510) also persists.
Consider breaking this into smaller functions:
- get_current_sequence_state() - handle state lookup logic
- unpack_sequence_by_field_type() - handle sequence unpacking
- detect_gaps_in_sorted_sequences() - core gap detection logic
fn detect_sequence_gaps_with_metadata(
    sequences: &[SequenceEntry],
    tree_pubkey: Option<Pubkey>,
    queue_pubkey: Option<Pubkey>,
    field_type: StateUpdateFieldType,
) -> Vec<SequenceGap> {
    if sequences.len() < 2 {
        return Vec::new();
    }

    let mut sorted_sequences = sequences.to_vec();
    sorted_sequences.sort_by_key(|entry| entry.sequence);

+   let start_seq = get_current_sequence_state(tree_pubkey, queue_pubkey);
+   let (unpacked_start_seq, start_entry) =
+       unpack_sequence_by_field_type(&start_seq, &field_type, tree_pubkey, queue_pubkey);
+
+   detect_gaps_in_sorted_sequences(&sorted_sequences, unpacked_start_seq, start_entry, tree_pubkey, field_type)
-   // ... rest of the complex logic
}
🧹 Nitpick comments (3)
src/tools/analyze_snapshot.rs (2)
55-55: Consider adding tree filtering to the parse_transaction call. The `parse_transaction` call passes `None` for the tree filter parameter; since this tool supports tree filtering analysis, consider passing `target_tree_pubkey` to potentially improve performance when analyzing specific trees.
- match parse_transaction(tx, block.metadata.slot, None) {
+ match parse_transaction(tx, block.metadata.slot, target_tree_pubkey.as_ref()) {
136-140: Consider making the tree display limit configurable. The hardcoded limit of 10 trees in the output could be made configurable via a command-line argument for more flexible analysis.
- if i < 10 || target_tree.as_ref().map(|t| t == tree).unwrap_or(false) {
+ const MAX_TREES_TO_DISPLAY: usize = 10;
+ if i < MAX_TREES_TO_DISPLAY || target_tree.as_ref().map(|t| t == tree).unwrap_or(false) {

tests/integration_tests/test_v1_address_tree_sequence_consistency.rs (1)
340-350: Consider more specific error handling. The transaction parsing silently continues on errors. While this is appropriate for compression parsing issues, consider logging error types or counts to help diagnose parsing problems.
 match parse_transaction(transaction, slot, None) {
     Ok(state_update) => {
         parsed_transactions += 1;
         sequences.extract_state_update_sequences(&state_update, slot, &signature);
     }
-    Err(_) => {
-        // Skip failed parsing - compression transactions might have parsing issues
-        continue;
-    }
+    Err(e) => {
+        // Skip failed parsing - compression transactions might have parsing issues
+        tracing::debug!("Failed to parse transaction {}: {}", signature, e);
+        continue;
+    }
 }
📜 Review details
📒 Files selected for processing (7)
- Cargo.toml (2 hunks)
- src/ingester/detect_gaps.rs (1 hunks)
- src/ingester/mod.rs (3 hunks)
- src/ingester/parser/tree_info.rs (2 hunks)
- src/tools/analyze_snapshot.rs (1 hunks)
- tests/integration_tests/test_v1_address_tree_sequence_consistency.rs (1 hunks)
- tests/integration_tests/zeroeth_element_fix_test.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- tests/integration_tests/zeroeth_element_fix_test.rs
- Cargo.toml
- src/ingester/parser/tree_info.rs
- src/ingester/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Run tests
🔇 Additional comments (5)
src/tools/analyze_snapshot.rs (2)
7-16: LGTM! Clean CLI argument validation. The argument parsing and validation logic is well structured, with clear usage instructions and proper exit handling.
42-46: Good error handling for pubkey parsing. The optional target tree pubkey parsing correctly uses `transpose()` to handle the `Option<Result<T, E>>` pattern and provides a clear error message.

tests/integration_tests/test_v1_address_tree_sequence_consistency.rs (1)
312-313: Good improvement on the hardcoded path issue. The previous hardcoded local path has been properly replaced with an environment-variable approach, which resolves the portability concern from earlier reviews.
src/ingester/detect_gaps.rs (2)
101-105: Good use of tracing for debug logging. The debug logging has been properly updated to use `tracing::debug!` instead of `println!`, addressing the previous review feedback.
207-214: Good use of tracing for error logging. The error logging correctly uses `tracing::error!` instead of `println!`, which addresses the previous feedback about proper log level management.
src/ingester/detect_gaps.rs (Outdated)
fn merkle_event_to_type_id(event: &MerkleTreeEvent) -> u8 {
    match event {
        MerkleTreeEvent::BatchAppend(_) => 1,
        MerkleTreeEvent::BatchNullify(_) => 2,
        MerkleTreeEvent::BatchAddressAppend(_) => 3,
        _ => 0, // Other event types we don't care about
    }
}
🛠️ Refactor suggestion
Replace magic numbers with named constants.
Same issue as in the test file - hardcoded numbers reduce code readability and maintainability.
+const EVENT_TYPE_BATCH_APPEND: u8 = 1;
+const EVENT_TYPE_BATCH_NULLIFY: u8 = 2;
+const EVENT_TYPE_BATCH_ADDRESS_APPEND: u8 = 3;
+
fn merkle_event_to_type_id(event: &MerkleTreeEvent) -> u8 {
match event {
- MerkleTreeEvent::BatchAppend(_) => 1,
- MerkleTreeEvent::BatchNullify(_) => 2,
- MerkleTreeEvent::BatchAddressAppend(_) => 3,
+ MerkleTreeEvent::BatchAppend(_) => EVENT_TYPE_BATCH_APPEND,
+ MerkleTreeEvent::BatchNullify(_) => EVENT_TYPE_BATCH_NULLIFY,
+ MerkleTreeEvent::BatchAddressAppend(_) => EVENT_TYPE_BATCH_ADDRESS_APPEND,
_ => 0,
}
}
📝 Committable suggestion
// Add named constants for event type IDs
const EVENT_TYPE_BATCH_APPEND: u8 = 1;
const EVENT_TYPE_BATCH_NULLIFY: u8 = 2;
const EVENT_TYPE_BATCH_ADDRESS_APPEND: u8 = 3;

fn merkle_event_to_type_id(event: &MerkleTreeEvent) -> u8 {
    match event {
        MerkleTreeEvent::BatchAppend(_) => EVENT_TYPE_BATCH_APPEND,
        MerkleTreeEvent::BatchNullify(_) => EVENT_TYPE_BATCH_NULLIFY,
        MerkleTreeEvent::BatchAddressAppend(_) => EVENT_TYPE_BATCH_ADDRESS_APPEND,
        _ => 0, // Other event types we don't care about
    }
}
🤖 Prompt for AI Agents
In src/ingester/detect_gaps.rs around lines 18 to 25, the function uses
hardcoded numeric literals to represent event type IDs, which reduces
readability. Define named constants for each event type ID (e.g.,
BATCH_APPEND_ID = 1) at the top of the file or in a constants module, then
replace the numeric literals in the match arms with these constants to improve
clarity and maintainability.
src/ingester/detect_gaps.rs (Outdated)
let field_type = match event_type {
    1 => StateUpdateFieldType::BatchMerkleTreeEventAppend,
    2 => StateUpdateFieldType::BatchMerkleTreeEventNullify,
    3 => StateUpdateFieldType::BatchMerkleTreeEventAddressAppend,
    _ => continue,
};
🛠️ Refactor suggestion
Same magic numbers used again.
The magic numbers (1, 2, 3) are repeated here, creating the same maintainability issue as noted in previous comments.
let field_type = match event_type {
- 1 => StateUpdateFieldType::BatchMerkleTreeEventAppend,
- 2 => StateUpdateFieldType::BatchMerkleTreeEventNullify,
- 3 => StateUpdateFieldType::BatchMerkleTreeEventAddressAppend,
+ EVENT_TYPE_BATCH_APPEND => StateUpdateFieldType::BatchMerkleTreeEventAppend,
+ EVENT_TYPE_BATCH_NULLIFY => StateUpdateFieldType::BatchMerkleTreeEventNullify,
+ EVENT_TYPE_BATCH_ADDRESS_APPEND => StateUpdateFieldType::BatchMerkleTreeEventAddressAppend,
_ => continue,
};
📝 Committable suggestion
let field_type = match event_type {
    EVENT_TYPE_BATCH_APPEND => StateUpdateFieldType::BatchMerkleTreeEventAppend,
    EVENT_TYPE_BATCH_NULLIFY => StateUpdateFieldType::BatchMerkleTreeEventNullify,
    EVENT_TYPE_BATCH_ADDRESS_APPEND => StateUpdateFieldType::BatchMerkleTreeEventAddressAppend,
    _ => continue,
};
🤖 Prompt for AI Agents
In src/ingester/detect_gaps.rs around lines 318 to 323, replace the magic
numbers 1, 2, and 3 used in the match statement with named constants or an enum
variant that clearly represent these event types. Define these constants or enum
variants elsewhere in the codebase if not already present, and use them here to
improve code readability and maintainability.
let field_type = match event_type {
    1 => StateUpdateFieldType::BatchMerkleTreeEventAppend,
    2 => StateUpdateFieldType::BatchMerkleTreeEventNullify,
    3 => StateUpdateFieldType::BatchMerkleTreeEventAddressAppend,
    _ => continue,
};
🛠️ Refactor suggestion
Inconsistent magic number usage.
The same magic numbers (1, 2, 3) are used again in the gap detection logic, creating maintenance burden if these values ever change.
let field_type = match event_type {
- 1 => StateUpdateFieldType::BatchMerkleTreeEventAppend,
- 2 => StateUpdateFieldType::BatchMerkleTreeEventNullify,
- 3 => StateUpdateFieldType::BatchMerkleTreeEventAddressAppend,
+ EVENT_TYPE_BATCH_APPEND => StateUpdateFieldType::BatchMerkleTreeEventAppend,
+ EVENT_TYPE_BATCH_NULLIFY => StateUpdateFieldType::BatchMerkleTreeEventNullify,
+ EVENT_TYPE_BATCH_ADDRESS_APPEND => StateUpdateFieldType::BatchMerkleTreeEventAddressAppend,
_ => continue,
};
📝 Committable suggestion
let field_type = match event_type {
    EVENT_TYPE_BATCH_APPEND => StateUpdateFieldType::BatchMerkleTreeEventAppend,
    EVENT_TYPE_BATCH_NULLIFY => StateUpdateFieldType::BatchMerkleTreeEventNullify,
    EVENT_TYPE_BATCH_ADDRESS_APPEND => StateUpdateFieldType::BatchMerkleTreeEventAddressAppend,
    _ => continue,
};
🤖 Prompt for AI Agents
In tests/integration_tests/test_v1_address_tree_sequence_consistency.rs around
lines 245 to 250, the magic numbers 1, 2, and 3 are used directly in the match
statement for event_type. To fix this, define constants or an enum representing
these event types and replace the magic numbers with these named constants
throughout the file, including the gap detection logic, to improve
maintainability and reduce duplication.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/ingester/mod.rs (1)
199-227: Good error handling for rewind scenarios, but consider improving error type detection. The logic correctly distinguishes between rewind-triggered errors (which should propagate) and other errors (which should be retried). However, the string-matching approach for detecting rewind errors is somewhat fragile.
Consider a more robust error type check:
- // Check if this is a gap-triggered rewind error
- if e.to_string().contains("Gap detection triggered rewind") {
-     // Don't retry, propagate the rewind error up
-     return Err(e);
- }
+ // Check if this is a gap-triggered rewind error
+ if matches!(e, IngesterError::RewindTriggered(_)) {
+     // Don't retry, propagate the rewind error up
+     return Err(e);
+ }
This would require adding a `RewindTriggered` variant to the `IngesterError` enum for more type-safe error handling.
📜 Review details
📒 Files selected for processing (1)
- src/ingester/mod.rs (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Run tests
🔇 Additional comments (3)
src/ingester/mod.rs (3)
11-11: LGTM! Clean import organization and module additions. The consolidation of `ConnectionTrait` and `QueryTrait` imports improves readability, and the new `detect_gaps` and `rewind_controller` modules align with the PR's gap detection and rewind objectives.
Also applies to: 23-23, 25-25, 31-31
34-89: Excellent integration of gap detection and rewind logic. The implementation correctly:
- Tracks sequences with proper slot and signature context during transaction parsing
- Detects gaps using the new `detect_all_sequence_gaps` function
- Handles rewind scenarios by returning early after requesting a rewind
- Updates the global sequence state after gap processing
- Maintains clean separation of concerns with optional parameters
The error handling for rewind-triggered scenarios is particularly well done, ensuring the processing pipeline halts appropriately when gaps are detected.
148-197: Well-implemented block filtering and batch processing. The implementation effectively:
- Pre-filters blocks based on tree criteria, avoiding unnecessary processing
- Handles empty filtered batches efficiently with early return
- Maintains accurate metrics for both indexed and skipped blocks
- Provides clear logging for operational visibility
- Properly passes the new parameters through the processing pipeline
The separation between original batch size and filtered block count in metrics and logging is particularly well done.
derive_block_state_update(block, None, None)?;
persist_state_update(&txn, derive_block_state_update(block, None, None)?).await?;
🛠️ Refactor suggestion
Eliminate duplicate function call for better performance.
The `derive_block_state_update` function is called twice with identical parameters, but the first result is discarded. This is inefficient since the function performs transaction parsing, which can be expensive.
- derive_block_state_update(block, None, None)?;
- persist_state_update(&txn, derive_block_state_update(block, None, None)?).await?;
+ let state_update = derive_block_state_update(block, None, None)?;
+ persist_state_update(&txn, state_update).await?;
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
let state_update = derive_block_state_update(block, None, None)?;
persist_state_update(&txn, state_update).await?;
🤖 Prompt for AI Agents
In src/ingester/mod.rs around lines 94 to 95, the function
derive_block_state_update is called twice with the same parameters, causing
unnecessary repeated computation. To fix this, call derive_block_state_update
once, store its result in a variable, and then use that variable in both places
where the result is needed. This eliminates the duplicate call and improves
performance.
fn block_contains_tree(block: &BlockInfo, tree_filter: &solana_pubkey::Pubkey) -> bool {
    for tx in &block.transactions {
        for instruction_group in &tx.instruction_groups {
            if instruction_group
                .outer_instruction
                .accounts
                .contains(tree_filter)
            {
                return true;
            }
        }
    }
    false
}
💡 Verification agent
🧩 Analysis chain
Verify that checking only outer instructions is sufficient for tree filtering.
The function only examines accounts in outer_instruction and ignores any potential inner instructions. Please confirm this approach captures all relevant tree-related transactions.
🏁 Script executed:
#!/bin/bash
# Description: Check if there are any inner instruction patterns that might contain tree references
# Expected: Find examples of instruction structures to validate the filtering approach
ast-grep --pattern $'struct InstructionGroup {
$$$
inner_instructions: $$$
$$$
}'
# Also check for any existing tree filtering logic in the codebase
rg -A 5 -B 5 "inner_instruction" --type rust
Length of output: 9396
Include inner instructions in block_contains_tree
The block_contains_tree helper currently only checks outer_instruction.accounts and ignores inner_instructions, causing it to miss tree references in nested instructions. The parser and other logic (e.g., is_compression_transaction) explicitly iterate inner instructions when filtering by tree pubkey.
Please update src/ingester/mod.rs (lines 133–146) to also scan instruction_group.inner_instructions, for example:
fn block_contains_tree(block: &BlockInfo, tree_filter: &solana_pubkey::Pubkey) -> bool {
for tx in &block.transactions {
for instruction_group in &tx.instruction_groups {
if instruction_group
- .outer_instruction
- .accounts
- .contains(tree_filter)
+ .outer_instruction.accounts
+ .contains(tree_filter)
{
return true;
}
+ // Check inner instructions as well
+ for inner in &instruction_group.inner_instructions {
+ if inner.accounts.contains(tree_filter) {
+ return true;
+ }
+ }
}
}
false
}
This ensures consistency with existing parsing logic and catches all tree-related instructions.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
fn block_contains_tree(block: &BlockInfo, tree_filter: &solana_pubkey::Pubkey) -> bool {
    for tx in &block.transactions {
        for instruction_group in &tx.instruction_groups {
            if instruction_group
                .outer_instruction
                .accounts
                .contains(tree_filter)
            {
                return true;
            }
            // Check inner instructions as well
            for inner in &instruction_group.inner_instructions {
                if inner.accounts.contains(tree_filter) {
                    return true;
                }
            }
        }
    }
    false
}
🤖 Prompt for AI Agents
In src/ingester/mod.rs around lines 133 to 146, the function block_contains_tree
only checks the accounts of outer_instruction but misses checking
inner_instructions within each instruction_group. To fix this, update the
function to also iterate over instruction_group.inner_instructions and check if
any of their accounts contain the tree_filter pubkey, returning true if found.
This will ensure all nested instructions are scanned consistently with other
parsing logic.
Actionable comments posted: 2
♻️ Duplicate comments (2)
src/ingester/mod.rs (2)
90-91: Eliminate duplicate function call for better performance.
The derive_block_state_update function is called twice with identical parameters, but the first result is discarded. This is inefficient since the function performs transaction parsing, which can be expensive.
- derive_block_state_update(block, None, None)?;
- persist_state_update(&txn, derive_block_state_update(block, None, None)?).await?;
+ let state_update = derive_block_state_update(block, None, None)?;
+ persist_state_update(&txn, state_update).await?;
129-142: Include inner instructions in block_contains_tree
The block_contains_tree helper currently only checks outer_instruction.accounts and ignores inner_instructions, causing it to miss tree references in nested instructions. The parser and other logic (e.g., is_compression_transaction) explicitly iterate inner instructions when filtering by tree pubkey.
fn block_contains_tree(block: &BlockInfo, tree_filter: &solana_pubkey::Pubkey) -> bool {
    for tx in &block.transactions {
        for instruction_group in &tx.instruction_groups {
            if instruction_group
                .outer_instruction
                .accounts
                .contains(tree_filter)
            {
                return true;
            }
+           // Check inner instructions as well
+           for inner in &instruction_group.inner_instructions {
+               if inner.accounts.contains(tree_filter) {
+                   return true;
+               }
+           }
        }
    }
    false
}
🧹 Nitpick comments (5)
src/ingester/gap/rewind.rs (2)
24-27: Consider using a bounded channel for production resilience.
While unbounded channels are convenient, they can lead to memory exhaustion if rewind commands accumulate faster than they're consumed. Consider using mpsc::channel(capacity) with a reasonable buffer size to apply backpressure when necessary.
pub fn new() -> (Self, mpsc::UnboundedReceiver<RewindCommand>) {
-     let (sender, receiver) = mpsc::unbounded_channel();
+     let (sender, receiver) = mpsc::channel(100); // Adjust capacity based on expected load
      (Self { sender }, receiver)
}
Note: This would require changing the receiver type and handling SendError differently in request_rewind.
65-127: Consider adding edge case tests.
The existing tests cover the main functionality well. Consider adding tests for:
- Empty gaps array handling
- All gaps having zero before_slot
- Error propagation when channel is closed
Would you like me to generate additional test cases for these edge scenarios?
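For instance, a sketch of the empty-gaps case, assuming the constructor shown above and that request_rewind_for_gaps accepts a slice and returns a Result:

#[test]
fn request_rewind_ignores_empty_gaps() {
    let (controller, mut receiver) = RewindController::new();
    controller
        .request_rewind_for_gaps(&[])
        .expect("an empty gap list should be a no-op, not an error");
    // Nothing should have been enqueued on the rewind channel.
    assert!(receiver.try_recv().is_err());
}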
src/ingester/fetchers/poller.rs (1)
71-73
: Use appropriate log level for operational events.Rewinding is an expected operational behavior for gap handling, not an error condition. Consider using
log::info!
orlog::warn!
instead oflog::error!
.- log::error!("Rewinding block stream to {}: {}", to_slot, reason); + log::warn!("Rewinding block stream to {}: {}", to_slot, reason);src/ingester/gap/treetype_seq.rs (1)
69-82: Consider using named constants for special sequence values.
While the documentation clearly explains the meaning of u64::MAX and 0, using named constants would improve code readability and maintainability.
Add these constants at the module level:
+/// Represents an invalid state due to tree type mismatch or unexpected configuration
+const SEQUENCE_INVALID: u64 = u64::MAX;
+
+/// Represents a valid but uninitialized sequence state
+const SEQUENCE_UNINITIALIZED: u64 = 0;

impl TreeTypeSeq {
    // ... existing methods ...
}
Then use them in the method:
- (u64::MAX, None)
+ (SEQUENCE_INVALID, None)
- (0, None)
+ (SEQUENCE_UNINITIALIZED, None)
src/ingester/gap/mod.rs (1)
17-20: Consider dependency injection for better testability.
While the global state with lazy_static and RwLock works, it makes unit testing more difficult and creates hidden dependencies. Consider refactoring to pass the sequence state as a parameter through the call chain.
If refactoring is not feasible now, at least consider adding a test helper to reset the global state between tests to prevent test interdependencies.
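A minimal sketch of such a helper, assuming SEQUENCE_STATE is the lazy_static RwLock-wrapped map described above (name and visibility are illustrative):

#[cfg(test)]
pub fn reset_sequence_state_for_tests() {
    // Clears the global map so one test cannot observe sequence state
    // left behind by another.
    SEQUENCE_STATE
        .write()
        .expect("sequence state lock poisoned")
        .clear();
}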
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
src/ingester/fetchers/grpc.rs
(7 hunks)src/ingester/fetchers/mod.rs
(4 hunks)src/ingester/fetchers/poller.rs
(2 hunks)src/ingester/gap/mod.rs
(1 hunks)src/ingester/gap/rewind.rs
(1 hunks)src/ingester/gap/sequences.rs
(1 hunks)src/ingester/gap/treetype_seq.rs
(1 hunks)src/ingester/indexer/mod.rs
(4 hunks)src/ingester/mod.rs
(2 hunks)src/main.rs
(7 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/ingester/fetchers/mod.rs
- src/ingester/fetchers/grpc.rs
- src/ingester/indexer/mod.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/ingester/gap/treetype_seq.rs (1)
src/ingester/parser/indexer_events.rs (1)
seq
(40-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run tests
🔇 Additional comments (12)
src/ingester/gap/rewind.rs (1)
55-63: LGTM! Clear and correct gap handling logic.
The function correctly identifies the earliest valid slot for rewinding, properly filtering out zero slots from initialization. The implementation ensures all missing data will be captured by using the minimum valid slot.
src/ingester/fetchers/poller.rs (1)
74-88: Well-structured rewind state management.
The implementation correctly handles all state updates during rewind:
- Cache clearing prevents stale block data
- Slot adjustments ensure proper restart position
- Flag-based control flow is clean and maintainable
src/main.rs (1)
299-300: Clean RewindController integration.
The RewindController is properly integrated into the indexing pipeline with clear separation of concerns between the controller (for sending commands) and receiver (for listening to commands).
Also applies to: 313-313, 322-323
src/ingester/gap/treetype_seq.rs (1)
30-68: Well-designed helper methods with proper Option handling.
The helper methods demonstrate good functional programming practices:
- Safe pattern matching prevents type confusion
- Graceful handling of None cases with defaults
- Clear separation of concerns between different tree types
src/ingester/gap/mod.rs (1)
67-93: Clear state retrieval logic with appropriate precedence.
The function correctly prioritizes tree pubkey over queue pubkey and provides helpful debug logging throughout. The fallback to default values is well-handled.
src/ingester/mod.rs (3)
32-85: Well-structured gap detection and rewind integration!
The enhanced derive_block_state_update function properly integrates sequence tracking, gap detection, and rewind control. The error handling ensures that gap-triggered rewinds are propagated correctly to stop further processing.
144-193: Excellent implementation of tree filtering and batch processing!
The updated index_block_batch function efficiently pre-filters blocks when a tree filter is specified, properly handles empty batches, and maintains good metrics tracking. The integration with rewind control is seamless.
195-223: Proper error handling for rewind-triggered errors!
The updated retry logic correctly distinguishes between gap-triggered rewind errors (which should be propagated) and other transient errors (which should be retried). This ensures the rewind mechanism works as intended.
src/ingester/gap/sequences.rs (4)
24-130: Comprehensive sequence extraction with proper validation!
The extract_state_update_sequences method thoroughly extracts sequences from all StateUpdate fields and includes important validation logic to detect and skip invalid AddressV1 trees in batch operations. The error logging provides good visibility into data quality issues.
132-226: Excellent concurrent state management!
The update_sequence_state method demonstrates proper lock handling by:
- Using appropriate error handling for lock acquisition failures
- Dropping the read lock before acquiring the write lock to prevent deadlocks
- Applying all updates atomically
The use of specialized constructors for different tree types ensures proper state transitions.
228-313: Thorough gap detection across all sequence types!
The detect_all_sequence_gaps method systematically checks all sequence collections and properly maps event types to their corresponding field types. The aggregation of all gaps provides a complete picture for the rewind mechanism.
315-374: Robust gap detection logic with proper edge case handling!
The detect_sequence_gaps_with_metadata method correctly:
- Handles initialization cases where no previous state exists
- Detects gaps both from the current state and between consecutive sequences
- Includes full metadata (slots and signatures) for accurate gap filling
- Properly sorts sequences before processing
let state = match SEQUENCE_STATE.read() {
    Ok(state) => state,
    Err(e) => {
        debug!("Failed to acquire sequence state read lock: {}", e);
        return TreeTypeSeq::default();
    }
};
Reconsider error handling strategy for poisoned locks.
A poisoned RwLock indicates a panic occurred in another thread while holding the lock, which is a critical error. Silently returning a default value could mask serious issues and lead to incorrect gap detection.
Consider either:
- Propagating the error up to the caller
- Logging at error level instead of debug
- Attempting to recover the lock with into_inner()
let state = match SEQUENCE_STATE.read() {
Ok(state) => state,
Err(e) => {
- debug!("Failed to acquire sequence state read lock: {}", e);
- return TreeTypeSeq::default();
+ error!("Critical: Failed to acquire sequence state read lock: {}", e);
+ // Option 1: panic to fail fast
+ panic!("Sequence state lock poisoned: {}", e);
+ // Option 2: attempt recovery (if appropriate)
+ // e.into_inner()
}
};
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
let state = match SEQUENCE_STATE.read() {
    Ok(state) => state,
    Err(e) => {
        error!("Critical: Failed to acquire sequence state read lock: {}", e);
        // Option 1: panic to fail fast
        panic!("Sequence state lock poisoned: {}", e);
        // Option 2: attempt recovery (if appropriate)
        // e.into_inner()
    }
};
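If the recovery option is preferred, std's PoisonError::into_inner returns the guard even though another thread panicked while holding the lock:

// Trades possibly inconsistent data for availability.
let state = SEQUENCE_STATE
    .read()
    .unwrap_or_else(|poisoned| poisoned.into_inner());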
🤖 Prompt for AI Agents
In src/ingester/gap/mod.rs around lines 59 to 65, the current error handling for
a poisoned RwLock silently returns a default value, which can mask critical
issues. Instead, change the error handling to either propagate the error to the
caller, log the error at error level rather than debug, or attempt to recover
the lock using into_inner(). Choose one of these approaches to properly handle
the poisoned lock scenario and avoid hiding serious concurrency problems.
let tree_filter = args.tree.as_ref().map(|tree_str| {
    tree_str
        .parse::<Pubkey>()
        .expect("Invalid tree pubkey format")
});
🛠️ Refactor suggestion
Eliminate code duplication and improve error handling.
The tree pubkey parsing logic is duplicated. Additionally, using expect()
causes a panic on invalid input, which provides poor user experience.
Extract the parsing logic into a helper function with proper error handling:
+fn parse_tree_filter(tree_str: Option<&String>) -> Result<Option<Pubkey>, String> {
+ tree_str
+ .map(|s| s.parse::<Pubkey>().map_err(|e| format!("Invalid tree pubkey '{}': {}", s, e)))
+ .transpose()
+}
// Replace lines 242-246:
- let tree_filter = args.tree.as_ref().map(|tree_str| {
- tree_str
- .parse::<Pubkey>()
- .expect("Invalid tree pubkey format")
- });
+ let tree_filter = match parse_tree_filter(args.tree.as_ref()) {
+ Ok(filter) => filter,
+ Err(e) => {
+ error!("{}", e);
+ std::process::exit(1);
+ }
+ };
// Replace lines 302-306:
- let tree_filter = args.tree.as_ref().map(|tree_str| {
- tree_str
- .parse::<Pubkey>()
- .expect("Invalid tree pubkey format")
- });
+ let tree_filter = match parse_tree_filter(args.tree.as_ref()) {
+ Ok(filter) => filter,
+ Err(e) => {
+ error!("{}", e);
+ std::process::exit(1);
+ }
+ };
Also applies to: 302-306
🤖 Prompt for AI Agents
In src/main.rs around lines 242 to 246 and also lines 302 to 306, the code
duplicates the tree pubkey parsing logic and uses expect(), which panics on
invalid input. Refactor by extracting the parsing logic into a single helper
function that returns a Result or Option with proper error handling. Replace all
duplicated parsing code with calls to this helper, and handle errors gracefully
by returning user-friendly error messages instead of panicking.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/ingester/gap/sequences.rs
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run tests
🔇 Additional comments (3)
src/ingester/gap/sequences.rs (3)
12-21: Well-structured data model for sequence tracking.
The use of composite keys and HashMap collections provides efficient lookups and appropriate grouping of sequence data by tree and type.
76-92: Good defensive programming with tree type validation.
The validation logic correctly ensures only AddressV2 trees are processed in batch_new_addresses, with appropriate error logging and skipping of invalid entries.
332-391: Robust gap detection implementation.
The method properly handles edge cases including initialization state, overflow protection with saturating_add, and maintains full metadata for gap analysis.
let current_state = match SEQUENCE_STATE.read() {
    Ok(state) => state,
    Err(e) => {
        debug!("Failed to acquire read lock for sequence state: {}", e);
        return;
    }
};
🛠️ Refactor suggestion
Consider propagating lock acquisition failures.
The current implementation silently returns on lock failures, which could hide synchronization issues. Consider returning a Result to allow callers to handle these failures appropriately.
- pub fn update_sequence_state(&self) {
+ pub fn update_sequence_state(&self) -> Result<(), String> {
let current_state = match SEQUENCE_STATE.read() {
Ok(state) => state,
Err(e) => {
- debug!("Failed to acquire read lock for sequence state: {}", e);
- return;
+ let err_msg = format!("Failed to acquire read lock for sequence state: {}", e);
+ debug!("{}", err_msg);
+ return Err(err_msg);
}
};
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
pub fn update_sequence_state(&self) -> Result<(), String> {
    let current_state = match SEQUENCE_STATE.read() {
        Ok(state) => state,
        Err(e) => {
            let err_msg = format!("Failed to acquire read lock for sequence state: {}", e);
            debug!("{}", err_msg);
            return Err(err_msg);
        }
    };
    // …rest of the method…
}
🤖 Prompt for AI Agents
In src/ingester/gap/sequences.rs around lines 139 to 145, the code currently
returns silently when failing to acquire the read lock on SEQUENCE_STATE, which
can hide synchronization issues. Modify the function to return a Result type and
propagate the lock acquisition error instead of returning early. Update the
caller to handle this Result accordingly, allowing proper error handling of lock
failures.
match info.tree_type {
    light_compressed_account::TreeType::AddressV1 => {
        updates.insert(tree_str, TreeTypeSeq::AddressV1(max_entry.clone()));
    }
    tree_type => {
        tracing::error!(
            "Unhandled tree type {:?} for tree {} in indexed_tree_seqs",
            tree_type,
            tree_str
        );
    }
}
💡 Verification agent
🧩 Analysis chain
Verify tree type handling for indexed_tree_seqs.
Currently only AddressV1 is handled for indexed_tree_seqs, while other tree types log errors. Please verify if this is intentional or if StateV1 should also be handled here.
🏁 Script executed:
#!/bin/bash
# Search for other usages of indexed_tree_seqs to understand expected tree types
rg -A 5 -B 5 "indexed_tree_seqs" --type rust
# Check what tree types are used with indexed_merkle_tree_updates
ast-grep --pattern 'indexed_merkle_tree_updates'
Length of output: 10125
🏁 Script executed:
#!/bin/bash
# Inspect QUEUE_TREE_MAPPING to determine which TreeType variants it covers
rg -n "QUEUE_TREE_MAPPING" -C 5 --type rust
Length of output: 5235
🏁 Script executed:
#!/bin/bash
# Display QUEUE_TREE_MAPPING contents to see all tree types included
sed -n '1,200p' src/ingester/parser/tree_info.rs | sed -n '35,120p'
Length of output: 3748
🏁 Script executed:
#!/bin/bash
# List all TreeType assignments in the QUEUE_TREE_MAPPING initializer
rg -n "TreeType::" -C 5 src/ingester/parser/tree_info.rs
Length of output: 2897
🏁 Script executed:
#!/bin/bash
# Find definition and type of indexed_merkle_tree_updates in parser/state_update.rs
rg -n "indexed_merkle_tree_updates" -C 10 src/ingester/parser/state_update.rs
Length of output: 2393
Handle StateV1 updates in indexed_tree_seqs
The indexed_tree_seqs loop currently only matches TreeType::AddressV1. Since QUEUE_TREE_MAPPING includes legacy StateV1 trees and indexed_merkle_tree_updates carries v1 Merkle-leaf updates for both state and address trees, you should add a branch for TreeType::StateV1 so those sequences aren't dropped.
Locations:
- src/ingester/gap/sequences.rs lines 153–158
Suggested change:
match info.tree_type {
light_compressed_account::TreeType::AddressV1 => {
updates.insert(tree_str.clone(), TreeTypeSeq::AddressV1(max_entry.clone()));
}
+ light_compressed_account::TreeType::StateV1 => {
+ updates.insert(tree_str.clone(), TreeTypeSeq::StateV1(max_entry.clone()));
+ }
tree_type => {
tracing::error!(
"Unhandled tree type {:?} for tree {} in indexed_tree_seqs",
tree_type,
tree_str
);
}
}
🤖 Prompt for AI Agents
In src/ingester/gap/sequences.rs around lines 153 to 158, the match on
info.tree_type only handles TreeType::AddressV1, causing StateV1 updates to be
dropped. Add a match branch for TreeType::StateV1 that inserts the corresponding
TreeTypeSeq::StateV1 with max_entry into updates, similar to the AddressV1 case,
to ensure StateV1 sequences are properly included.
let field_type = match event_type {
    1 => StateUpdateFieldType::BatchMerkleTreeEventAppend,
    2 => StateUpdateFieldType::BatchMerkleTreeEventNullify,
    3 => StateUpdateFieldType::BatchMerkleTreeEventAddressAppend,
    _ => continue,
};
🛠️ Refactor suggestion
Replace magic numbers with named constants.
The event type mapping uses hard-coded values that should be defined as constants for better maintainability.
Define constants at the module level:
const EVENT_TYPE_APPEND: u8 = 1;
const EVENT_TYPE_NULLIFY: u8 = 2;
const EVENT_TYPE_ADDRESS_APPEND: u8 = 3;
Then update the match:
let field_type = match event_type {
- 1 => StateUpdateFieldType::BatchMerkleTreeEventAppend,
- 2 => StateUpdateFieldType::BatchMerkleTreeEventNullify,
- 3 => StateUpdateFieldType::BatchMerkleTreeEventAddressAppend,
+ EVENT_TYPE_APPEND => StateUpdateFieldType::BatchMerkleTreeEventAppend,
+ EVENT_TYPE_NULLIFY => StateUpdateFieldType::BatchMerkleTreeEventNullify,
+ EVENT_TYPE_ADDRESS_APPEND => StateUpdateFieldType::BatchMerkleTreeEventAddressAppend,
_ => continue,
};
🤖 Prompt for AI Agents
In src/ingester/gap/sequences.rs around lines 302 to 307, the match statement
uses hard-coded numeric literals for event_type values. Define module-level
constants for these values (e.g., EVENT_TYPE_APPEND = 1, EVENT_TYPE_NULLIFY = 2,
EVENT_TYPE_ADDRESS_APPEND = 3) and replace the numeric literals in the match
arms with these constants to improve code clarity and maintainability.
fn merkle_event_to_type_id(event: &MerkleTreeEvent) -> u8 {
    match event {
        MerkleTreeEvent::BatchAppend(_) => 1,
        MerkleTreeEvent::BatchNullify(_) => 2,
        MerkleTreeEvent::BatchAddressAppend(_) => 3,
        _ => 0, // Other event types we don't care about
    }
}
🛠️ Refactor suggestion
Use the same constants for consistency.
This function should use the same named constants suggested earlier for event type mapping.
fn merkle_event_to_type_id(event: &MerkleTreeEvent) -> u8 {
match event {
- MerkleTreeEvent::BatchAppend(_) => 1,
- MerkleTreeEvent::BatchNullify(_) => 2,
- MerkleTreeEvent::BatchAddressAppend(_) => 3,
+ MerkleTreeEvent::BatchAppend(_) => EVENT_TYPE_APPEND,
+ MerkleTreeEvent::BatchNullify(_) => EVENT_TYPE_NULLIFY,
+ MerkleTreeEvent::BatchAddressAppend(_) => EVENT_TYPE_ADDRESS_APPEND,
_ => 0, // Other event types we don't care about
}
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
fn merkle_event_to_type_id(event: &MerkleTreeEvent) -> u8 {
    match event {
        MerkleTreeEvent::BatchAppend(_) => EVENT_TYPE_APPEND,
        MerkleTreeEvent::BatchNullify(_) => EVENT_TYPE_NULLIFY,
        MerkleTreeEvent::BatchAddressAppend(_) => EVENT_TYPE_ADDRESS_APPEND,
        _ => 0, // Other event types we don't care about
    }
}
🤖 Prompt for AI Agents
In src/ingester/gap/sequences.rs around lines 393 to 400, the function
merkle_event_to_type_id uses hardcoded numeric literals for event type IDs.
Replace these numeric literals with the previously defined named constants for
event types to ensure consistency and maintainability across the codebase.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
src/ingester/mod.rs (1)
1-2: Avoid blocking the async runtime: use tokio::time::sleep
This async function currently uses std::thread::sleep, which blocks the executor thread. Switch to tokio::time::sleep and await it.
-use std::thread::sleep;
-use std::time::Duration;
+use tokio::time::{sleep, Duration};
...
- sleep(Duration::from_secs(1));
+ sleep(Duration::from_secs(1)).await;
Also applies to: 217-217
♻️ Duplicate comments (2)
src/ingester/mod.rs (2)
88-89: Eliminate duplicate derive_block_state_update call
derive_block_state_update is called twice with identical parameters; the first result is discarded. Cache once and reuse.
- derive_block_state_update(block, None, None)?;
- persist_state_update(&txn, derive_block_state_update(block, None, None)?).await?;
+ let state_update = derive_block_state_update(block, None, None)?;
+ persist_state_update(&txn, state_update).await?;
127-140: Tree filter misses inner instructions; scan inner_instructions too
Current block_contains_tree only checks outer_instruction.accounts. This can miss relevant matches contained in inner instructions.
fn block_contains_tree(block: &BlockInfo, tree_filter: &solana_pubkey::Pubkey) -> bool {
    for tx in &block.transactions {
        for instruction_group in &tx.instruction_groups {
            if instruction_group
                .outer_instruction
                .accounts
                .contains(tree_filter)
            {
                return true;
            }
+           // Also check inner instructions
+           for inner in &instruction_group.inner_instructions {
+               if inner.accounts.contains(tree_filter) {
+                   return true;
+               }
+           }
        }
    }
    false
}
🧹 Nitpick comments (3)
src/ingester/mod.rs (3)
55-77: When gaps are detected without a rewind controller, decide on explicit handling and add metrics
Right now, gaps are logged, but if rewind_controller is None, execution proceeds silently. At minimum, emit a metric; optionally, surface an error upstream so callers can decide what to do.
  let gaps = sequences.detect_all_sequence_gaps();
+ // Metric for visibility into gap occurrences
+ metric! {
+     statsd_count!("gaps_detected", gaps.len() as i64);
+ }
  if !gaps.is_empty() {
      tracing::warn!(
          "Gaps detected in block {} sequences: {gaps:?}",
          block.metadata.slot
      );
      // Request rewind if controller is available
      if let Some(controller) = rewind_controller {
+         metric! { statsd_count!("rewind_requests", gaps.len() as i64); }
          if let Err(e) = controller.request_rewind_for_gaps(&gaps) {
              tracing::error!(
                  "Failed to request rewind for gaps in block {}: {}",
                  block.metadata.slot,
                  e
              );
+             metric! { statsd_count!("rewind_request_failures", gaps.len() as i64); }
              return Err(IngesterError::CustomError(
                  "Gap detection triggered rewind failure".to_string(),
              ));
          }
          // Return early after requesting rewind - don't continue processing
          return Err(IngesterError::GapDetectedRewind);
+     } else {
+         // Option: escalate instead of continuing silently
+         // return Err(IngesterError::CustomError("Gap detected but no rewind controller configured".into()));
      }
  }
Please confirm the desired behavior in the no-controller path. If index_block is used only in contexts where gaps are acceptable, keep as-is but keep the metrics.
79-81: Update sequence state only after successful DB commit (consider refactor)
sequences.update_sequence_state() mutates global state before persistence. If persistence fails, in-memory state advances while DB state doesn't. Consider deferring the state update until after a successful commit, which likely entails returning sequences from derive_block_state_update and applying them post-persist.
High-level approach:
- Change derive_block_state_update to return (StateUpdate, StateUpdateSequences).
- Collect sequences across filtered_blocks alongside state updates.
- After persist succeeds and tx.commit().await?, apply sequences.update_sequence_state() once.
This avoids transient divergence between in-memory sequence state and DB.
Also applies to: 172-178, 179-179, 189-191
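A rough sketch of that flow, assuming the hypothetical refactored signature returning sequences alongside the state update (seq_batches is an illustrative local):

// Inside the per-block loop:
let (state_update, sequences) = derive_block_state_update(block, None, None)?;
state_updates.push(state_update);
seq_batches.push(sequences);

// After the loop, once the whole batch is persisted and committed:
persist::persist_state_update(&tx, StateUpdate::merge_updates(state_updates)).await?;
tx.commit().await?;
for sequences in seq_batches {
    // Only now advance the in-memory state; a failed commit never gets here.
    sequences.update_sequence_state();
}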
179-179: Unify call style for persist_state_update for consistency
You import persist_state_update at Line 14 but call it via the module path at Line 179. Prefer one style throughout; using the imported function is concise.
- persist::persist_state_update(&tx, StateUpdate::merge_updates(state_updates)).await?;
+ persist_state_update(&tx, StateUpdate::merge_updates(state_updates)).await?;
Also applies to: 14-14
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/ingester/error.rs
(1 hunks)src/ingester/fetchers/poller.rs
(2 hunks)src/ingester/indexer/mod.rs
(4 hunks)src/ingester/mod.rs
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/ingester/error.rs
- src/ingester/fetchers/poller.rs
- src/ingester/indexer/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run tests
🔇 Additional comments (2)
src/ingester/mod.rs (2)
199-201: Guard against empty batches to avoid unwrap() panics
first()/last().unwrap() will panic if block_batch is empty. Add an early return.
  ) -> Result<(), IngesterError> {
+     if block_batch.is_empty() {
+         return Ok(());
+     }
      loop {
          match index_block_batch(db, &block_batch, rewind_controller, tree_filter).await {
Also applies to: 209-210
82-82: Verify merge_updates behavior on empty input
With a tree filter, a block may yield zero state updates. Ensure StateUpdate::merge_updates(state_updates) handles an empty vector safely (no panic) and returns a sensible "no-op" update. If not, short-circuit when empty.
Potential guard (if StateUpdate implements Default):
if state_updates.is_empty() { return Ok(StateUpdate::default()); }
Actionable comments posted: 3
♻️ Duplicate comments (4)
src/ingester/gap/sequences.rs (4)
411-418: Use the same constants in merkle_event_to_type_id for consistency.
Keep definitions centralized and avoid literal drift.
  match event {
-     MerkleTreeEvent::BatchAppend(_) => 1,
-     MerkleTreeEvent::BatchNullify(_) => 2,
-     MerkleTreeEvent::BatchAddressAppend(_) => 3,
+     MerkleTreeEvent::BatchAppend(_) => EVENT_TYPE_APPEND,
+     MerkleTreeEvent::BatchNullify(_) => EVENT_TYPE_NULLIFY,
+     MerkleTreeEvent::BatchAddressAppend(_) => EVENT_TYPE_ADDRESS_APPEND,
      _ => 0, // Other event types we don't care about
  }
138-146: Don't silently swallow lock acquisition failures; propagate errors.
Return a Result so callers can handle lock poisoning or contention explicitly rather than returning early.
- pub fn update_sequence_state(&self) {
+ pub fn update_sequence_state(&self) -> Result<(), String> {
      let current_state = match SEQUENCE_STATE.read() {
          Ok(state) => state,
          Err(e) => {
-             debug!("Failed to acquire read lock for sequence state: {}", e);
-             return;
+             let err_msg = format!("Failed to acquire read lock for sequence state: {}", e);
+             debug!("{}", err_msg);
+             return Err(err_msg);
          }
      };
Follow-on changes are in later suggestions within this function. Remember to update callers to handle Result<()>.
150-169: Handle StateV1 for indexed_tree_seqs to avoid dropping sequences.
Currently only AddressV1 is handled; StateV1 entries get logged as "Unhandled" and discarded.
  if let Some(info) = QUEUE_TREE_MAPPING.get(&tree_str) {
      match info.tree_type {
          light_compressed_account::TreeType::AddressV1 => {
              updates.insert(tree_str, TreeTypeSeq::AddressV1(max_entry.clone()));
          }
+         light_compressed_account::TreeType::StateV1 => {
+             updates.insert(tree_str, TreeTypeSeq::StateV1(max_entry.clone()));
+         }
          tree_type => {
              tracing::error!(
                  "Unhandled tree type {:?} for tree {} in indexed_tree_seqs",
                  tree_type,
                  tree_str
              );
          }
      }
  }
302-307: Replace magic numbers for event types with named constants.
Use named constants to avoid brittle literals and ensure consistency with merkle_event_to_type_id.
- let field_type = match event_type {
-     1 => StateUpdateFieldType::BatchMerkleTreeEventAppend,
-     2 => StateUpdateFieldType::BatchMerkleTreeEventNullify,
-     3 => StateUpdateFieldType::BatchMerkleTreeEventAddressAppend,
+ let field_type = match event_type {
+     EVENT_TYPE_APPEND => StateUpdateFieldType::BatchMerkleTreeEventAppend,
+     EVENT_TYPE_NULLIFY => StateUpdateFieldType::BatchMerkleTreeEventNullify,
+     EVENT_TYPE_ADDRESS_APPEND => StateUpdateFieldType::BatchMerkleTreeEventAddressAppend,
      _ => continue,
  };
Add these constants near the top of the file (outside of this hunk):
const EVENT_TYPE_APPEND: u8 = 1;
const EVENT_TYPE_NULLIFY: u8 = 2;
const EVENT_TYPE_ADDRESS_APPEND: u8 = 3;
Optional: add a small helper fn event_type_to_field_type(u8) -> Option<StateUpdateFieldType> to DRY both places.
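A sketch of that helper in terms of the constants above:

fn event_type_to_field_type(event_type: u8) -> Option<StateUpdateFieldType> {
    match event_type {
        EVENT_TYPE_APPEND => Some(StateUpdateFieldType::BatchMerkleTreeEventAppend),
        EVENT_TYPE_NULLIFY => Some(StateUpdateFieldType::BatchMerkleTreeEventNullify),
        EVENT_TYPE_ADDRESS_APPEND => Some(StateUpdateFieldType::BatchMerkleTreeEventAddressAppend),
        _ => None,
    }
}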
🧹 Nitpick comments (2)
src/ingester/gap/sequences.rs (2)
55-66: Verify unnecessary Pubkey conversion.
If context.tree_pubkey can be directly converted to Pubkey (e.g., implements Into/From), prefer that over to_bytes() + new_from_array to avoid an extra allocation and reduce noise.
If supported by the type, change to:
- let tree = Pubkey::new_from_array(context.tree_pubkey.to_bytes());
+ let tree = Pubkey::from(context.tree_pubkey);
68-102: Add a log when QUEUE_TREE_MAPPING has no entry.
When QUEUE_TREE_MAPPING.get(&tree_str) returns None, we silently accept the address entry. Consider logging at least a warn-level message to surface potentially misconfigured or unexpected trees.
  if let Some(info) = QUEUE_TREE_MAPPING.get(&tree_str) {
      // batch_new_addresses should only contain AddressV2 trees
      if info.tree_type != light_compressed_account::TreeType::AddressV2 {
          tracing::error!(
              "{:?} wrong tree {tree_str} found in batch_new_addresses \
              Only AddressV2 trees should be in batch new address operations. \
              queue_index: {}, slot: {}, signature: {}",
              info.tree_type, address.queue_index, slot, signature
          );
          // Skip this invalid data
          continue;
      }
  }
+ else {
+     tracing::warn!(
+         "Tree {} not found in QUEUE_TREE_MAPPING for batch_new_addresses; \
+          queue_index: {}, slot: {}, signature: {}",
+         tree_str, address.queue_index, slot, signature
+     );
+ }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/ingester/fetchers/poller.rs
(2 hunks)src/ingester/gap/sequences.rs
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/ingester/fetchers/poller.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/ingester/gap/sequences.rs (6)
src/ingester/gap/mod.rs (1)
get_current_sequence_state
(67-107)src/ingester/parser/indexer_events.rs (2)
tree_pubkey
(34-39)seq
(40-45)src/ingester/gap/rewind.rs (1)
new
(24-27)src/main.rs (2)
tree_str
(248-249)tree_str
(314-315)src/ingester/parser/tx_event_parser_v2.rs (1)
events
(31-114)src/ingester/gap/treetype_seq.rs (2)
new_address_v2_with_output
(61-67)new_state_v2_with_output
(51-58)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run tests
🔇 Additional comments (2)
src/ingester/gap/sequences.rs (2)
31-41: Good: sequence entries carry slot and signature context.
Including slot and signature alongside sequence numbers will materially improve diagnostics when reporting gaps. LGTM.
171-177: Confirm nullifications are always StateV1.
All nullifications are currently funneled into StateV1. If V2 nullifications exist (or can in the future), this will misclassify and bias detection/state.
Would you like me to scan the codebase for V2 nullification producers/usages and open a follow-up if found?
// Process batch address queue indexes (AddressV2)
for (tree_pubkey, entries) in &self.batch_address_queue_indexes {
    if let Some(max_entry) = entries.iter().max_by_key(|e| e.sequence) {
        let tree_str = tree_pubkey.to_string();
        debug!(
            "Updating batch_address_queue_indexes for tree: {}, sequence: {}",
            tree_str, max_entry.sequence
        );

        updates.insert(
            tree_str.clone(),
            TreeTypeSeq::new_address_v2_with_output(
                current_state.get(&tree_str),
                max_entry.clone(),
            ),
        );
    }
}
🛠️ Refactor suggestion
Fix potential lost update due to TOCTOU between read and write of SEQUENCE_STATE.
You read current_state, compute V2 updates that depend on current state, drop the read lock, and later acquire the write lock to persist. Concurrent writers could race in between, causing stale merges and lost fields (e.g., AddressV2 input preserved from an outdated snapshot).
Apply the diffs below:
- Replace use of read lock with a write lock at the start and remove the later write-lock block.
- let current_state = match SEQUENCE_STATE.read() {
- Ok(state) => state,
- Err(e) => {
- debug!("Failed to acquire read lock for sequence state: {}", e);
- return;
- }
- };
+ let mut state = match SEQUENCE_STATE.write() {
+ Ok(state) => state,
+ Err(e) => {
+ debug!("Failed to acquire write lock for sequence state: {}", e);
+ return;
+ }
+ };
- Use the up-to-date state when building V2 updates:
updates.insert(
tree_str.clone(),
TreeTypeSeq::new_address_v2_with_output(
- current_state.get(&tree_str),
+ state.get(&tree_str),
max_entry.clone(),
),
);
updates.insert(
tree_str.clone(),
TreeTypeSeq::new_state_v2_with_output(
- current_state.get(&tree_str),
+ state.get(&tree_str),
max_entry.clone(),
),
);
- Remove the explicit read-lock drop and the second write-lock acquisition, and insert directly:
- // Drop read lock before acquiring write lock
- drop(current_state);
-
- // Apply all updates atomically
- if !updates.is_empty() {
- match SEQUENCE_STATE.write() {
- Ok(mut state) => {
- for (key, value) in updates {
- state.insert(key, value);
- }
- }
- Err(e) => {
- debug!("Failed to acquire write lock for sequence state: {}", e);
- }
- }
- }
+ // Apply all updates atomically under the same write lock
+ if !updates.is_empty() {
+ for (key, value) in updates {
+ state.insert(key, value);
+ }
+ }
+ return ();
Note: If you adopt the earlier Result return, adjust the final return accordingly.
Also applies to: 202-224, 228-243
🤖 Prompt for AI Agents
In src/ingester/gap/sequences.rs around lines 179 to 196, the code currently
reads from current_state using a read lock, computes updates, then later
acquires a write lock to persist changes, causing a TOCTOU race condition. To
fix this, acquire a single write lock at the start of the update process and use
it for both reading and writing state throughout the update computation. Replace
all read lock usages with this write lock, remove any explicit read lock drops
and subsequent write lock acquisitions, and ensure updates are built using the
most current state from the write lock. Apply the same pattern to lines 202-224
and 228-243 as well.
src/ingester/gap/sequences.rs
Outdated
fn detect_sequence_gaps_with_metadata(
    sequences: &[SequenceEntry],
    tree_pubkey: Option<Pubkey>,
    queue_pubkey: Option<Pubkey>,
    field_type: StateUpdateFieldType,
) -> Vec<SequenceGap> {
    if sequences.len() < 2 {
        return Vec::new();
    }

    let mut sorted_sequences = sequences.to_vec();
    sorted_sequences.sort_by_key(|entry| entry.sequence);
    let mut gaps = Vec::new();

    let start_seq = get_current_sequence_state(tree_pubkey, queue_pubkey, &field_type);
    let (unpacked_start_seq, start_entry) = start_seq.extract_sequence_info(&field_type);

    // Skip gap detection for tree initialization (when unpacked_start_seq == 0)
    // because there's no previous sequence to compare against
    // Also skip if unpacked_start_seq is u64::MAX (no state found)
    if unpacked_start_seq > 0 && unpacked_start_seq != u64::MAX {
        // Check for any missing sequences between global state and the minimum sequence in this block
        let min_seq_in_block = sorted_sequences[0].sequence;

        // Check if there's a gap between the global state and the sequences in this block
        // A gap exists if the minimum sequence in the block is more than 1 away from global state
        // AND the missing sequences are not present anywhere in this block
        if min_seq_in_block > unpacked_start_seq.saturating_add(1) {
            // Check if ALL missing sequences are present in this block
            let mut has_real_gap = false;
            for missing_seq in (unpacked_start_seq + 1)..min_seq_in_block {
                let found = sorted_sequences.iter().any(|e| e.sequence == missing_seq);
                if !found {
                    has_real_gap = true;
                    break;
                }
            }

            if has_real_gap {
                let (before_slot, before_signature) = if let Some(entry) = start_entry {
                    (entry.slot, entry.signature)
                } else {
                    (0, String::new())
                };

                gaps.push(SequenceGap {
                    before_slot,
                    after_slot: sorted_sequences[0].slot,
                    before_signature,
                    after_signature: sorted_sequences[0].signature.clone(),
                    tree_pubkey,
                    field_type: field_type.clone(),
                });
            }
        }
    }

    for i in 1..sorted_sequences.len() {
        let prev_entry = &sorted_sequences[i - 1];
        let curr_entry = &sorted_sequences[i];

        if curr_entry.sequence - prev_entry.sequence > 1 {
            gaps.push(SequenceGap {
                before_slot: prev_entry.slot,
                after_slot: curr_entry.slot,
                before_signature: prev_entry.signature.clone(),
                after_signature: curr_entry.signature.clone(),
                tree_pubkey,
                field_type: field_type.clone(),
            });
        }
    }

    gaps
}
🛠️ Refactor suggestion
Add unit tests for boundary cases in gap detection.
Recommend adding tests for:
- Single-entry block with prior state gap (covered by the previous fix).
- Duplicate sequences within the same block (ensure no false gaps).
- Non-monotonic sequences within a block that still contain missing numbers (ensure real gap detection).
I can provide focused tests using minimal StateUpdateSequences
construction to cover these.
🤖 Prompt for AI Agents
In src/ingester/gap/sequences.rs around lines 334 to 409, the
detect_sequence_gaps_with_metadata function lacks unit tests for important
boundary cases. Add unit tests to cover these scenarios: a single-entry block
with a prior state gap to verify gap detection after a previous sequence, blocks
containing duplicate sequences to ensure no false gaps are reported, and blocks
with non-monotonic sequences that still have missing numbers to confirm real
gaps are detected. Use minimal StateUpdateSequences constructions in the tests
to isolate and validate each case effectively.
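As one example, a sketch of the duplicate-sequence case; seq_entry is a hypothetical test helper building a SequenceEntry from (sequence, slot, signature), the call path to the private function is assumed, and global state is assumed untouched for this tree:

#[test]
fn duplicate_sequences_yield_no_false_gaps() {
    let entries = vec![
        seq_entry(5, 100, "sigA"),
        seq_entry(5, 100, "sigB"), // duplicate within the same block
        seq_entry(6, 101, "sigC"),
    ];
    let gaps = StateUpdateSequences::detect_sequence_gaps_with_metadata(
        &entries,
        None,
        None,
        StateUpdateFieldType::BatchMerkleTreeEventAppend,
    );
    assert!(gaps.is_empty(), "5, 5, 6 is contiguous; no gap expected");
}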
if sequences.len() < 2 {
    return Vec::new();
}

let mut sorted_sequences = sequences.to_vec();
sorted_sequences.sort_by_key(|entry| entry.sequence);
let mut gaps = Vec::new();
Bug: early-return prevents detecting gaps vs previous state when only one sequence is present in a block.
With a single sequence in sequences, we still need to check for a gap between the global state and the block's minimum sequence. The current
- if sequences.len() < 2 {
+ if sequences.is_empty() {
return Vec::new();
}
Add a unit test for a block containing exactly one sequence where there is a gap between global state and the block’s min sequence to prevent regression. I can draft this if helpful.
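A sketch of that regression test; seed_state and seq_entry are hypothetical helpers (seed_state writes sequence 3 for the tree into the global SEQUENCE_STATE), and it encodes the post-fix behavior:

#[test]
fn single_entry_block_detects_gap_vs_prior_state() {
    let tree = Pubkey::new_unique();
    seed_state(&tree, 3);
    let entries = vec![seq_entry(7, 120, "sig")];
    let gaps = StateUpdateSequences::detect_sequence_gaps_with_metadata(
        &entries,
        Some(tree),
        None,
        StateUpdateFieldType::BatchMerkleTreeEventAppend,
    );
    assert_eq!(gaps.len(), 1, "sequences 4 through 6 are missing");
}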
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if sequences.is_empty() {
    return Vec::new();
}

let mut sorted_sequences = sequences.to_vec();
sorted_sequences.sort_by_key(|entry| entry.sequence);
let mut gaps = Vec::new();
🤖 Prompt for AI Agents
In src/ingester/gap/sequences.rs around lines 340 to 346, the early return when
sequences.len() is less than 2 prevents checking for gaps between the global
state and the block's minimum sequence if only one sequence is present. Remove
or modify this early return to allow gap detection in the single-sequence case.
Additionally, add a unit test for a block with exactly one sequence where a gap
exists between the global state and the block's minimum sequence to ensure this
case is handled and prevent regressions.
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
refactor: add a dedicated error variant for gap detection
The new `--disable-gap-detection` flag allows users to opt out of the sequence gap detection and rewinding behavior, simplifying the indexing process in cases where it's not needed.
4db2da2 to 7823882 (Compare)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/ingester/persist/persisted_batch_event.rs (1)
195-203: Address queue range is off-by-one and lacks lower bound; may skip or over-delete rows
Current logic selects/deletes rows with QueueIndex < (new_next_index - 1), which excludes the last intended element (new_next_index - 1) and also lacks a lower bound (old_next_index), risking unintended deletes on older rows. Align with the [old, new) convention used elsewhere.
Apply these diffs to both the read and delete paths:
- let last_queue_index = batch_address_append_event.new_next_index as i64 - 1;
- let addresses = address_queues::Entity::find()
-     .filter(address_queues::Column::QueueIndex.lt(last_queue_index).and(
-         address_queues::Column::Tree.eq(batch_address_append_event.merkle_tree_pubkey.to_vec()),
-     ))
+ let addresses = address_queues::Entity::find()
+     .filter(
+         address_queues::Column::QueueIndex
+             .gte(batch_address_append_event.old_next_index as i64)
+             .and(address_queues::Column::QueueIndex.lt(
+                 batch_address_append_event.new_next_index as i64,
+             ))
+             .and(address_queues::Column::Tree.eq(
+                 batch_address_append_event.merkle_tree_pubkey.to_vec(),
+             )),
+     )
      .order_by_asc(address_queues::Column::QueueIndex)
      .all(txn)
      .await?;
- address_queues::Entity::delete_many()
-     .filter(address_queues::Column::QueueIndex.lt(last_queue_index).and(
-         address_queues::Column::Tree.eq(batch_address_append_event.merkle_tree_pubkey.to_vec()),
-     ))
+ address_queues::Entity::delete_many()
+     .filter(
+         address_queues::Column::QueueIndex
+             .gte(batch_address_append_event.old_next_index as i64)
+             .and(address_queues::Column::QueueIndex.lt(
+                 batch_address_append_event.new_next_index as i64,
+             ))
+             .and(address_queues::Column::Tree.eq(
+                 batch_address_append_event.merkle_tree_pubkey.to_vec(),
+             )),
+     )
      .exec(txn)
      .await?;
Also applies to: 220-225
src/ingester/indexer/mod.rs (1)
1-1: Don't block the async runtime; replace std::thread::sleep with tokio::sleep.
Blocking sleep in an async function can stall other tasks, especially under error conditions. Prefer tokio::time::sleep.
-use std::{sync::Arc, thread::sleep, time::Duration};
+use std::sync::Arc;
+use tokio::time::{sleep, Duration};
♻️ Duplicate comments (8)
src/api/method/get_multiple_compressed_accounts.rs (1)
151-155: Unify error handling style with V2 (use the Err(...)? idiom).
V1 now returns a proper error (good change), but it diverges from V2's concise style. For consistency and readability, mirror the V2 approach.
Apply this diff:
- _ => {
-     return Err(PhotonApiError::ValidationError(
-         "Either hashes or addresses must be provided".to_string(),
-     ));
- }
+ _ => Err(PhotonApiError::ValidationError(
+     "Either hashes or addresses must be provided".to_string(),
+ ))?,
tests/integration_tests/snapshot_test_utils.rs (2)
244-250: Use a temporary directory for round-trip tests to avoid path conflicts.
Hardcoding under target/ can collide under parallel runs and might not be writable in some environments.
- let roundtrip_dir = std::path::PathBuf::from("target")
-     .join("test_snapshots")
-     .join("roundtrip");
- std::fs::create_dir_all(&roundtrip_dir)?;
- let roundtrip_dir_str = roundtrip_dir.to_str().unwrap().to_string();
- let target_adapter = Arc::new(DirectoryAdapter::from_local_directory(roundtrip_dir_str));
+ let tempdir = tempfile::tempdir().context("Failed to create temp directory")?;
+ let target_adapter = Arc::new(DirectoryAdapter::from_local_directory(
+     tempdir.path().to_string_lossy().to_string(),
+ ));
Note: Remember to add tempfile as a dev-dependency.
171-177: Make snapshot file writes atomic.
Direct writes risk partial/corrupt files on failures. Write to a temp file and rename atomically.
- let data_len = snapshot_data.len();
- std::fs::write(&snapshot_path, snapshot_data)?;
+ let data_len = snapshot_data.len();
+ let tmp_path = snapshot_path.with_extension("tmp");
+ std::fs::write(&tmp_path, snapshot_data)
+     .context("Failed to write snapshot data")?;
+ std::fs::rename(&tmp_path, &snapshot_path)
+     .context("Failed to finalize snapshot file")?;
tests/integration_tests/test_v1_address_tree_sequence_consistency.rs (2)
11-18: Replace magic numbers with named constants (or an enum).
You still map MerkleTreeEvent variants to raw numbers. This reduces readability and risks divergence if values change.
Apply this diff to use named constants in the mapping:
fn merkle_event_to_type_id(event: &MerkleTreeEvent) -> u8 {
    match event {
-       MerkleTreeEvent::BatchAppend(_) => 1,
-       MerkleTreeEvent::BatchNullify(_) => 2,
-       MerkleTreeEvent::BatchAddressAppend(_) => 3,
+       MerkleTreeEvent::BatchAppend(_) => EVENT_TYPE_BATCH_APPEND,
+       MerkleTreeEvent::BatchNullify(_) => EVENT_TYPE_BATCH_NULLIFY,
+       MerkleTreeEvent::BatchAddressAppend(_) => EVENT_TYPE_BATCH_ADDRESS_APPEND,
        _ => 0, // Other event types we don't care about
    }
}
Add these constants near the top of the file (outside the selected range):
const EVENT_TYPE_BATCH_APPEND: u8 = 1;
const EVENT_TYPE_BATCH_NULLIFY: u8 = 2;
const EVENT_TYPE_BATCH_ADDRESS_APPEND: u8 = 3;
243-250: Use the same named constants for reverse mapping.
Mirror the constants in the reverse event_type → field_type mapping to avoid duplication of magic numbers.
  let field_type = match event_type {
-     1 => StateUpdateFieldType::BatchMerkleTreeEventAppend,
-     2 => StateUpdateFieldType::BatchMerkleTreeEventNullify,
-     3 => StateUpdateFieldType::BatchMerkleTreeEventAddressAppend,
+     EVENT_TYPE_BATCH_APPEND => StateUpdateFieldType::BatchMerkleTreeEventAppend,
+     EVENT_TYPE_BATCH_NULLIFY => StateUpdateFieldType::BatchMerkleTreeEventNullify,
+     EVENT_TYPE_BATCH_ADDRESS_APPEND => StateUpdateFieldType::BatchMerkleTreeEventAddressAppend,
      _ => continue,
  };
src/main.rs (1)
247-251: Deduplicate and harden tree pubkey parsing (avoid expect).
Tree pubkey parsing is duplicated and panics on invalid input. Extract a helper that returns Result<Option<Pubkey>>, log a friendly error, and exit gracefully.
Apply these diffs in-place to the two call sites:
- let tree_filter = args.tree.as_ref().map(|tree_str| {
-     tree_str
-         .parse::<Pubkey>()
-         .expect("Invalid tree pubkey format")
- });
+ let tree_filter = match parse_tree_filter(args.tree.as_ref()) {
+     Ok(filter) => filter,
+     Err(e) => {
+         error!("{}", e);
+         std::process::exit(1);
+     }
+ };
Add this helper outside the selected ranges (e.g., below Args or near other small helpers):
fn parse_tree_filter(tree_str: Option<&String>) -> Result<Option<Pubkey>, String> {
    tree_str
        .map(|s| s.parse::<Pubkey>().map_err(|e| format!("Invalid tree pubkey '{}': {}", s, e)))
        .transpose()
}
Also applies to: 313-317
src/ingester/mod.rs (2)
88-90: Eliminate duplicate derive_block_state_update call in index_block.

derive_block_state_update is invoked twice with identical parameters; the first result is discarded. This doubles transaction parsing work unnecessarily.

Apply:

```diff
-    derive_block_state_update(block, None, None)?;
-    persist_state_update(&txn, derive_block_state_update(block, None, None)?).await?;
+    let state_update = derive_block_state_update(block, None, None)?;
+    persist_state_update(&txn, state_update).await?;
```
127-140: Tree filter misses inner instructions (can skip relevant blocks).

block_contains_tree only checks outer_instruction.accounts, ignoring inner_instructions. With --tree filtering, this can incorrectly skip blocks that reference the tree only in inner instructions, causing missed updates.

Fix:

```diff
 fn block_contains_tree(block: &BlockInfo, tree_filter: &solana_pubkey::Pubkey) -> bool {
     for tx in &block.transactions {
         for instruction_group in &tx.instruction_groups {
-            if instruction_group
-                .outer_instruction
-                .accounts
-                .contains(tree_filter)
-            {
-                return true;
-            }
+            if instruction_group.outer_instruction.accounts.contains(tree_filter) {
+                return true;
+            }
+            for inner in &instruction_group.inner_instructions {
+                if inner.accounts.contains(tree_filter) {
+                    return true;
+                }
+            }
         }
     }
     false
 }
```

This matches the parser's logic (src/ingester/parser/mod.rs) and prevents false negatives.
🧹 Nitpick comments (21)
src/ingester/persist/persisted_batch_event.rs (3)
135-137: Per-tree scoping on nullifier selection is correct; confirm queue index semantics and add indexing.

Adding AND accounts.Tree = batch_nullify_event.merkle_tree_pubkey to the selection prevents cross-tree contamination. Please confirm that NullifierQueueIndex is per-tree (not global). If it's global, this filter could inadvertently skip rows in the range.

Performance: these range scans will benefit from a composite index. Consider a partial index to keep updates cheap. PostgreSQL example:

```sql
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_accounts_tree_nullifier_queue_idx
    ON accounts (tree, nullifier_queue_index)
    WHERE nullifier_queue_index IS NOT NULL;
```
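To check the per-tree assumption empirically, an ad-hoc query along these lines would work (hypothetical, Postgres):

```sql
-- If this returns rows, the same queue index value is reused across trees,
-- i.e., numbering is per-tree; an empty result is consistent with one global sequence.
SELECT nullifier_queue_index, COUNT(DISTINCT tree) AS tree_count
FROM accounts
WHERE nullifier_queue_index IS NOT NULL
GROUP BY nullifier_queue_index
HAVING COUNT(DISTINCT tree) > 1
LIMIT 10;
```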
180-182: Mirrored per-tree filter on update looks good; consider verifying rows affected.

The update's filter now matches the read's constraints, which keeps read/update sets consistent. To catch drift or malformed batches early, consider asserting that the number of updated rows equals the number of selected accounts for this batch (and log/err if not). This helps detect partial updates or stale queue indices.

Apply this diff to capture and validate affected rows for the nullify update (illustrative; adjust error handling to your conventions):

```diff
     let query = accounts::Entity::update_many()
         .col_expr(
             accounts::Column::NullifierQueueIndex,
             Expr::value(Option::<i64>::None),
         )
         .col_expr(accounts::Column::NullifiedInTree, Expr::value(true))
         .filter(
             accounts::Column::NullifierQueueIndex
                 .gte(batch_nullify_event.old_next_index)
                 .and(accounts::Column::NullifierQueueIndex.lt(batch_nullify_event.new_next_index))
                 .and(accounts::Column::Tree.eq(batch_nullify_event.merkle_tree_pubkey.to_vec())),
         )
         .build(txn.get_database_backend());
-    txn.execute(query).await?;
+    let res = txn.execute(query).await?;
+    let expected = accounts.len() as u64;
+    let updated = res.rows_affected();
+    if updated != expected {
+        return Err(IngesterError::ParserError(format!(
+            "Nullify update rows_affected mismatch: expected {}, got {} (tree={:?}, range=[{}, {}))",
+            expected,
+            updated,
+            batch_nullify_event.merkle_tree_pubkey,
+            batch_nullify_event.old_next_index,
+            batch_nullify_event.new_next_index
+        )));
+    }
```
106-116: Minor style consistency: use exec() instead of build()+execute().

Elsewhere you use .exec(txn) directly (e.g., address queue deletion). For consistency and readability, consider using it here too unless you specifically need the built statement.

Apply this diff:

```diff
-    let query = accounts::Entity::update_many()
+    accounts::Entity::update_many()
         .col_expr(accounts::Column::InOutputQueue, Expr::value(false))
         .filter(
             accounts::Column::LeafIndex
                 .gte(batch_append_event.old_next_index as i64)
                 .and(accounts::Column::LeafIndex.lt(batch_append_event.new_next_index as i64))
                 .and(accounts::Column::Tree.eq(batch_append_event.merkle_tree_pubkey.to_vec())),
         )
-        .build(txn.get_database_backend());
-    txn.execute(query).await?;
+        .exec(txn)
+        .await?;
```

src/ingester/parser/mod.rs (3)
75-79: Avoid cloning the entire TransactionInfo.

No need to clone tx. Iterate by reference to reduce allocations.

```diff
-    for instruction_group in tx.clone().instruction_groups {
+    for instruction_group in &tx.instruction_groups {
         let mut ordered_instructions = Vec::new();
         ordered_instructions.push(instruction_group.outer_instruction.clone());
         ordered_instructions.extend(instruction_group.inner_instructions.clone());
```

Note: only the for-loop header changes; the rest stays the same since you already clone instructions as needed.
159-159: Drop the unnecessary clone when merging updates.

state_updates is not used after this point; pass by value to avoid cloning.

```diff
-    let mut state_update = StateUpdate::merge_updates(state_updates.clone());
+    let mut state_update = StateUpdate::merge_updates(state_updates);
```
169-173: Tree post-filter looks correct; consider preserving transactions when partially relevant.

Current logic clears transactions and account_transactions only when all filtered collections are empty, which is sensible. Optionally, you could prune account_transactions that no longer reference retained accounts/trees while keeping the transaction if some data remains; this would reduce noise downstream.

I can propose a targeted filter for account_transactions keyed by the remaining account/tree set if you want to tighten this further.
src/tools/analyze_snapshot.rs (4)
55-57: Pass the tree filter to the parser when provided to reduce parsing cost.

You already parse without a filter and then check trees. If a target tree is supplied, pass it to parse_transaction to skip irrelevant transactions upfront.

```diff
-    match parse_transaction(tx, block.metadata.slot, None) {
+    match parse_transaction(tx, block.metadata.slot, target_tree_pubkey) {
```

Note: target_tree_pubkey is an Option already.
76-77: Prefer explicit Pubkey construction for clarity.

Using Pubkey::new_from_array(*tree_bytes) avoids relying on a From impl that may be less obvious to readers.

```diff
-    let tree = solana_pubkey::Pubkey::from(*tree_bytes);
+    let tree = solana_pubkey::Pubkey::new_from_array(*tree_bytes);
```
127-130: Guard percentage calculation against division by zero.

If the snapshot happens to contain zero transactions, you'll print NaN/Inf.

```diff
-    println!(
-        "Compression transactions: {} ({:.2}%)",
-        compression_transactions,
-        (compression_transactions as f64 / total_transactions as f64) * 100.0
-    );
+    if total_transactions > 0 {
+        println!(
+            "Compression transactions: {} ({:.2}%)",
+            compression_transactions,
+            (compression_transactions as f64 / total_transactions as f64) * 100.0
+        );
+    } else {
+        println!("Compression transactions: 0 (0.00%)");
+    }
```

Also consider a similar guard for the "Skip % of blocks" calculation below.
1-158: Optional: add a streaming-based implementation for memory safety.

If you want, I can provide a streaming rewrite that maintains identical outputs but avoids Vec flattening. Would you like me to push a follow-up PR to implement the streaming approach and add guards for empty totals?
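For a sense of the shape, a minimal sketch, assuming the stream yields Vec<BlockInfo> chunks as load_block_stream_from_directory_adapter does (function name and the trimmed counter set are illustrative):

```rust
use futures::{pin_mut, StreamExt};

use crate::ingester::typedefs::block_info::BlockInfo;

// Aggregate stats chunk-by-chunk instead of collecting every block up front.
async fn analyze_streaming(
    block_stream: impl futures::Stream<Item = Vec<BlockInfo>>,
) -> (u64, u64) {
    pin_mut!(block_stream);
    let (mut total_blocks, mut total_transactions) = (0u64, 0u64);
    while let Some(chunk) = block_stream.next().await {
        for block in &chunk {
            total_blocks += 1;
            total_transactions += block.transactions.len() as u64;
        }
        // The chunk is dropped here, so memory stays bounded by one chunk.
    }
    (total_blocks, total_transactions)
}
```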
tests/integration_tests/snapshot_test_utils.rs (1)
368-374: Test comment mismatches parameter semantics.

The parameter target_slot is a slot boundary, not a count of transactions. Update the comment to avoid confusion.

```diff
-    10, // Fetch 10 compression transactions
+    10, // Fetch compression transactions down to slot 10
```
15-19: Defaulting to StateV1 may hide intent for unknown tree types.

Using StateV1 as the Default variant can mask uninitialized or unknown types. Consider implementing Default explicitly at each use site, or a neutral "Unknown" variant to make mismatches explicit.
If you’d like, I can sketch an Unknown variant and update extract_sequence_info to handle it cleanly.
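For illustration, one possible shape of that variant; the sketch assumes the existing variant payloads stay as they are today, and the names are not final:

```rust
#[derive(Debug, Clone, Default)]
pub(crate) enum TreeTypeSeq {
    /// No sequence observed yet; forces call sites to handle the
    /// uninitialized case explicitly instead of assuming StateV1.
    #[default]
    Unknown,
    StateV1(SequenceEntry),
    StateV2(SequenceEntry, SequenceEntry),
    AddressV1(SequenceEntry),
    AddressV2(SequenceEntry, SequenceEntry),
}
```

extract_sequence_info would then surface Unknown as its own defensive case rather than folding it into StateV1.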
60-67: Method naming: "with_output" vs. AddressV2's second field (batch_event_entry).

new_address_v2_with_output sets the second tuple element, documented as batch_event_entry. The name "with_output" is misleading; consider renaming to reflect the semantics (e.g., new_address_v2_with_batch_event).

```diff
-    pub(crate) fn new_address_v2_with_output(
+    pub(crate) fn new_address_v2_with_batch_event(
         current: Option<&TreeTypeSeq>,
         output_entry: SequenceEntry,
     ) -> TreeTypeSeq {
         let input_entry = Self::get_or_default_address_v2_input(current);
         TreeTypeSeq::AddressV2(input_entry, output_entry)
     }
```

Note: this is a public (crate) API rename; adjust call sites accordingly if you adopt it.
tests/integration_tests/test_v1_address_tree_sequence_consistency.rs (1)
41-44
: Public underscore-prefixed field reads oddly.SequenceGap._tree_type_string is public but prefixed with an underscore (usually for unused locals). Consider renaming to tree_type_string for clarity.
- pub _tree_type_string: Option<String>, // Tree type string (for indexed tree updates) + pub tree_type_string: Option<String>, // Tree type string (for indexed tree updates)Note: update the single construction site in detect_sequence_gaps_with_metadata accordingly.
src/ingester/persist/persisted_indexed_merkle_tree.rs (2)
84-85: Avoid persisting placeholder signatures downstream.

Zeroed 64-byte signatures mark synthetic elements (zeroeth/top). If these flow into state_tree_histories, they pollute transaction_signature with invalid values. Prefer skipping these records when inserting address entries.

Before applying, confirm whether state_tree_histories.transaction_signature is used to link back to transactions for gap reports or debugging. If yes, skipping synthetic entries is preferable.

Apply this diff where address_tree_history_models are built:

```diff
     let address_tree_history_models = chunk
         .iter()
+        // Skip synthetic elements that carry a zeroed placeholder signature
+        .filter(|x| x.signature.as_ref() != &[0u8; 64])
         .map(
             |x| crate::dao::generated::state_tree_histories::ActiveModel {
                 tree: Set(x.tree.to_bytes().to_vec()),
                 seq: Set(x.seq as i64),
                 leaf_idx: Set(x.leaf.index as i64),
                 transaction_signature: Set(Into::<[u8; 64]>::into(x.signature).to_vec()),
             },
         )
         .collect::<Vec<_>>();
```

Also applies to: 129-130
238-265: Confirm intent: inserting all indexed leaf updates (not just AddressV1) into state_tree_histories.

The comment says "Add address tree entries…", but the code inserts all chunk elements regardless of tree_type. If state trees were already persisted into state_tree_histories elsewhere, this may cause benign duplicates (mitigated by ON CONFLICT) but wasted work. If the goal is to unify by ensuring AddressV1 entries are present too, optionally filter to AddressV1.

If you intend to filter to AddressV1:

```diff
     let address_tree_history_models = chunk
         .iter()
+        .filter(|x| matches!(x.tree_type, TreeType::AddressV1))
         .map(
             |x| crate::dao::generated::state_tree_histories::ActiveModel {
                 tree: Set(x.tree.to_bytes().to_vec()),
                 seq: Set(x.seq as i64),
                 leaf_idx: Set(x.leaf.index as i64),
                 transaction_signature: Set(Into::<[u8; 64]>::into(x.signature).to_vec()),
             },
         )
         .collect::<Vec<_>>();
```

If full unification is desired (i.e., both state and address trees here), consider updating the comment to reflect that to avoid confusion.
src/ingester/fetchers/poller.rs (1)
72-80: Severity level for rewind logging.

Rewinds are expected in gap-detection flows. Consider downgrading from error to info/warn to avoid noisy alerting.

```diff
-    log::error!("Rewinding block stream to {}: {}", to_slot, reason);
+    log::warn!("Rewinding block stream to {}: {}", to_slot, reason);
```
55-77: Instrument gap detection with metrics and improve observability.

Gaps are logged but not metered. Adding counters helps alerting and SLOs (e.g., unexpected rewinds).

Consider:

```diff
     let gaps = sequences.detect_all_sequence_gaps();
     if !gaps.is_empty() {
+        metric! {
+            // Count gaps detected; either 1 per block or total gaps; choose policy. Here: total gaps found.
+            statsd_count!("gaps_detected", gaps.len() as i64);
+        }
         tracing::warn!(
             "Gaps detected in block {} sequences: {gaps:?}",
             block.metadata.slot
         );
         // Request rewind if controller is available
         if let Some(controller) = rewind_controller {
             if let Err(e) = controller.request_rewind_for_gaps(&gaps) {
                 tracing::error!(
                     "Failed to request rewind for gaps in block {}: {}",
                     block.metadata.slot,
                     e
                 );
                 return Err(IngesterError::CustomError(
                     "Gap detection triggered rewind failure".to_string(),
                 ));
             }
+            metric! {
+                statsd_count!("rewinds_requested", 1);
+            }
             // Return early after requesting rewind - don't continue processing
             return Err(IngesterError::GapDetectedRewind);
         }
     }
```
142-147: Prefer a slice over &Vec in the function signature.

Accepting &[BlockInfo] is more idiomatic and flexible than &Vec<BlockInfo>. Call sites passing &Vec<T> still coerce to slices.

```diff
 pub async fn index_block_batch(
     db: &DatabaseConnection,
-    block_batch: &Vec<BlockInfo>,
+    block_batch: &[BlockInfo],
     rewind_controller: Option<&RewindController>,
     tree_filter: Option<solana_pubkey::Pubkey>,
 ) -> Result<(), IngesterError> {
```
184-188: Unify logging macros (mix of tracing and log).

This module uses both tracing::* and log::*. Prefer a single facade (ideally tracing) for consistent formatting and context propagation.

```diff
-    log::info!(
+    tracing::info!(
         "Indexed {} blocks, skipped {} blocks",
         blocks_len,
         block_batch.len() - blocks_len
     );
```
37-37: Minor: pre-allocate state_updates capacity.

A tiny win: initialize with capacity to avoid potential reallocations proportional to transactions.

```diff
-    let mut state_updates: Vec<StateUpdate> = Vec::new();
+    let mut state_updates: Vec<StateUpdate> = Vec::with_capacity(block.transactions.len());
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
- Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (33)
- .gitignore (1 hunks)
- Cargo.toml (2 hunks)
- src/api/method/get_multiple_compressed_accounts.rs (1 hunks)
- src/api/method/get_transaction_with_compression_info.rs (2 hunks)
- src/ingester/error.rs (1 hunks)
- src/ingester/fetchers/grpc.rs (7 hunks)
- src/ingester/fetchers/mod.rs (4 hunks)
- src/ingester/fetchers/poller.rs (2 hunks)
- src/ingester/gap/mod.rs (1 hunks)
- src/ingester/gap/rewind.rs (1 hunks)
- src/ingester/gap/sequences.rs (1 hunks)
- src/ingester/gap/treetype_seq.rs (1 hunks)
- src/ingester/indexer/mod.rs (4 hunks)
- src/ingester/mod.rs (2 hunks)
- src/ingester/parser/merkle_tree_events_parser.rs (3 hunks)
- src/ingester/parser/mod.rs (3 hunks)
- src/ingester/parser/state_update.rs (2 hunks)
- src/ingester/parser/tx_event_parser_v2.rs (2 hunks)
- src/ingester/persist/persisted_batch_event.rs (2 hunks)
- src/ingester/persist/persisted_indexed_merkle_tree.rs (4 hunks)
- src/main.rs (8 hunks)
- src/snapshot/mod.rs (1 hunks)
- src/snapshot/snapshotter/main.rs (1 hunks)
- src/tools/analyze_snapshot.rs (1 hunks)
- tests/integration_tests/e2e_tests.rs (1 hunks)
- tests/integration_tests/main.rs (1 hunks)
- tests/integration_tests/mock_tests.rs (1 hunks)
- tests/integration_tests/snapshot_test_utils.rs (1 hunks)
- tests/integration_tests/snapshot_tests.rs (2 hunks)
- tests/integration_tests/test_v1_address_tree_gap_filler.rs (1 hunks)
- tests/integration_tests/test_v1_address_tree_sequence_consistency.rs (1 hunks)
- tests/integration_tests/utils.rs (2 hunks)
- tests/integration_tests/zeroeth_element_fix_test.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (21)
- src/ingester/error.rs
- src/ingester/parser/state_update.rs
- src/ingester/gap/rewind.rs
- tests/integration_tests/e2e_tests.rs
- src/ingester/fetchers/grpc.rs
- tests/integration_tests/main.rs
- tests/integration_tests/mock_tests.rs
- tests/integration_tests/utils.rs
- tests/integration_tests/test_v1_address_tree_gap_filler.rs
- Cargo.toml
- src/api/method/get_transaction_with_compression_info.rs
- src/ingester/fetchers/mod.rs
- .gitignore
- src/ingester/gap/mod.rs
- src/snapshot/snapshotter/main.rs
- src/ingester/parser/merkle_tree_events_parser.rs
- tests/integration_tests/snapshot_tests.rs
- src/ingester/parser/tx_event_parser_v2.rs
- src/snapshot/mod.rs
- src/ingester/gap/sequences.rs
- tests/integration_tests/zeroeth_element_fix_test.rs
🧰 Additional context used
🧬 Code Graph Analysis (9)
tests/integration_tests/snapshot_test_utils.rs (3)
- src/ingester/parser/mod.rs (1): get_compression_program_id (22-25)
- src/ingester/typedefs/block_info.rs (1): parse_ui_confirmed_blocked (55-93)
- src/snapshot/mod.rs (6): create_snapshot_from_byte_stream (632-684), load_block_stream_from_directory_adapter (568-630), load_byte_stream_from_directory_adapter (534-566), from_local_directory (265-267), get_snapshot_files_with_metadata (385-406), is_compression_transaction (364-376)
src/ingester/fetchers/poller.rs (1)
- src/ingester/gap/rewind.rs (1): new (24-27)
src/tools/analyze_snapshot.rs (1)
- src/snapshot/mod.rs (2): load_block_stream_from_directory_adapter (568-630), from_local_directory (265-267)
tests/integration_tests/test_v1_address_tree_sequence_consistency.rs (3)
- src/ingester/parser/mod.rs (1): parse_transaction (42-175)
- src/snapshot/mod.rs (2): load_block_stream_from_directory_adapter (568-630), from_local_directory (265-267)
- src/ingester/parser/indexer_events.rs (2): tree_pubkey (34-39), seq (40-45)
src/ingester/gap/treetype_seq.rs (1)
- src/ingester/parser/indexer_events.rs (1): seq (40-45)
src/ingester/parser/mod.rs (2)
- src/ingester/fetchers/grpc.rs (1): parse_transaction (306-384)
- src/ingester/parser/state_update.rs (1): new (106-108)
src/ingester/indexer/mod.rs (1)
- src/ingester/mod.rs (1): index_block_batch_with_infinite_retries (193-221)
src/main.rs (3)
- src/monitor/mod.rs (1): continously_monitor_photon (51-85)
- src/ingester/indexer/mod.rs (1): index_block_stream (49-126)
- src/ingester/gap/rewind.rs (1): new (24-27)
src/ingester/mod.rs (2)
- src/ingester/fetchers/grpc.rs (1): parse_transaction (306-384)
- src/ingester/parser/mod.rs (1): parse_transaction (42-175)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run tests
🔇 Additional comments (8)
src/ingester/parser/mod.rs (2)
42-71: Early tree filter: good optimization; consider documenting potential false negatives.

The early account scan is a pragmatic win to avoid heavy parsing when a tree_filter is set. Note that some flows involving derived/indirect accounts might not list the tree account explicitly, potentially yielding false negatives. A short note in the function docs would help future maintainers.
Would you like me to add a brief doc comment warning about this trade-off?
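Something along these lines would be enough (wording illustrative, signature elided):

```rust
/// Parses a transaction into a `StateUpdate`.
///
/// Note: when a tree filter is set, transactions are pre-screened by scanning
/// instruction account lists for the tree pubkey. Flows that reference the
/// tree only indirectly (accounts not listed on the instruction) may be
/// skipped, i.e., the filter can yield false negatives.
pub fn parse_transaction() {
    // body unchanged; signature elided for brevity
}
```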
183-221: Filter predicates align with field shapes; good coverage.

Retains by tree across out_accounts, indexed updates, batch events, new addresses, and nullifications. The mix of Pubkey and [u8; 32] handling is consistent.
src/ingester/gap/treetype_seq.rs (1)
82-154: extract_sequence_info: solid contract and defensive returns.

Clear separation of invalid (u64::MAX), uninitialized (0), and valid states. The mapping across field types looks consistent with the TreeTypeSeq variants.
tests/integration_tests/test_v1_address_tree_sequence_consistency.rs (1)
312-316: Good fix: no more hardcoded local path.

Using TEST_SNAPSHOT_PATH with a sensible default eliminates CI fragility and dev-machine coupling.
src/ingester/persist/persisted_indexed_merkle_tree.rs (1)
11-11: LGTM: explicit Signature import.

Pulling in solana_sdk::signature::Signature matches the new field and DB persistence use.
src/ingester/indexer/mod.rs (1)
55-57: Wiring rewind_controller and tree_filter looks correct.

The parameters are threaded through to index_block_batch_with_infinite_retries and used only for control flow/logging here. Good separation of concerns.
src/main.rs (1)
304-311: RewindController wiring is clean.

Optional creation based on disable_gap_detection with proper passing into BlockStreamConfig. Nice.
src/ingester/mod.rs (1)
199-207: Correct propagation for gap-triggered rewind: LGTM.

Propagating IngesterError::GapDetectedRewind to callers instead of retrying is appropriate for letting fetchers reposition and refill gaps.
```rust
RewindCommand::Rewind { to_slot, reason } => {
    log::error!("Rewinding block stream to {}: {}", to_slot, reason);
    // Clear cached blocks
    block_cache.clear();
    // Reset positions
    last_indexed_slot = to_slot - 1;
    current_start_slot = to_slot;
    rewind_occurred = true;
    log::info!("Cleared cache, restarting from slot {}", current_start_slot);
```
Underflow bug: rewinding to slot 0 causes u64 underflow.
Setting last_indexed_slot = to_slot - 1 will wrap if to_slot == 0. Use saturating_sub.
```diff
-    last_indexed_slot = to_slot - 1;
+    last_indexed_slot = to_slot.saturating_sub(1);
```
🤖 Prompt for AI Agents
In src/ingester/fetchers/poller.rs around lines 71 to 79, assigning
last_indexed_slot = to_slot - 1 can underflow when to_slot == 0; change this to
use a saturating subtraction (e.g., last_indexed_slot =
to_slot.saturating_sub(1)) or explicitly handle the zero case so
last_indexed_slot never wraps, keep current_start_slot = to_slot and maintain
rewind_occurred/logging as-is.
```rust
                continue;
            } else {
                log::error!("Unexpected error in block processing: {}", e);
                sleep(Duration::from_secs(1));
            }
```
🛠️ Refactor suggestion
Switch to async sleep on error path.
Continue using non-blocking sleep when recovering from unexpected errors.
```diff
-    log::error!("Unexpected error in block processing: {}", e);
-    sleep(Duration::from_secs(1));
+    log::error!("Unexpected error in block processing: {}", e);
+    sleep(Duration::from_secs(1)).await;
```
🤖 Prompt for AI Agents
In src/ingester/indexer/mod.rs around lines 118 to 122, the error recovery path
uses the blocking std::thread::sleep which blocks the async runtime; replace it
with a non-blocking async sleep by using
tokio::time::sleep(Duration::from_secs(1)).await (or the crate's async runtime
equivalent), add the necessary use/import for tokio::time::Duration or
tokio::time::sleep, and ensure the surrounding function is async so you can
await the sleep; remove the blocking std::thread::sleep call.
```rust
// Update sequence state with latest observed sequences
sequences.update_sequence_state();
```
💡 Verification agent
🧩 Analysis chain
Sequence state is updated before DB commit — risks inconsistency on failure
sequences.update_sequence_state() mutates global/latest-sequence state within derive_block_state_update, i.e., before persistence succeeds and the transaction is committed. If persist_state_update or tx.commit() fails, the in-memory/global sequence state advances while the DB does not, potentially hiding future gaps or causing inconsistent rewind behavior.
Proposed direction:
- Defer sequence-state advancement until after successful persistence (ideally after tx.commit()).
- Practical approach: make derive_block_state_update return both (StateUpdate, StateUpdateSequences) (or a minimal "latest" snapshot). After persist_state_update returns Ok and the transaction commits, apply update_sequence_state().
Would you like me to draft a concrete patch to thread the sequences back to index_block_batch
and update post-commit? Additionally, please verify how StateUpdateSequences::update_sequence_state
is implemented (in-memory vs. persistent). If it’s in-memory, early updates are especially risky.
To verify the implementation impact, you can run:
🏁 Script executed:
#!/bin/bash
# Inspect how sequence state is stored/updated and where else it's called
rg -n -C3 'update_sequence_state\(' --type rust
rg -n -C3 'StateUpdateSequences' --type rust
rg -n -C3 'SEQUENCE_STATE' --type rust
Length of output: 16728
🏁 Script executed:
#!/bin/bash
# Show derive_block_state_update definition with context
rg -n -C5 'fn derive_block_state_update' --type rust src/ingester/mod.rs
Length of output: 427
🏁 Script executed:
#!/bin/bash
# Find index_block_batch definition and context
rg -n -C5 'fn index_block_batch' --type rust src/ingester
Length of output: 1137
🏁 Script executed:
#!/bin/bash
# Show usage of derive_block_state_update within index_block_batch
rg -n -C5 'derive_block_state_update' --type rust src/ingester/mod.rs
Length of output: 1304
🏁 Script executed:
#!/bin/bash
# Display the body of derive_block_state_update including the update_sequence_state call
sed -n '32,120p' src/ingester/mod.rs
Length of output: 3575
Defer in-memory sequence advancement until after successful DB commit
The call to sequences.update_sequence_state() currently lives inside derive_block_state_update, i.e. before persist_state_update and tx.commit(). Since update_sequence_state() writes only to the in-memory SEQUENCE_STATE, a failure in persistence or commit will leave our global sequence tracker ahead of what's in the database, masking gaps or causing rewind logic to misbehave.
Recommended refactor:
- Change derive_block_state_update to return both StateUpdate and its associated StateUpdateSequences.
- In index_block (and index_block_batch), apply the DB writes and commit first, then call sequences.update_sequence_state().
Key locations:
- src/ingester/mod.rs: the end of fn derive_block_state_update (remove the early sequences.update_sequence_state()).
- src/ingester/mod.rs: pub async fn index_block (and its batch variant); thread the returned sequences through, commit the transaction, then update the global state.
Example diff sketch:
```diff
--- a/src/ingester/mod.rs
+++ b/src/ingester/mod.rs
@@ fn derive_block_state_update(...) -> Result<StateUpdate, IngesterError> {
-    // Update sequence state with latest observed sequences
-    sequences.update_sequence_state();
-    Ok(StateUpdate::merge_updates(state_updates))
+    // Defer applying to global state until after DB commit:
+    Ok((StateUpdate::merge_updates(state_updates), sequences))
 }
 pub async fn index_block(...) -> Result<(), IngesterError> {
     let txn = db.begin().await?;
@@
-    derive_block_state_update(block, None, None)?;
-    persist_state_update(&txn, derive_block_state_update(block, None, None)?).await?;
-    txn.commit().await?;
+    let (state_update, sequences) = derive_block_state_update(block, None, None)?;
+    persist_state_update(&txn, state_update).await?;
+    txn.commit().await?;
+    // Now that the transaction has committed successfully, advance the in-memory state:
+    sequences.update_sequence_state();
     Ok(())
 }
```
This ensures that our global sequence tracker only moves forward once the database is in sync.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/ingester/mod.rs around lines 79 to 81, remove the early call to
sequences.update_sequence_state() from the end of fn derive_block_state_update
and change that function to return a tuple of (StateUpdate,
StateUpdateSequences) (or the existing StateUpdate plus its associated sequences
object) so callers receive the sequences without applying them; then in pub
async fn index_block (and the batch variant) accept/receive the returned
sequences, perform persist_state_update and tx.commit() first, and only after
the database commit call sequences.update_sequence_state() to advance the
in-memory SEQUENCE_STATE—thread the sequences through the call chain similarly
in index_block_batch so the global tracker is updated only on successful DB
commit.
```rust
let blocks: Vec<_> = block_stream
    .collect::<Vec<_>>()
    .await
    .into_iter()
    .flatten()
    .collect();
```
🛠️ Refactor suggestion
Avoid loading the entire snapshot into memory; stream-process blocks.
Collecting all blocks defeats the benefit of streaming and can blow memory on large snapshots. Iterate the stream and update counters on the fly.
Example approach (outside the selected range for context):
- Replace the collect/flatten with a while-let loop over the stream, accumulating stats per chunk.
- This avoids materializing Vec<Vec> and the final Vec.
🤖 Prompt for AI Agents
In src/tools/analyze_snapshot.rs around lines 35 to 41, the code currently
collects the entire block_stream into a Vec and flattens it, which loads the
whole snapshot into memory; instead, iterate the stream and process each chunk
as it arrives (e.g., use StreamExt::next or a while-let loop over the stream,
await each chunk, iterate its items and update your counters/accumulators in
place), remove the collect/flatten and any code that assumes a final
Vec<BlockInfo>, and ensure types reflect on-the-fly aggregation so memory usage
stays bounded for large snapshots.
```rust
loop {
    let config = GetConfirmedSignaturesForAddress2Config {
        before,
        until: None,
        limit: None, // No limit - fetch as many as possible per batch
        commitment: None,
    };

    let compression_program_id =
        solana_sdk::pubkey::Pubkey::new_from_array(get_compression_program_id().to_bytes());
    println!(
        "Fetching signatures for compression program: {}",
        compression_program_id
    );
    let batch = client
        .get_signatures_for_address_with_config(&compression_program_id, config)
        .await
        .context("Failed to fetch signatures for compression program")?;

    println!("Fetched {} signatures in this batch", batch.len());

    let mut reached_target_slot = false;
    for sig_info in &batch {
        // Check if we've reached the target slot
        if sig_info.slot < target_slot {
            reached_target_slot = true;
            break;
        }

        // Skip failed transactions
        if sig_info.err.is_some() {
            continue;
        }

        let signature =
            Signature::from_str(&sig_info.signature).context("Failed to parse signature")?;
        signatures.push(signature);
        signature_to_slot_map.insert(signature, sig_info.slot);
    }

    if reached_target_slot {
        // Stop when no more signatures or reached target slot
        break;
    }

    before = batch
        .last()
        .map(|sig| Signature::from_str(&sig.signature).unwrap());
}
```
Prevent infinite loop when signature batch is empty.
If RPC returns an empty batch before reaching target_slot, before stays None and the loop repeats the same request indefinitely. Break on empty batch.
```diff
     let batch = client
         .get_signatures_for_address_with_config(&compression_program_id, config)
         .await
         .context("Failed to fetch signatures for compression program")?;
     println!("Fetched {} signatures in this batch", batch.len());
+    if batch.is_empty() {
+        // No more data available
+        break;
+    }
```
Optional: also consider a safety max-iterations guard.
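A sketch of that guard; the limit value is arbitrary, and anyhow::bail! is assumed available since the surrounding code already uses anyhow's Context:

```rust
const MAX_SIGNATURE_BATCHES: usize = 10_000;

let mut batches_fetched = 0usize;
loop {
    batches_fetched += 1;
    if batches_fetched > MAX_SIGNATURE_BATCHES {
        // Bail out rather than paginate forever on unexpected RPC behavior.
        anyhow::bail!(
            "Aborting signature pagination after {} batches",
            MAX_SIGNATURE_BATCHES
        );
    }
    // ... existing fetch / filter / cursor-advance logic ...
}
```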
🤖 Prompt for AI Agents
In tests/integration_tests/snapshot_test_utils.rs around lines 305 to 353, the
loop calling get_signatures_for_address_with_config can spin forever if the RPC
returns an empty batch because `before` remains None; update the loop to detect
an empty batch and break out (or return an error) when batch.is_empty(), and
also optionally add a safety max-iterations counter that breaks with a clear
error after N iterations to avoid infinite loops during tests.
- Adds sequential index validation for both operations
- Checks for already processed batches during re-indexing
- Improves handling of address batch indices and queue state
- Retains nullifier queue indices for re-indexing support
- Makes temporary file creation more robust with fallback
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/ingester/persist/persisted_batch_event.rs (3)
153-164: Potential overflow: sequence_number cast to u32 without bounds checks.

BatchEvent.sequence_number looks like a 64-bit counter in many systems. The unchecked cast can truncate. Apply a fallible conversion to avoid silent truncation:

```diff
-    seq: Some(batch_append_event.sequence_number as u32),
+    seq: Some(
+        u32::try_from(batch_append_event.sequence_number)
+            .map_err(|_| IngesterError::ParserError("sequence_number overflow".to_string()))?
+    ),
```

If LeafNode.seq must be u32, failing fast here is safer than silently corrupting metadata.

424-432: Guard against sequence_number overflow in address append and precompute seq.

multi_append can't accept a fallible conversion inline; compute seq first.

```diff
-    multi_append(
+    let seq = u32::try_from(batch_address_append_event.sequence_number)
+        .map_err(|_| IngesterError::ParserError("sequence_number overflow".to_string()))?;
+    multi_append(
         txn,
         address_values,
         batch_address_append_event.merkle_tree_pubkey.to_vec(),
         DEFAULT_BATCH_ADDRESS_TREE_HEIGHT + 1,
-        Some(batch_address_append_event.sequence_number as u32),
+        Some(seq),
     )
```

153-164: Enforce safe u64 → u32 conversion for sequence_number.

The BatchEvent.sequence_number field is declared as a u64 in src/ingester/parser/indexer_events.rs:178. Casting it to u32 with as u32 will silently truncate values ≥ 2³², leading to data corruption if ever out of range. You must either:

- Replace as u32 with a checked conversion:

```rust
seq: Some(
    u32::try_from(batch_append_event.sequence_number)
        .map_err(|_| IngesterError::ParserError(
            "sequence_number overflowed u32".into(),
        ))?
),
```

- Or, if it is guaranteed by upstream logic to never exceed u32::MAX, document that invariant on the BatchEvent.sequence_number declaration and use try_from(...).unwrap() (or a safe helper).

Please update all occurrences in this file:
- src/ingester/persist/persisted_batch_event.rs at lines 153-164
- Same file at lines 283-295
- Same file at lines 424-432
♻️ Duplicate comments (7)
src/ingester/gap/sequences.rs (7)
139-146: Don't swallow lock acquisition failures; return a Result.

Returning early hides synchronization issues. Prefer propagating errors so callers can decide to retry or trigger a rewind.

```diff
-    pub fn update_sequence_state(&self) {
+    pub fn update_sequence_state(&self) -> Result<(), String> {
         let current_state = match SEQUENCE_STATE.read() {
             Ok(state) => state,
             Err(e) => {
-                debug!("Failed to acquire read lock for sequence state: {}", e);
-                return;
+                let msg = format!("Failed to acquire read lock for sequence state: {}", e);
+                debug!("{}", msg);
+                return Err(msg);
             }
         };
```

Remember to return Ok(()) at the end and update call sites. I can scan and patch the call sites if helpful.
302-307: Replace event-type magic numbers with the named constants.

```diff
 let field_type = match event_type {
-    1 => StateUpdateFieldType::BatchMerkleTreeEventAppend,
-    2 => StateUpdateFieldType::BatchMerkleTreeEventNullify,
-    3 => StateUpdateFieldType::BatchMerkleTreeEventAddressAppend,
+    EVENT_TYPE_APPEND => StateUpdateFieldType::BatchMerkleTreeEventAppend,
+    EVENT_TYPE_NULLIFY => StateUpdateFieldType::BatchMerkleTreeEventNullify,
+    EVENT_TYPE_ADDRESS_APPEND => StateUpdateFieldType::BatchMerkleTreeEventAddressAppend,
     _ => continue,
 };
```
411-418: Use the same named constants in merkle_event_to_type_id.

```diff
 fn merkle_event_to_type_id(event: &MerkleTreeEvent) -> u8 {
     match event {
-        MerkleTreeEvent::BatchAppend(_) => 1,
-        MerkleTreeEvent::BatchNullify(_) => 2,
-        MerkleTreeEvent::BatchAddressAppend(_) => 3,
+        MerkleTreeEvent::BatchAppend(_) => EVENT_TYPE_APPEND,
+        MerkleTreeEvent::BatchNullify(_) => EVENT_TYPE_NULLIFY,
+        MerkleTreeEvent::BatchAddressAppend(_) => EVENT_TYPE_ADDRESS_APPEND,
         _ => 0, // Other event types we don't care about
     }
 }
```
246-331: Consider adding unit tests for boundary and regression cases.

Recommend tests for:
- Single-entry block with a gap vs prior global state (regression for the early-return fix).
- Duplicate sequences within a block (no gaps).
- Non-monotonic sequences with real gaps.
- Mixed event types per tree to ensure correct field_type attribution in gaps.
I can draft a minimal test harness around StateUpdateSequences::detect_all_sequence_gaps and detect_sequence_gaps_with_metadata if you’d like.
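A minimal shape for such a harness; detect_gaps_in is a hypothetical thin wrapper over the module's per-tree detection (e.g., delegating to detect_sequence_gaps_with_metadata), and the SequenceEntry fields follow this file:

```rust
#[cfg(test)]
mod gap_detection_tests {
    use super::*;

    // Helper to build an entry; slot/signature are irrelevant to gap logic.
    fn entry(sequence: u64) -> SequenceEntry {
        SequenceEntry {
            sequence,
            slot: 0,
            signature: String::new(),
        }
    }

    #[test]
    fn duplicates_within_a_block_are_not_gaps() {
        // `detect_gaps_in` is a placeholder for the per-tree helper under test.
        let gaps = detect_gaps_in(&[entry(5), entry(5), entry(6)]);
        assert!(gaps.is_empty());
    }

    #[test]
    fn non_monotonic_input_with_one_real_gap() {
        // 4, 5, 7 arrive out of order; only sequence 6 is missing.
        let gaps = detect_gaps_in(&[entry(7), entry(4), entry(5)]);
        assert_eq!(gaps.len(), 1);
    }
}
```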
150-167: Handle StateV1 sequences for indexed_tree_seqs to avoid dropping updates.

Right now AddressV1 is the only handled variant; StateV1 updates (present for legacy trees) will log as "Unhandled …" and never update SEQUENCE_STATE.

```diff
 match info.tree_type {
     light_compressed_account::TreeType::AddressV1 => {
         updates.insert(tree_str, TreeTypeSeq::AddressV1(max_entry.clone()));
     }
+    light_compressed_account::TreeType::StateV1 => {
+        updates.insert(tree_str, TreeTypeSeq::StateV1(max_entry.clone()));
+    }
     tree_type => {
```
139-146: Fix TOCTOU race: compute and apply updates under a single write lock.

You read current_state under a read lock to derive V2 updates, drop it, then acquire a write lock to persist. Concurrent writers can interleave, causing lost updates or mismatched V2 input/output pairing. Take one write lock and use it for both reading and writing.

```diff
-    pub fn update_sequence_state(&self) {
-        let current_state = match SEQUENCE_STATE.read() {
-            Ok(state) => state,
-            Err(e) => {
-                debug!("Failed to acquire read lock for sequence state: {}", e);
-                return;
-            }
-        };
+    pub fn update_sequence_state(&self) {
+        let mut state = match SEQUENCE_STATE.write() {
+            Ok(state) => state,
+            Err(e) => {
+                debug!("Failed to acquire write lock for sequence state: {}", e);
+                return;
+            }
+        };
 @@
-        updates.insert(
-            tree_str.clone(),
-            TreeTypeSeq::new_address_v2_with_output(
-                current_state.get(&tree_str),
-                max_entry.clone(),
-            ),
-        );
+        updates.insert(
+            tree_str.clone(),
+            TreeTypeSeq::new_address_v2_with_output(
+                state.get(&tree_str),
+                max_entry.clone(),
+            ),
+        );
 @@
-        updates.insert(
-            tree_str.clone(),
-            TreeTypeSeq::new_state_v2_with_output(
-                current_state.get(&tree_str),
-                max_entry.clone(),
-            ),
-        );
+        updates.insert(
+            tree_str.clone(),
+            TreeTypeSeq::new_state_v2_with_output(
+                state.get(&tree_str),
+                max_entry.clone(),
+            ),
+        );
 @@
-        // Drop read lock before acquiring write lock
-        drop(current_state);
-
-        // Apply all updates atomically
-        if !updates.is_empty() {
-            match SEQUENCE_STATE.write() {
-                Ok(mut state) => {
-                    for (key, value) in updates {
-                        state.insert(key, value);
-                    }
-                }
-                Err(e) => {
-                    debug!("Failed to acquire write lock for sequence state: {}", e);
-                }
-            }
-        }
+        // Apply all updates atomically under the same write lock
+        for (key, value) in updates {
+            state.insert(key, value);
+        }
```

If you adopt the Result-returning signature above, propagate errors accordingly. This change also reduces lock churn.
Also applies to: 188-194, 205-211, 228-243
333-346: Bug: early-return skips gap detection vs previous state when there's a single sequence.

We still need to check "global state → first entry" even if there's only one entry.

```diff
-    if sequences.len() < 2 {
+    if sequences.is_empty() {
         return Vec::new();
     }
```
🧹 Nitpick comments (17)
src/api/method/get_queue_elements.rs (5)
75-83: Consider applying the "not-nullified" filter even when start_queue_index is None.

Currently, InputStateV2 applies NullifiedInTree == false only when start_queue_index is provided. If the intended API contract is "return only not-yet-nullified inputs," you may want that filter in both branches for consistency and to avoid surprising callers who omit start_queue_index.

For example:

```rust
// inside QueueType::InputStateV2 branch
query_condition = query_condition
    .add(accounts::Column::NullifierQueueIndex.is_not_null())
    .add(accounts::Column::NullifiedInTree.ne(true));
if let Some(start_queue_index) = request.start_queue_index {
    query_condition = query_condition
        .add(accounts::Column::NullifierQueueIndex.gte(start_queue_index as i64));
}
```

If the asymmetry is intentional, please document that start_queue_index toggles the "exclude already-nullified" behavior in the endpoint docs.
99-106: Limit SELECT columns to reduce I/O.

Both queries currently perform a SELECT * (SeaORM default) and then map into QueueElement with four fields. Narrow the projection to only those columns to cut row size and memory, especially under large limits.

Example:

```rust
let base = accounts::Entity::find()
    .select_only()
    .column(accounts::Column::LeafIndex)
    .column(accounts::Column::Hash)
    .column(accounts::Column::TxHash)
    .column(accounts::Column::NullifierQueueIndex)
    .filter(query_condition);

let query = match queue_type {
    QueueType::InputStateV2 => base.order_by_asc(accounts::Column::NullifierQueueIndex),
    QueueType::OutputStateV2 => base.order_by_asc(accounts::Column::LeafIndex),
    _ => unreachable!(),
};
```
80-81: Confirm inclusive range semantics (>=) match client expectations.

Using gte(start_queue_index) makes the range inclusive. If clients pass their last-seen index, they will re-receive that element; if they pass "next index," they won't. Please confirm and document the intended contract to avoid off-by-one pagination issues.
114-121: Add targeted error context for DB failures.

The current error wraps the DB error string. Adding key parameters (tree, queue_type, start_queue_index, limit) will speed up on-call debugging without leaking sensitive data.

Example:

```rust
.map_err(|e| {
    PhotonApiError::UnexpectedError(format!(
        "DB error fetching queue elements: tree={}, queue_type={:?}, start_queue_index={:?}, limit={}, err={}",
        hex::encode(request.tree.0),
        queue_type,
        request.start_queue_index,
        limit,
        e
    ))
})?
```
100-106: Indexing advisory for sustained throughput.

Given the new predicate pattern, consider (or verify the existence of) a composite index to support the common filter/sort path:
- For InputStateV2: (tree, nullified_in_tree, nullifier_queue_index)
- For OutputStateV2: (tree, in_output_queue, leaf_index)
This will improve selectivity and ordering.
Would you like a migration stub to add these indexes?
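For reference, a stub in raw SQL, assuming Postgres and the snake_case column names used elsewhere in this review; translate into the repo's SeaORM migration style as needed:

```sql
-- InputStateV2 path: filter by tree + nullified_in_tree, order by queue index.
CREATE INDEX IF NOT EXISTS idx_accounts_input_queue
    ON accounts (tree, nullified_in_tree, nullifier_queue_index);

-- OutputStateV2 path: filter by tree + in_output_queue, order by leaf index.
CREATE INDEX IF NOT EXISTS idx_accounts_output_queue
    ON accounts (tree, in_output_queue, leaf_index);
```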
src/ingester/gap/sequences.rs (7)
1-11: Introduce named constants for Merkle event type IDs (used later).

Defining these once improves readability and avoids the magic numbers used below in two places.

```diff
 use tracing::debug;
+
+// Merkle event type IDs
+const EVENT_TYPE_APPEND: u8 = 1;
+const EVENT_TYPE_NULLIFY: u8 = 2;
+const EVENT_TYPE_ADDRESS_APPEND: u8 = 3;
```

12-21: Key design: drop the unused type component from the indexed_tree_seqs key.

You store (Pubkey, u64) but later ignore the type_id and derive the tree type via QUEUE_TREE_MAPPING. This can create duplicate buckets per tree and redundant work.

```diff
-    indexed_tree_seqs: HashMap<(Pubkey, u64), Vec<SequenceEntry>>, // (tree, tree_type_id) -> entries
+    indexed_tree_seqs: HashMap<Pubkey, Vec<SequenceEntry>>, // tree -> entries
```

If you adopt this, adjust:

- Lines 32-41 to use .entry(*tree_pubkey).
- Lines 251-264 to iterate over (tree_pubkey, seqs) (drop tree_type_id).
- Lines 150-167 match logic remains the same since it already ignores the second tuple element.

Would you like me to provide a follow-up patch that touches all affected spots?

32-41: Align the insert key with the proposed key simplification (if applied).

```diff
 for ((tree_pubkey, _), leaf_update) in &state_update.indexed_merkle_tree_updates {
     self.indexed_tree_seqs
-        .entry((*tree_pubkey, leaf_update.tree_type as u64))
+        .entry(*tree_pubkey)
         .or_insert_with(Vec::new)
         .push(SequenceEntry {
             sequence: leaf_update.seq,
             slot,
             signature: signature.to_string(),
         });
 }
```

55-66: Prefer direct conversion into Pubkey (avoid the to_bytes round-trip).

If context.tree_pubkey implements Into or From<...> for Pubkey, use it to avoid an unnecessary copy and to be explicit.

```diff
-    let tree = Pubkey::new_from_array(context.tree_pubkey.to_bytes());
+    let tree = Pubkey::from(context.tree_pubkey);
```

If From is not implemented, leave as-is.

348-371: Optional: avoid O(n*m) membership checks when scanning for "missing sequences."

Current code scans the vector for each missing_seq. Convert to a HashSet once, then do O(1) lookups.

Pseudo-change:

- Build let present: HashSet<u64> = sorted_sequences.iter().map(|e| e.sequence).collect();
- Replace the inner any() loop with !present.contains(&missing_seq), as sketched below.

This keeps behavior identical and improves worst-case performance on larger blocks.
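A minimal sketch of that shape; expected_start and expected_end are placeholders for the loop bounds already computed in this function:

```rust
use std::collections::HashSet;

// Build the membership set once per tree...
let present: HashSet<u64> = sorted_sequences.iter().map(|e| e.sequence).collect();

// ...then each probe is O(1) instead of a linear scan over the entries.
// `expected_start`/`expected_end` stand in for the bounds derived above.
for missing_seq in expected_start..expected_end {
    if !present.contains(&missing_seq) {
        // record a gap for `missing_seq` here
    }
}
```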
251-264: If you simplify the indexed_tree_seqs key, adjust this loop accordingly.

```diff
-    for ((tree_pubkey, tree_type_id), seqs) in &self.indexed_tree_seqs {
+    for (tree_pubkey, seqs) in &self.indexed_tree_seqs {
         debug!(
-            "Processing indexed_tree_seqs - tree: {}, tree_type_id: {}",
-            tree_pubkey, tree_type_id
+            "Processing indexed_tree_seqs - tree: {}",
+            tree_pubkey
         );
```
104-120: Optional: log unrecognized Merkle events (type_id == 0) at debug to aid troubleshooting.

Right now they're silently skipped; adding a debug line once per block/tree can help during incident triage without being noisy.
src/ingester/persist/persisted_batch_event.rs (5)
21-45: Boolean return from validate_batch_index encourages early returns that break partial re-indexing.

The helper conflates validation with control flow. Returning Ok(false) on old_next_index < current_index tempts callers to short-circuit, which prevents processing the remaining [current_index, new_next_index) segment after a crash/partial commit.

Consider returning an adjusted start index instead of a boolean. For example:

```diff
-fn validate_batch_index(old_next_index: u64, current_index: u64, event_type: &str) -> Result<bool, IngesterError> {
-    if old_next_index > current_index { return Err(...); }
-    else if old_next_index < current_index { tracing::debug!(...); return Ok(false); }
-    Ok(true)
-}
+/// Returns the clamped start index to process (max(old_next_index, current_index)).
+/// Err if old_next_index > current_index.
+fn clamp_start_index(old_next_index: u64, current_index: u64, event_type: &str) -> Result<u64, IngesterError> {
+    if old_next_index > current_index {
+        return Err(IngesterError::ParserError(format!(
+            "Batch {} old_next_index {} is greater than current index {}",
+            event_type, old_next_index, current_index
+        )));
+    }
+    let start = old_next_index.max(current_index);
+    if start > old_next_index {
+        tracing::debug!("Batch {} re-indexing detected: old_next_index {} < current_index {}", event_type, old_next_index, current_index);
+    }
+    Ok(start)
+}
```

Even if you don't adopt this helper, please ensure callers never "early-return" on partial re-indexing (see follow-up comments).
75-83: Avoid no-op DB calls when there are no leaf nodes.

For address-append events, leaf_nodes remains empty and we still call persist_leaf_nodes. Guard to skip the call when empty.

```diff
-    if leaf_nodes.len() <= MAX_SQL_INSERTS {
-        persist_leaf_nodes(txn, leaf_nodes, STATE_TREE_HEIGHT_V2 + 1).await?;
-    } else {
-        // Currently not used but a safeguard in case the batch size changes.
-        for leaf_nodes_chunk in leaf_nodes.chunks(MAX_SQL_INSERTS) {
-            persist_leaf_nodes(txn, leaf_nodes_chunk.to_vec(), STATE_TREE_HEIGHT_V2 + 1)
-                .await?;
-        }
-    }
+    if !leaf_nodes.is_empty() {
+        if leaf_nodes.len() <= MAX_SQL_INSERTS {
+            persist_leaf_nodes(txn, leaf_nodes, STATE_TREE_HEIGHT_V2 + 1).await?;
+        } else {
+            // Currently not used but a safeguard in case the batch size changes.
+            for leaf_nodes_chunk in leaf_nodes.chunks(MAX_SQL_INSERTS) {
+                persist_leaf_nodes(txn, leaf_nodes_chunk.to_vec(), STATE_TREE_HEIGHT_V2 + 1)
+                    .await?;
+            }
+        }
+    }
```
217-221: Docstring is inconsistent with implementation for the nullify path.

The code intentionally retains NullifierQueueIndex for re-indexing, but the comment still says "remove them from the database nullifier queue."

```diff
-/// 2. Mark elements as nullified in tree
-///    and remove them from the database nullifier queue.
+/// 2. Mark elements as nullified in tree.
+///    We intentionally keep NullifierQueueIndex to support re-indexing.
```
248-261: Optional: filter already-nullified to reduce duplicate work.

Selecting only accounts where NullifiedInTree = false avoids re-pushing identical leaf nodes on replays. The subsequent update is idempotent, so this is a minor optimization.

```diff
     .filter(
         accounts::Column::NullifierQueueIndex
             .gte(queue_start)
             .and(accounts::Column::NullifierQueueIndex.lt(queue_end))
             .and(accounts::Column::Tree.eq(batch_nullify_event.merkle_tree_pubkey.to_vec()))
-            .and(accounts::Column::Spent.eq(true)),
+            .and(accounts::Column::Spent.eq(true))
+            .and(accounts::Column::NullifiedInTree.eq(false)),
     )
```
380-385: Reduce log noise: downgrade "already processed" to debug (consistent with the append path).

append uses tracing::debug for the same scenario. Matching levels helps keep INFO clean.

```diff
-    tracing::info!(
+    tracing::debug!(
         "Address batch already processed: {} addresses already in indexed tree",
         already_indexed
     );
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
- src/api/method/get_queue_elements.rs (1 hunks)
- src/ingester/gap/mod.rs (1 hunks)
- src/ingester/gap/sequences.rs (1 hunks)
- src/ingester/persist/persisted_batch_event.rs (8 hunks)
- src/snapshot/mod.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/ingester/gap/mod.rs
- src/snapshot/mod.rs
🧰 Additional context used
🧬 Code graph analysis (2)
src/ingester/persist/persisted_batch_event.rs (1)
- src/common/typedefs/account/context.rs (1): try_from (109-152)
src/ingester/gap/sequences.rs (2)
- src/ingester/gap/mod.rs (1): get_current_sequence_state (70-110)
- src/ingester/gap/treetype_seq.rs (2): new_address_v2_with_output (61-67), new_state_v2_with_output (51-58)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run tests
🔇 Additional comments (10)
src/api/method/get_queue_elements.rs (3)
125-133: Good guard on missing queue index.

The .ok_or(..)? when deriving first_value_queue_index for InputStateV2 provides a clear error path if invariants are violated. This pairs well with the is_not_null() condition earlier.
71-78: Transaction and isolation level: LGTM.

Opening a read-only transaction and using REPEATABLE READ on Postgres is appropriate to ensure a consistent snapshot across the query and proof generation steps.
79-82: NullifiedInTree is defined NOT NULL; .eq(false) is safe.

Verified via migrations and generated code that nullified_in_tree is declared BOOLEAN NOT NULL DEFAULT FALSE (m20250206_000007_init.rs) and mapped as bool (not Option<bool>) in both common/typedefs/account/context.rs and dao/generated/accounts.rs. Since the database enforces non-null values, .eq(false) and .ne(true) yield identical results, and there are no indexes on nullified_in_tree that would be affected by this predicate.

- Migration guarantee: nullified_in_tree BOOLEAN NOT NULL DEFAULT FALSE
- Codegen guarantee: pub nullified_in_tree: bool in generated models
- Index check: no index on nullified_in_tree, so predicate semantics/performance are unchanged

Given this, the suggested change is purely stylistic and can be safely skipped.
src/ingester/gap/sequences.rs (2)

76-92: Clarify behavior when QUEUE_TREE_MAPPING lacks the tree. Currently you validate only when it is Some(info) and silently accept otherwise. If a missing mapping is unexpected, consider logging warn! to help spot data/config drift (a sketch follows below); if it is expected during backfills, current behavior is fine. Please confirm the intended policy.
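If warn-and-continue is the intended policy, a rough sketch; the map shape and `TreeInfo` type here are placeholders, not the crate's real definitions:

```rust
use std::collections::HashMap;

// Illustrative stand-ins for QUEUE_TREE_MAPPING's real types.
struct TreeInfo;

fn validate_tree_info(_info: &TreeInfo) -> Result<(), String> {
    Ok(())
}

fn check_mapping(mapping: &HashMap<String, TreeInfo>, tree_pubkey: &str) -> Result<(), String> {
    match mapping.get(tree_pubkey) {
        Some(info) => validate_tree_info(info),
        None => {
            // Surface config/data drift without failing ingestion,
            // since a missing mapping may be expected during backfills.
            tracing::warn!("no QUEUE_TREE_MAPPING entry for tree {tree_pubkey}; skipping validation");
            Ok(())
        }
    }
}
```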
171-177: Confirm: mapping nullification sequences to StateV1 is always correct. If nullifications also occur for V2 state (or other trees), this hard-coded StateV1 assignment might misclassify the state tracking. If V2 nullifications never show up here, the current mapping is fine.
src/ingester/persist/persisted_batch_event.rs (5)

15-17: SeaORM imports are correct and necessary. Adding PaginatorTrait matches the new .count() usage; ConnectionTrait, ColumnTrait, QueryFilter/Order/Trait are all appropriate. No concerns.

236-247: Query constraints look sound. Filtering nullify by queue range, tree, and spent=true with ascending order is correct and matches the invariants implied by the events. Nice.

201-214: Update is scoped correctly to the processed range and tree. The update_many on InOutputQueue=false is precise and transactional. With the earlier clamp fix, this will be fully idempotent.

320-333: Correctly accounts for the address tree's pre-initialized zeroth element. Using unwrap_or(1) for current_next_index acknowledges the sentinel at index 0. Looks good.

58-69: Event batching/persistence flow is clear and ordered. Per-tree sequencing and per-event buffering of leaf nodes is straightforward and easy to follow. Nice use of the capacity hint (illustrated below).
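As a rough illustration of the buffering shape praised in the 58-69 comment; the types here are simplified stand-ins, not the crate's real `LeafNode`/`BatchEvent`:

```rust
// Buffer leaf nodes per event, sized by the event's index range,
// so each event persists once without Vec reallocation.
struct LeafNode {
    leaf_index: u32,
}

struct BatchEvent {
    old_next_index: u64,
    new_next_index: u64,
}

fn buffer_leaves(event: &BatchEvent) -> Vec<LeafNode> {
    let capacity = (event.new_next_index - event.old_next_index) as usize;
    let mut leaf_nodes = Vec::with_capacity(capacity); // capacity hint
    for i in event.old_next_index..event.new_next_index {
        leaf_nodes.push(LeafNode { leaf_index: i as u32 });
    }
    leaf_nodes
}
```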
```rust
pub fn update_sequence_state(&self) {
    let current_state = match SEQUENCE_STATE.read() {
        Ok(state) => state,
        Err(e) => {
            debug!("Failed to acquire read lock for sequence state: {}", e);
            return;
        }
    };
```
💡 Verification agent
🧩 Analysis chain
Helper: find call sites to update if you change update_sequence_state() to return Result.
🏁 Script executed:
#!/bin/bash
# Locate all call sites of update_sequence_state and show surrounding context.
rg -n -C3 '\.update_sequence_state\s*\(' --type rust
Length of output: 374
🏁 Script executed:
#!/bin/bash
# Locate all call sites of update_sequence_state (broader search)
rg -n -C3 'update_sequence_state\s*\(' .
Length of output: 871
Propagate errors from `update_sequence_state()` in the caller

If you change `update_sequence_state()` to return a `Result`, its callers must be updated to handle or propagate errors. I found one usage:

- src/ingester/mod.rs:80

```rust
// before
sequences.update_sequence_state();
// after (example using `?` to propagate)
sequences.update_sequence_state()?;
```

Ensure you wrap this call in a `match` or use the `?` operator so compilation and error handling remain correct.
🤖 Prompt for AI Agents
In src/ingester/gap/sequences.rs around lines 139-146, change
update_sequence_state() to return a Result (e.g., Result<(), YourErrorType>)
instead of returning unit on lock errors, convert the read lock Err branch to
return Err(mapping the lock error to YourErrorType) and keep the Ok path
returning Ok(()); then update all callers (notably src/ingester/mod.rs around
line 80) to handle the Result by propagating with ? or matching and handling the
Err case so compilation and error handling remain correct.
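A minimal sketch of the suggested signature change; the error enum and the global's value type are placeholders, not the crate's real definitions:

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, RwLock};

// Placeholder global; the real SEQUENCE_STATE holds richer per-tree data.
static SEQUENCE_STATE: LazyLock<RwLock<HashMap<String, u64>>> =
    LazyLock::new(|| RwLock::new(HashMap::new()));

#[derive(Debug)]
pub enum SequenceStateError {
    LockPoisoned(String),
}

pub struct Sequences;

impl Sequences {
    /// Propagates a poisoned-lock error instead of silently returning.
    pub fn update_sequence_state(&self) -> Result<(), SequenceStateError> {
        let current_state = SEQUENCE_STATE
            .read()
            .map_err(|e| SequenceStateError::LockPoisoned(e.to_string()))?;
        // ... derive per-tree sequence updates from `current_state` as before ...
        let _ = &*current_state;
        Ok(())
    }
}
```

The call site then becomes `sequences.update_sequence_state()?;`.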
```rust
    let expected_count =
        (batch_append_event.new_next_index - batch_append_event.old_next_index) as usize;

    // Validate old_next_index matches the current state of the tree
    let current_next_index = accounts::Entity::find()
        .filter(
            accounts::Column::Tree
                .eq(batch_append_event.merkle_tree_pubkey.to_vec())
                .and(accounts::Column::InOutputQueue.eq(false)),
        )
        .order_by_desc(accounts::Column::LeafIndex)
        .one(txn)
        .await?
        .map(|acc| (acc.leaf_index + 1) as u64)
        .unwrap_or(0);

    if !validate_batch_index(
        batch_append_event.old_next_index,
        current_next_index,
        "append",
    )? {
        return Ok(());
    }
```
Partial re-indexing bug: append path returns early and skips [current_next_index, new_next_index).
If the batch was partially applied (old_next_index < current_next_index < new_next_index), the early return at Lines 113–120 drops the remaining work permanently. We should clamp the start to max(old_next_index, current_next_index) and continue.
Apply the following minimal fix to clamp the range and avoid short-circuiting:
@@
- let expected_count =
- (batch_append_event.new_next_index - batch_append_event.old_next_index) as usize;
+ // Compute current expected count later after clamping start_index
+ let original_expected_count =
+ (batch_append_event.new_next_index - batch_append_event.old_next_index) as usize;
@@
- if !validate_batch_index(
- batch_append_event.old_next_index,
- current_next_index,
- "append",
- )? {
- return Ok(());
- }
+ // Validate and clamp the start index to support partial re-indexing
+ if batch_append_event.old_next_index > current_next_index {
+ return Err(IngesterError::ParserError(format!(
+ "Batch append old_next_index {} is greater than current index {}",
+ batch_append_event.old_next_index, current_next_index
+ )));
+ }
+ let start_index = batch_append_event.old_next_index.max(current_next_index);
+ if start_index > batch_append_event.old_next_index {
+ tracing::debug!(
+ "Batch append re-indexing: clamped start from {} to {} (new_next_index={})",
+ batch_append_event.old_next_index,
+ start_index,
+ batch_append_event.new_next_index
+ );
+ }
+ // Fully processed already
+ if start_index >= batch_append_event.new_next_index {
+ return Ok(());
+ }
+ let expected_count =
+ (batch_append_event.new_next_index - start_index) as usize;
@@
- accounts::Column::LeafIndex
- .gte(batch_append_event.old_next_index as i64)
- .and(accounts::Column::LeafIndex.lt(batch_append_event.new_next_index as i64))
+ accounts::Column::LeafIndex
+ .gte(start_index as i64)
+ .and(accounts::Column::LeafIndex.lt(batch_append_event.new_next_index as i64))
.and(accounts::Column::Tree.eq(batch_append_event.merkle_tree_pubkey.to_vec()))
.and(accounts::Column::InOutputQueue.eq(true)),
@@
- let mut expected_leaf_index = batch_append_event.old_next_index;
+ let mut expected_leaf_index = start_index;
@@
- } else if accounts.is_empty() {
+ } else if accounts.is_empty() {
// Check if already processed (re-indexing scenario)
let already_processed = accounts::Entity::find()
.filter(
- accounts::Column::LeafIndex
- .gte(batch_append_event.old_next_index as i64)
- .and(accounts::Column::LeafIndex.lt(batch_append_event.new_next_index as i64))
+ accounts::Column::LeafIndex
+ .gte(start_index as i64)
+ .and(accounts::Column::LeafIndex.lt(batch_append_event.new_next_index as i64))
.and(accounts::Column::Tree.eq(batch_append_event.merkle_tree_pubkey.to_vec()))
.and(accounts::Column::InOutputQueue.eq(false)),
)
.count(txn)
.await?;
- if already_processed == expected_count as u64 {
+ if already_processed >= expected_count as u64 {
tracing::debug!(
"Batch append already processed: {} accounts already in tree for range [{}, {})",
already_processed,
- batch_append_event.old_next_index,
+ start_index,
batch_append_event.new_next_index
);
return Ok(());
}
@@
- "Expected {} accounts in append batch, found 0 in queue, {} already processed",
- expected_count, already_processed
+ "Expected {} accounts in append batch for range [{}, {}), found 0 in queue, {} already processed",
+ expected_count, start_index, batch_append_event.new_next_index, already_processed
)));
} else {
return Err(IngesterError::ParserError(format!(
"Expected {} accounts in append batch, found {}",
expected_count,
accounts.len()
)));
}
@@
- .filter(
- accounts::Column::LeafIndex
- .gte(batch_append_event.old_next_index as i64)
- .and(accounts::Column::LeafIndex.lt(batch_append_event.new_next_index as i64))
- .and(accounts::Column::Tree.eq(batch_append_event.merkle_tree_pubkey.to_vec())),
- )
+ .filter(
+ accounts::Column::LeafIndex
+ .gte(start_index as i64)
+ .and(accounts::Column::LeafIndex.lt(batch_append_event.new_next_index as i64))
+ .and(accounts::Column::Tree.eq(batch_append_event.merkle_tree_pubkey.to_vec())),
+ )
This makes the append logic idempotent and resilient to partial application.
Also applies to: 121-132, 133-166, 167-199, 201-209
🤖 Prompt for AI Agents
In src/ingester/persist/persisted_batch_event.rs around lines 97 to 120, the
append path returns early when validate_batch_index fails, which drops remaining
indexes if a batch was partially applied; instead clamp the processing start to
max(batch_append_event.old_next_index, current_next_index) and proceed without
returning so the loop/process will cover [clamped_start,
batch_append_event.new_next_index), making the append idempotent; apply the same
clamp-and-continue fix to the other append-related ranges noted (lines ~121-132,
133-166, 167-199, 201-209).
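To make the idempotency claim concrete, a standalone sketch of the clamp covering the three possible states; it mirrors the logic in the diff above rather than the merged code:

```rust
/// Returns the half-open range [start, end) still to process, or None if done.
fn remaining_range(old_next: u64, current_next: u64, new_next: u64) -> Option<(u64, u64)> {
    let start = old_next.max(current_next); // skip leaves already applied
    if start >= new_next {
        None
    } else {
        Some((start, new_next))
    }
}

fn main() {
    assert_eq!(remaining_range(10, 10, 20), Some((10, 20))); // fresh batch
    assert_eq!(remaining_range(10, 15, 20), Some((15, 20))); // partially applied
    assert_eq!(remaining_range(10, 20, 20), None); // fully applied
}
```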
```rust
    // Validate old_next_index matches the current state of the address tree
    let current_next_index = indexed_trees::Entity::find()
        .filter(
            indexed_trees::Column::Tree.eq(batch_address_append_event.merkle_tree_pubkey.to_vec()),
        )
        .order_by_desc(indexed_trees::Column::LeafIndex)
        .one(txn)
        .await?
        .map(|tree| (tree.leaf_index + 1) as u64)
        .unwrap_or(1); // Address tree has zeroth element

    if !validate_batch_index(
        batch_address_append_event.old_next_index,
        current_next_index,
        "address append",
    )? {
        return Ok(());
    }

    // Address queue indices are 0-based, but batch updates use 1-based indices
    // (because address trees have a pre-initialized zeroth element)
    // So we need to offset by -1 when querying the queue
    let queue_start = (batch_address_append_event.old_next_index as i64) - 1;
    let queue_end = (batch_address_append_event.new_next_index as i64) - 1;
```
Partial re-indexing bug: address-append path also returns early and skips [current_next_index, new_next_index).
Same issue as the state-append path. Clamp the start index and proceed; also align the “already indexed” check and queue deletes with the clamped range.
@@
- if !validate_batch_index(
- batch_address_append_event.old_next_index,
- current_next_index,
- "address append",
- )? {
- return Ok(());
- }
+ // Validate and clamp start for partial re-indexing
+ if batch_address_append_event.old_next_index > current_next_index {
+ return Err(IngesterError::ParserError(format!(
+ "Address append old_next_index {} is greater than current index {}",
+ batch_address_append_event.old_next_index, current_next_index
+ )));
+ }
+ let start_index = batch_address_append_event
+ .old_next_index
+ .max(current_next_index);
+ if start_index >= batch_address_append_event.new_next_index {
+ // Fully processed already
+ return Ok(());
+ }
+ let expected_count = (batch_address_append_event.new_next_index - start_index) as usize;
@@
- let queue_start = (batch_address_append_event.old_next_index as i64) - 1;
- let queue_end = (batch_address_append_event.new_next_index as i64) - 1;
+ let queue_start = (start_index as i64) - 1;
+ let queue_end = (batch_address_append_event.new_next_index as i64) - 1;
@@
- let already_indexed = indexed_trees::Entity::find()
+ let already_indexed = indexed_trees::Entity::find()
.filter(
indexed_trees::Column::Tree
.eq(batch_address_append_event.merkle_tree_pubkey.to_vec())
.and(
indexed_trees::Column::LeafIndex
- .gte(batch_address_append_event.old_next_index as i64),
+ .gte(start_index as i64),
)
.and(
indexed_trees::Column::LeafIndex
.lt(batch_address_append_event.new_next_index as i64),
),
)
.count(txn)
.await?;
- if already_indexed >= expected_count as u64 {
- tracing::info!(
+ if already_indexed >= expected_count as u64 {
+ tracing::debug!(
"Address batch already processed: {} addresses already in indexed tree",
already_indexed
);
return Ok(());
}
@@
- address_queues::Entity::delete_many()
+ address_queues::Entity::delete_many()
.filter(
address_queues::Column::QueueIndex
.gte(queue_start)
.and(address_queues::Column::QueueIndex.lt(queue_end))
.and(
address_queues::Column::Tree
.eq(batch_address_append_event.merkle_tree_pubkey.to_vec()),
),
)
Also applies to: 348-361, 362-398, 434-444
🤖 Prompt for AI Agents
In src/ingester/persist/persisted_batch_event.rs around lines 323-347 (and also
apply same changes at 348-361, 362-398, 434-444): the address-append path
currently returns early when the old_next_index indicates already-indexed range
instead of clamping the start and continuing, which causes partial re-indexing
to skip [current_next_index, new_next_index). Change the logic to clamp
queue_start to current_next_index (or equivalent clamped 0-based start) rather
than returning; then use the clamped range for the "already indexed" check, for
querying the queue, and for any queue delete operations so they operate on the
clamped interval only; ensure indices are converted consistently between 1-based
batch indices and 0-based queue indices when applying the clamp and deletes.
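A small sketch of the index conversion the prompt describes, assuming the clamped start is 1-based (address trees pre-initialize leaf 0):

```rust
/// Converts a clamped 1-based tree range into the 0-based address queue range.
fn queue_range(start_index_1based: u64, new_next_index_1based: u64) -> (i64, i64) {
    debug_assert!(start_index_1based >= 1, "address trees have a zeroth element");
    let queue_start = start_index_1based as i64 - 1;
    let queue_end = new_next_index_1based as i64 - 1;
    (queue_start, queue_end)
}

fn main() {
    // A batch covering tree leaves [1, 4) maps to queue indices [0, 3).
    assert_eq!(queue_range(1, 4), (0, 3));
}
```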
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ingester/persist/persisted_batch_event.rs (1)
422-431: Use overflow-safe seq conversion for address multi_append. Same rationale as the nullify path.
```diff
-    multi_append(
+    let seq_u32 = u32::try_from(batch_address_append_event.sequence_number)
+        .map_err(|_| IngesterError::ParserError("sequence_number overflow".to_string()))?;
+    multi_append(
         txn,
         address_values,
         batch_address_append_event.merkle_tree_pubkey.to_vec(),
         DEFAULT_BATCH_ADDRESS_TREE_HEIGHT + 1,
-        Some(batch_address_append_event.sequence_number as u32),
+        Some(seq_u32),
     )
```
♻️ Duplicate comments (4)
src/ingester/persist/persisted_batch_event.rs (4)
199-209: Scope the queue cleanup to the clamped range and to rows currently in the queue. Without clamping and the InOutputQueue=true predicate, we can rewrite already-processed rows or miss partial ranges.

Apply:

```diff
     let query = accounts::Entity::update_many()
         .col_expr(accounts::Column::InOutputQueue, Expr::value(false))
         .filter(
-            accounts::Column::LeafIndex
-                .gte(batch_append_event.old_next_index as i64)
-                .and(accounts::Column::LeafIndex.lt(batch_append_event.new_next_index as i64))
-                .and(accounts::Column::Tree.eq(batch_append_event.merkle_tree_pubkey.to_vec())),
+            accounts::Column::LeafIndex
+                .gte(start_index as i64)
+                .and(accounts::Column::LeafIndex.lt(batch_append_event.new_next_index as i64))
+                .and(accounts::Column::Tree.eq(batch_append_event.merkle_tree_pubkey.to_vec()))
+                .and(accounts::Column::InOutputQueue.eq(true)),
         )
         .build(txn.get_database_backend());
```
281-293: Use overflow-safe conversion for seq and leaf_index when building nullify leaf nodes. Direct casts with as u32 may silently truncate. Convert via TryFrom and bubble a ParserError on overflow.

```diff
-            seq: Some(batch_nullify_event.sequence_number as u32),
-            leaf_index: account.leaf_index as u32,
+            seq: Some(
+                u32::try_from(batch_nullify_event.sequence_number)
+                    .map_err(|_| IngesterError::ParserError("sequence_number overflow".to_string()))?
+            ),
+            leaf_index: u32::try_from(account.leaf_index)
+                .map_err(|_| IngesterError::ParserError("leaf_index overflow".to_string()))?,
```
321-345: Address-append: compute a clamped start_index; fix 1-based to 0-based queue offsets accordingly. Currently, partial re-indexing still risks skipping [current_next_index, new_next_index). Clamp using validate_batch_index, then derive queue_start/queue_end from the clamped start.

```diff
     let current_next_index = indexed_trees::Entity::find()
         .filter(
             indexed_trees::Column::Tree.eq(batch_address_append_event.merkle_tree_pubkey.to_vec()),
         )
         .order_by_desc(indexed_trees::Column::LeafIndex)
         .one(txn)
         .await?
-        .map(|tree| (tree.leaf_index + 1) as u64)
-        .unwrap_or(1); // Address tree has zeroeth element
+        .map(|tree| (tree.leaf_index + 1) as u64)
+        .unwrap_or(1); // Address tree has zeroth element, so first valid old_next_index is 1
@@
-    // Address queue indices are 0-based, but batch updates use 1-based indices
-    // (because address trees have a pre-initialized zeroth element)
-    // So we need to offset by -1 when querying the queue
-    let queue_start = (batch_address_append_event.old_next_index as i64) - 1;
-    let queue_end = (batch_address_append_event.new_next_index as i64) - 1;
+    // Clamp start_index to support partial re-indexing (1-based indices)
+    let start_index = validate_batch_index(
+        batch_address_append_event.old_next_index,
+        current_next_index,
+        "address append",
+    )?;
+    if start_index >= batch_address_append_event.new_next_index {
+        return Ok(());
+    }
+    // Convert clamped 1-based [start, end) to 0-based queue indices
+    let queue_start = (start_index as i64) - 1;
+    let queue_end = (batch_address_append_event.new_next_index as i64) - 1;
```
346-355: Address-append: apply the clamped queue range consistently to read and delete; avoid over/under-deletes. Use the clamped queue_start/queue_end in both select and delete filters.

```diff
     let addresses = address_queues::Entity::find()
         .filter(
             address_queues::Column::QueueIndex
                 .gte(queue_start)
                 .and(address_queues::Column::QueueIndex.lt(queue_end))
                 .and(
                     address_queues::Column::Tree
                         .eq(batch_address_append_event.merkle_tree_pubkey.to_vec()),
                 ),
         )
@@
     address_queues::Entity::delete_many()
         .filter(
             address_queues::Column::QueueIndex
                 .gte(queue_start)
                 .and(address_queues::Column::QueueIndex.lt(queue_end))
                 .and(
                     address_queues::Column::Tree
                         .eq(batch_address_append_event.merkle_tree_pubkey.to_vec()),
                 ),
         )
```

Also applies to: 432-444
🧹 Nitpick comments (4)
src/ingester/persist/persisted_batch_event.rs (4)
398-421: Optional: validate queue contiguity and expected counts for addresses to catch gaps early. Even with clamping, silent gaps are possible if upstream skipped enqueues. Consider re-enabling the expected_count/contiguity checks to detect and rewind; one possible shape is sketched below.

I can wire back the already_indexed fast-path and expected_count checks with concise tracing if helpful.
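One possible shape for such a check, as a sketch only; the real version would run against the fetched queue rows inside the transaction:

```rust
/// Verifies the fetched queue indices are gap-free and match the expected count.
fn check_contiguity(
    queue_indices: &[i64],
    expected_start: i64,
    expected_count: usize,
) -> Result<(), String> {
    if queue_indices.len() != expected_count {
        return Err(format!(
            "expected {} queue rows, found {}",
            expected_count,
            queue_indices.len()
        ));
    }
    for (offset, idx) in queue_indices.iter().enumerate() {
        let expected = expected_start + offset as i64;
        if *idx != expected {
            return Err(format!("gap in queue: expected index {expected}, got {idx}"));
        }
    }
    Ok(())
}
```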
63-73: Avoid persisting empty leaf_batches across events. persist_leaf_nodes is called unconditionally later. If BatchAppend yields zero nodes (current code path), this may attempt an empty insert. Guard and continue.

Apply outside of this block (in the persist_batch_events loop):

```diff
-    if leaf_nodes.len() <= MAX_SQL_INSERTS {
+    if leaf_nodes.is_empty() {
+        continue;
+    }
+    if leaf_nodes.len() <= MAX_SQL_INSERTS {
         persist_leaf_nodes(txn, leaf_nodes, STATE_TREE_HEIGHT_V2 + 1).await?;
```
15-17: Pagination/streaming can bound memory for large batches. You imported PaginatorTrait; consider paginating accounts/address_queues instead of loading entire ranges into memory.

Sketch:

```rust
let mut paginator = accounts::Entity::find()
    .filter(/* clamped range */)
    .order_by_asc(accounts::Column::LeafIndex)
    .paginate(txn, MAX_SQL_INSERTS as u64);
while let Some(chunk) = paginator.fetch_and_next().await? {
    // build leaf_nodes per chunk and persist incrementally
}
```
321-331: Double-check the default current_next_index=1 for new address trees. If the indexed_trees table can be empty, starting at 1 is correct only if the zeroth leaf is always pre-initialized. If that is not guaranteed, the default should be 0, converted to 1-based later.

I can search for the initialization of the zeroth address leaf to confirm. Shall I?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- src/ingester/persist/persisted_batch_event.rs (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/ingester/persist/persisted_batch_event.rs (2)
- src/common/typedefs/account/context.rs (2): try_from (109-152), new (53-103)
- src/common/typedefs/account/v2.rs (1): try_from (40-78)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run tests
🔇 Additional comments (1)
src/ingester/persist/persisted_batch_event.rs (1)
224-233: The above will show whether and how detect_sequence_gaps_with_metadata enforces contiguous queue ranges for BatchNullifyContext. Once we see its implementation and any error handling, we can decide if the expected_count check in persist_batch_nullify_event remains necessary as defense-in-depth.
```rust
/// Validates that the old_next_index in a batch event matches the current state.
/// Returns Ok(true) if processing should continue, Ok(false) if already processed (re-indexing),
/// or Err if validation fails.
fn validate_batch_index(
    old_next_index: u64,
    current_index: u64,
    event_type: &str,
) -> Result<bool, IngesterError> {
    if old_next_index > current_index {
        return Err(IngesterError::ParserError(format!(
            "Batch {} old_next_index {} is greater than current index {}",
            event_type, old_next_index, current_index
        )));
    } else if old_next_index < current_index {
        // Re-indexing scenario - events already processed
        tracing::debug!(
            "Batch {} re-indexing detected: old_next_index {} < current_index {}",
            event_type,
            old_next_index,
            current_index
        );
        return Ok(false);
    }
    Ok(true)
}
```
🛠️ Refactor suggestion
Refactor validator to return a clamped start index (enables safe partial re-indexing).
Returning bool forces call sites to branch/early-return and is the root cause of skipped ranges. Have the helper return the clamped start_index (max(old_next_index, current_index)) or error. This also unifies append/address-append flows.
Apply:
-/// Validates that the old_next_index in a batch event matches the current state.
-/// Returns Ok(true) if processing should continue, Ok(false) if already processed (re-indexing),
-/// or Err if validation fails.
-fn validate_batch_index(
- old_next_index: u64,
- current_index: u64,
- event_type: &str,
-) -> Result<bool, IngesterError> {
- if old_next_index > current_index {
- return Err(IngesterError::ParserError(format!(
- "Batch {} old_next_index {} is greater than current index {}",
- event_type, old_next_index, current_index
- )));
- } else if old_next_index < current_index {
- // Re-indexing scenario - events already processed
- tracing::debug!(
- "Batch {} re-indexing detected: old_next_index {} < current_index {}",
- event_type,
- old_next_index,
- current_index
- );
- return Ok(false);
- }
- Ok(true)
-}
+/// Returns the clamped start index to process (max(old_next_index, current_index)).
+/// Errors when old_next_index > current_index (future index).
+fn validate_batch_index(
+ old_next_index: u64,
+ current_index: u64,
+ event_type: &str,
+) -> Result<u64, IngesterError> {
+ if old_next_index > current_index {
+ return Err(IngesterError::ParserError(format!(
+ "Batch {} old_next_index {} is greater than current index {}",
+ event_type, old_next_index, current_index
+ )));
+ }
+ let start_index = old_next_index.max(current_index);
+ if start_index > old_next_index {
+ tracing::debug!(
+ "Batch {} re-indexing: clamped start from {} to {}",
+ event_type,
+ old_next_index,
+ start_index
+ );
+ }
+ Ok(start_index)
+}
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/ingester/persist/persisted_batch_event.rs around lines 21–45, change
validate_batch_index to return Result<u64, IngesterError> instead of
Result<bool, IngesterError> and have it produce a clamped start index;
specifically keep the existing error when old_next_index > current_index,
otherwise return Ok(std::cmp::max(old_next_index, current_index)) so callers get
the safe start index to use (no branching/early-return for re-indexing); update
all call sites to use the returned u64 as the start index for processing/append
flows.
Actionable comments posted: 1
♻️ Duplicate comments (5)
src/ingester/persist/persisted_batch_event.rs (5)
199-209: Align queue cleanup to the clamped start_index. If you apply the clamped start fix, the cleanup should also use [start_index, new_next_index) to avoid touching earlier indices and to keep idempotency consistent.

See the tail of the previous diff where gte(old_next_index) → gte(start_index).
21-45: Return a clamped start index instead of bool (prevents partial re-indexing skips). The current validator forces call sites to branch/early-return, which is why partially applied batches end up skipping [current_next_index, new_next_index). Return the clamped start index (max(old_next_index, current_index)) or error. Callers can then process the remaining range idempotently.

Apply:

```diff
-/// Validates that the old_next_index in a batch event matches the current state.
-/// Returns Ok(true) if processing should continue, Ok(false) if already processed (re-indexing),
-/// or Err if validation fails.
-fn validate_batch_index(
-    old_next_index: u64,
-    current_index: u64,
-    event_type: &str,
-) -> Result<bool, IngesterError> {
-    if old_next_index > current_index {
-        return Err(IngesterError::ParserError(format!(
-            "Batch {} old_next_index {} is greater than current index {}",
-            event_type, old_next_index, current_index
-        )));
-    } else if old_next_index < current_index {
-        // Re-indexing scenario - events already processed
-        tracing::debug!(
-            "Batch {} re-indexing detected: old_next_index {} < current_index {}",
-            event_type,
-            old_next_index,
-            current_index
-        );
-        return Ok(false);
-    }
-    Ok(true)
-}
+/// Returns the clamped start index to process (max(old_next_index, current_index)).
+/// Errors when old_next_index > current_index (future index).
+fn validate_batch_index(
+    old_next_index: u64,
+    current_index: u64,
+    event_type: &str,
+) -> Result<u64, IngesterError> {
+    if old_next_index > current_index {
+        return Err(IngesterError::ParserError(format!(
+            "Batch {} old_next_index {} is greater than current index {}",
+            event_type, old_next_index, current_index
+        )));
+    }
+    let start_index = old_next_index.max(current_index);
+    if start_index > old_next_index {
+        tracing::debug!(
+            "Batch {} re-indexing: clamped start from {} to {}",
+            event_type,
+            old_next_index,
+            start_index
+        );
+    }
+    Ok(start_index)
+}
```
97-197: Append path is a no-op: rebuild leaf nodes, enforce gap checks, and clamp start for idempotency. With the core logic commented out, BatchAppend generates no leaf nodes and silently flips InOutputQueue=false. That drops data and neuters gap detection.

Apply the minimal, self-contained fix (uses clamped start; performs reindex-aware checks; safe integer conversions):

```diff
 async fn persist_batch_append_event(
     txn: &DatabaseTransaction,
     batch_append_event: &BatchEvent,
     leaf_nodes: &mut Vec<LeafNode>,
 ) -> Result<(), IngesterError> {
-    // let expected_count =
-    //     (batch_append_event.new_next_index - batch_append_event.old_next_index) as usize;
-    // // Validate old_next_index matches the current state of the tree
-    // let current_next_index = accounts::Entity::find()
-    //     .filter(
-    //         accounts::Column::Tree
-    //             .eq(batch_append_event.merkle_tree_pubkey.to_vec())
-    //             .and(accounts::Column::InOutputQueue.eq(false)),
-    //     )
-    //     .order_by_desc(accounts::Column::LeafIndex)
-    //     .one(txn)
-    //     .await?
-    //     .map(|acc| (acc.leaf_index + 1) as u64)
-    //     .unwrap_or(0);
-    // if !validate_batch_index(
-    //     batch_append_event.old_next_index,
-    //     current_next_index,
-    //     "append",
-    // )? {
-    //     return Ok(());
-    // }
+    // Derive clamped start to support partial re-indexing
+    let current_next_index = accounts::Entity::find()
+        .filter(
+            accounts::Column::Tree
+                .eq(batch_append_event.merkle_tree_pubkey.to_vec())
+                .and(accounts::Column::InOutputQueue.eq(false)),
+        )
+        .order_by_desc(accounts::Column::LeafIndex)
+        .one(txn)
+        .await?
+        .map(|acc| (acc.leaf_index + 1) as u64)
+        .unwrap_or(0);
+    // Use the refactored validator; if not applied yet, compute start as old/max manually.
+    let start_index = validate_batch_index(
+        batch_append_event.old_next_index,
+        current_next_index,
+        "append",
+    )?;
+    if start_index >= batch_append_event.new_next_index {
+        // Fully processed already
+        return Ok(());
+    }
+    let expected_count = (batch_append_event.new_next_index - start_index) as usize;

-    // let accounts = accounts::Entity::find()
-    //     .filter(
-    //         accounts::Column::LeafIndex
-    //             .gte(batch_append_event.old_next_index as i64)
-    //             .and(accounts::Column::LeafIndex.lt(batch_append_event.new_next_index as i64))
-    //             .and(accounts::Column::Tree.eq(batch_append_event.merkle_tree_pubkey.to_vec()))
-    //             .and(accounts::Column::InOutputQueue.eq(true)),
-    //     )
-    //     .order_by_asc(accounts::Column::LeafIndex)
-    //     .all(txn)
-    //     .await?;
+    let accounts = accounts::Entity::find()
+        .filter(
+            accounts::Column::LeafIndex
+                .gte(start_index as i64)
+                .and(accounts::Column::LeafIndex.lt(batch_append_event.new_next_index as i64))
+                .and(accounts::Column::Tree.eq(batch_append_event.merkle_tree_pubkey.to_vec()))
+                .and(accounts::Column::InOutputQueue.eq(true)),
+        )
+        .order_by_asc(accounts::Column::LeafIndex)
+        .all(txn)
+        .await?;

-    // // If we got the expected count, proceed
-    // if accounts.len() == expected_count {
-    //     // Validate sequential indices and process accounts
-    //     let mut expected_leaf_index = batch_append_event.old_next_index;
-    //
-    //     for account in &accounts {
-    //         if account.leaf_index != expected_leaf_index as i64 {
-    //             return Err(IngesterError::ParserError(format!(
-    //                 "Gap in leaf indices: expected {}, got {}",
-    //                 expected_leaf_index, account.leaf_index
-    //             )));
-    //         }
-    //         expected_leaf_index += 1;
-    //
-    //         if account.hash.is_empty() {
-    //             return Err(IngesterError::ParserError(
-    //                 "Account hash is missing".to_string(),
-    //             ));
-    //         }
-    //
-    //         leaf_nodes.push(LeafNode {
-    //             tree: SerializablePubkey::try_from(account.tree.clone()).map_err(|_| {
-    //                 IngesterError::ParserError(
-    //                     "Failed to convert tree to SerializablePubkey".to_string(),
-    //                 )
-    //             })?,
-    //             seq: Some(batch_append_event.sequence_number as u32),
-    //             leaf_index: account.leaf_index as u32,
-    //             hash: Hash::new(account.hash.as_slice()).map_err(|_| {
-    //                 IngesterError::ParserError("Failed to convert account hash to Hash".to_string())
-    //             })?,
-    //         });
-    //     }
-    // } else if accounts.is_empty() {
-    //     // Check if already processed (re-indexing scenario)
-    //     let already_processed = accounts::Entity::find()
-    //         .filter(
-    //             accounts::Column::LeafIndex
-    //                 .gte(batch_append_event.old_next_index as i64)
-    //                 .and(accounts::Column::LeafIndex.lt(batch_append_event.new_next_index as i64))
-    //                 .and(accounts::Column::Tree.eq(batch_append_event.merkle_tree_pubkey.to_vec()))
-    //                 .and(accounts::Column::InOutputQueue.eq(false)),
-    //         )
-    //         .count(txn)
-    //         .await?;
-    //
-    //     if already_processed == expected_count as u64 {
-    //         tracing::debug!(
-    //             "Batch append already processed: {} accounts already in tree for range [{}, {})",
-    //             already_processed,
-    //             batch_append_event.old_next_index,
-    //             batch_append_event.new_next_index
-    //         );
-    //         return Ok(());
-    //     }
-    //
-    //     return Err(IngesterError::ParserError(format!(
-    //         "Expected {} accounts in append batch, found 0 in queue, {} already processed",
-    //         expected_count, already_processed
-    //     )));
-    // } else {
-    //     return Err(IngesterError::ParserError(format!(
-    //         "Expected {} accounts in append batch, found {}",
-    //         expected_count,
-    //         accounts.len()
-    //     )));
-    // }
+    if accounts.is_empty() {
+        // Re-indexing check: range already applied?
+        let already_processed = accounts::Entity::find()
+            .filter(
+                accounts::Column::LeafIndex
+                    .gte(start_index as i64)
+                    .and(accounts::Column::LeafIndex.lt(batch_append_event.new_next_index as i64))
+                    .and(accounts::Column::Tree.eq(batch_append_event.merkle_tree_pubkey.to_vec()))
+                    .and(accounts::Column::InOutputQueue.eq(false)),
+            )
+            .count(txn)
+            .await?;
+        if already_processed >= expected_count as u64 {
+            tracing::debug!(
+                "Batch append already processed: {} accounts already in tree for range [{}, {})",
+                already_processed, start_index, batch_append_event.new_next_index
+            );
+            return Ok(());
+        }
+        return Err(IngesterError::ParserError(format!(
+            "Expected {} accounts in append batch for range [{}, {}), found 0 in queue, {} already processed",
+            expected_count, start_index, batch_append_event.new_next_index, already_processed
+        )));
+    }
+    if accounts.len() != expected_count {
+        return Err(IngesterError::ParserError(format!(
+            "Expected {} accounts in append batch, found {}",
+            expected_count,
+            accounts.len()
+        )));
+    }
+    // Validate sequential indices and build leaf nodes
+    let mut expected_leaf_index = start_index;
+    for account in &accounts {
+        if account.leaf_index as u64 != expected_leaf_index {
+            return Err(IngesterError::ParserError(format!(
+                "Gap in leaf indices: expected {}, got {}",
+                expected_leaf_index, account.leaf_index
+            )));
+        }
+        expected_leaf_index += 1;
+        if account.hash.is_empty() {
+            return Err(IngesterError::ParserError(
+                "Account hash is missing".to_string(),
+            ));
+        }
+        let seq_u32 = u32::try_from(batch_append_event.sequence_number)
+            .map_err(|_| IngesterError::ParserError("sequence_number overflow".to_string()))?;
+        let leaf_index_u32 = u32::try_from(account.leaf_index)
+            .map_err(|_| IngesterError::ParserError("leaf_index overflow".to_string()))?;
+        leaf_nodes.push(LeafNode {
+            tree: SerializablePubkey::try_from(account.tree.clone()).map_err(|_| {
+                IngesterError::ParserError("Failed to convert tree to SerializablePubkey".to_string())
+            })?,
+            seq: Some(seq_u32),
+            leaf_index: leaf_index_u32,
+            hash: Hash::new(account.hash.as_slice()).map_err(|_| {
+                IngesterError::ParserError("Failed to convert account hash to Hash".to_string())
+            })?,
+        });
+    }

     // 2. Remove inserted elements from the output queue.
     let query = accounts::Entity::update_many()
         .col_expr(accounts::Column::InOutputQueue, Expr::value(false))
         .filter(
             accounts::Column::LeafIndex
-                .gte(batch_append_event.old_next_index as i64)
-                .and(accounts::Column::LeafIndex.lt(batch_append_event.new_next_index as i64))
+                .gte(start_index as i64)
+                .and(accounts::Column::LeafIndex.lt(batch_append_event.new_next_index as i64))
                 .and(accounts::Column::Tree.eq(batch_append_event.merkle_tree_pubkey.to_vec()))
                 .and(accounts::Column::InOutputQueue.eq(true)),
         )
         .build(txn.get_database_backend());
     txn.execute(query).await?;
     Ok(())
 }
```
322-346: Address-append: clamp start, re-index safely, and fix sequence conversion. Mirror the append-path fixes: derive a clamped start index, operate on the clamped queue range, check “already indexed” when the queue is empty, and avoid u64→u32 truncation for seq. Also compute 0-based queue indices from the clamped start.

Apply:

```diff
 async fn persist_batch_address_append_event(
     txn: &DatabaseTransaction,
     batch_address_append_event: &BatchEvent,
 ) -> Result<(), IngesterError> {
-    // let expected_count = (batch_address_append_event.new_next_index
-    //     - batch_address_append_event.old_next_index) as usize;
-
-    // Validate old_next_index matches the current state of the address tree
-    // let current_next_index = indexed_trees::Entity::find()
-    //     .filter(
-    //         indexed_trees::Column::Tree.eq(batch_address_append_event.merkle_tree_pubkey.to_vec()),
-    //     )
-    //     .order_by_desc(indexed_trees::Column::LeafIndex)
-    //     .one(txn)
-    //     .await?
-    //     .map(|tree| (tree.leaf_index + 1) as u64)
-    //     .unwrap_or(1); // Address tree has zeroeth element
-
-    // if !validate_batch_index(
-    //     batch_address_append_event.old_next_index,
-    //     current_next_index,
-    //     "address append",
-    // )? {
-    //     return Ok(());
-    // }
-
-    // Address queue indices are 0-based, but batch updates use 1-based indices
-    // (because address trees have a pre-initialized zeroth element)
-    // So we need to offset by -1 when querying the queue
-    let queue_start = (batch_address_append_event.old_next_index as i64) - 1;
-    let queue_end = (batch_address_append_event.new_next_index as i64) - 1;
+    // Determine clamped start for partial re-indexing:
+    // Address trees are 1-based (zeroth element pre-initialized).
+    let current_next_index = indexed_trees::Entity::find()
+        .filter(indexed_trees::Column::Tree.eq(batch_address_append_event.merkle_tree_pubkey.to_vec()))
+        .order_by_desc(indexed_trees::Column::LeafIndex)
+        .one(txn)
+        .await?
+        .map(|tree| (tree.leaf_index + 1) as u64)
+        .unwrap_or(1); // 1-based
+    let start_index = validate_batch_index(
+        batch_address_append_event.old_next_index,
+        current_next_index,
+        "address append",
+    )?;
+    if start_index >= batch_address_append_event.new_next_index {
+        return Ok(());
+    }
+    let expected_count = (batch_address_append_event.new_next_index - start_index) as usize;
+    // Address queue is 0-based; subtract 1 from the clamped 1-based start/end
+    let queue_start = i64::try_from(start_index)
+        .map_err(|_| IngesterError::ParserError("start_index overflow".to_string()))? - 1;
+    let queue_end = i64::try_from(batch_address_append_event.new_next_index)
+        .map_err(|_| IngesterError::ParserError("new_next_index overflow".to_string()))? - 1;

     let addresses = address_queues::Entity::find()
         .filter(
             address_queues::Column::QueueIndex
                 .gte(queue_start)
                 .and(address_queues::Column::QueueIndex.lt(queue_end))
                 .and(
                     address_queues::Column::Tree
                         .eq(batch_address_append_event.merkle_tree_pubkey.to_vec()),
                 ),
         )
         .order_by_asc(address_queues::Column::QueueIndex)
         .all(txn)
         .await?;

-    // if addresses.is_empty() {
-    //     // Check if already processed (re-indexing scenario)
-    //     let already_indexed = indexed_trees::Entity::find()
-    //         .filter(
-    //             indexed_trees::Column::Tree
-    //                 .eq(batch_address_append_event.merkle_tree_pubkey.to_vec())
-    //                 .and(
-    //                     indexed_trees::Column::LeafIndex
-    //                         .gte(batch_address_append_event.old_next_index as i64),
-    //                 )
-    //                 .and(
-    //                     indexed_trees::Column::LeafIndex
-    //                         .lt(batch_address_append_event.new_next_index as i64),
-    //                 ),
-    //         )
-    //         .count(txn)
-    //         .await?;
-    //
-    //     if already_indexed >= expected_count as u64 {
-    //         tracing::info!(
-    //             "Address batch already processed: {} addresses already in indexed tree",
-    //             already_indexed
-    //         );
-    //         return Ok(());
-    //     }
-    //
-    //     return Err(IngesterError::ParserError(format!(
-    //         "Expected {} addresses in address append batch, found 0 in queue",
-    //         expected_count
-    //     )));
-    // } else if addresses.len() != expected_count {
-    //     return Err(IngesterError::ParserError(format!(
-    //         "Expected {} addresses in address append batch, found {}",
-    //         expected_count,
-    //         addresses.len()
-    //     )));
-    // }
+    if addresses.is_empty() {
+        // Re-indexing check: confirm range applied to the indexed tree
+        let already_indexed = indexed_trees::Entity::find()
+            .filter(
+                indexed_trees::Column::Tree
+                    .eq(batch_address_append_event.merkle_tree_pubkey.to_vec())
+                    .and(indexed_trees::Column::LeafIndex.gte(start_index as i64))
+                    .and(indexed_trees::Column::LeafIndex.lt(batch_address_append_event.new_next_index as i64)),
+            )
+            .count(txn)
+            .await?;
+        if already_indexed >= expected_count as u64 {
+            tracing::debug!(
+                "Address batch already processed: {} addresses already in indexed tree",
+                already_indexed
+            );
+            return Ok(());
+        }
+        return Err(IngesterError::ParserError(format!(
+            "Expected {} addresses in address append batch, found 0 in queue",
+            expected_count
+        )));
+    } else if addresses.len() != expected_count {
+        return Err(IngesterError::ParserError(format!(
+            "Expected {} addresses in address append batch, found {}",
+            expected_count,
+            addresses.len()
+        )));
+    }

     // Process addresses and perform per-address validations
-    // let mut expected_queue_index = queue_start; // Use the offset queue index
     let mut address_values = Vec::new();
     for address in &addresses {
         // Address exists - Each address must have a non-empty value
         if address.address.is_empty() {
             return Err(IngesterError::ParserError(
                 "Address value is missing".to_string(),
             ));
         }
         address_values.push(address.address.clone());
     }

     // 1. Append the addresses to the indexed merkle tree.
     multi_append(
         txn,
         address_values,
         batch_address_append_event.merkle_tree_pubkey.to_vec(),
         DEFAULT_BATCH_ADDRESS_TREE_HEIGHT + 1,
-        Some(batch_address_append_event.sequence_number as u32),
+        Some(
+            u32::try_from(batch_address_append_event.sequence_number)
+                .map_err(|_| IngesterError::ParserError("sequence_number overflow".to_string()))?
+        ),
     )
     .await?;

     // 2. Remove inserted elements from the database address queue.
     address_queues::Entity::delete_many()
         .filter(
             address_queues::Column::QueueIndex
                 .gte(queue_start)
                 .and(address_queues::Column::QueueIndex.lt(queue_end))
                 .and(
                     address_queues::Column::Tree
                         .eq(batch_address_append_event.merkle_tree_pubkey.to_vec()),
                 ),
         )
         .exec(txn)
         .await?;
     Ok(())
 }
```

Also applies to: 347-359, 361-398, 423-431, 434-445
288-294: Use overflow-safe conversions for seq and leaf_index in nullify leaves. Casting u64→u32 and i64→u32 can silently truncate. Convert with try_from and error out on overflow.

Apply:

```diff
-        leaf_nodes.push(LeafNode {
+        let seq_u32 = u32::try_from(batch_nullify_event.sequence_number)
+            .map_err(|_| IngesterError::ParserError("sequence_number overflow".to_string()))?;
+        let leaf_index_u32 = u32::try_from(account.leaf_index)
+            .map_err(|_| IngesterError::ParserError("leaf_index overflow".to_string()))?;
+        leaf_nodes.push(LeafNode {
             tree: SerializablePubkey::try_from(account.tree.clone()).map_err(|_| {
                 IngesterError::ParserError(
                     "Failed to convert tree to SerializablePubkey".to_string(),
                 )
             })?,
-            seq: Some(batch_nullify_event.sequence_number as u32),
-            leaf_index: account.leaf_index as u32,
+            seq: Some(seq_u32),
+            leaf_index: leaf_index_u32,
             hash: Hash::new(nullifier.as_slice()).map_err(|_| {
                 IngesterError::ParserError("Failed to convert nullifier to Hash".to_string())
             })?,
         });
```
🧹 Nitpick comments (2)
src/ingester/persist/persisted_batch_event.rs (2)
75-83: Skip persisting when there are no leaf nodes (avoid no-op DB calls). Address-append events don't populate leaf_nodes; append may also short-circuit. Avoid invoking persist_leaf_nodes on empty input.

```diff
-    if leaf_nodes.len() <= MAX_SQL_INSERTS {
+    if leaf_nodes.is_empty() {
+        continue;
+    } else if leaf_nodes.len() <= MAX_SQL_INSERTS {
         persist_leaf_nodes(txn, leaf_nodes, STATE_TREE_HEIGHT_V2 + 1).await?;
     } else {
```
216-220: Docstring vs behavior mismatch for nullifier queue retention. The comment says “remove them from the database nullifier queue,” but the implementation intentionally keeps NullifierQueueIndex to support re-indexing. Either update the docstring to match behavior or add the delete if that is the intended behavior.

Would you like to keep the queue indices (recommended for idempotency) and update the comment, or actually delete them post-marking? I can provide a small patch either way.
Also applies to: 296-298
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- src/ingester/persist/persisted_batch_event.rs (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/ingester/persist/persisted_batch_event.rs (2)
- src/common/typedefs/account/context.rs (2): try_from (109-152), new (53-103)
- src/common/typedefs/account/v2.rs (1): try_from (40-78)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run tests
🔇 Additional comments (2)
src/ingester/persist/persisted_batch_event.rs (2)
75-83: Sanity checks for overflow/underflow and invariants.
- Ensure address-append old_next_index ≥ 1 before subtracting 1 for queue_start (address trees are 1-based). The clamped-start patch handles this; just calling out the invariant.
- All i64 casts should be validated via try_from to avoid silent truncation on very large indices (a sketch follows after this comment).
I can add explicit asserts on these invariants if desired.
Also applies to: 199-209, 296-307, 341-346, 434-445
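A hedged sketch of the checked conversion being suggested; the error type is simplified, and the real code would map into IngesterError:

```rust
/// Checked u64 -> i64 conversion; errors instead of wrapping on overflow.
fn to_i64(value: u64, what: &str) -> Result<i64, String> {
    i64::try_from(value).map_err(|_| format!("{what} overflows i64: {value}"))
}

fn main() {
    assert_eq!(to_i64(42, "leaf_index"), Ok(42));
    assert!(to_i64(u64::MAX, "leaf_index").is_err());
}
```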
51-87: The script will locate where the DAO-generated code lives so we can inspect the `seq` column type. Once we have that, we can confirm u32 boundaries.
```diff
 // 2. Mark elements as nullified in tree.
 // We keep the NullifierQueueIndex to support re-indexing scenarios.
 let query = accounts::Entity::update_many()
     .col_expr(
         accounts::Column::NullifierQueueIndex,
         Expr::value(Option::<i64>::None),
     )
     .col_expr(accounts::Column::NullifiedInTree, Expr::value(true))
     .filter(
         accounts::Column::NullifierQueueIndex
-            .gte(batch_nullify_event.old_next_index)
-            .and(accounts::Column::NullifierQueueIndex.lt(batch_nullify_event.new_next_index)),
+            .gte(queue_start)
+            .and(accounts::Column::NullifierQueueIndex.lt(queue_end))
+            .and(accounts::Column::Tree.eq(batch_nullify_event.merkle_tree_pubkey.to_vec())),
     )
     .build(txn.get_database_backend());
```
Only mark spent accounts as nullified in tree.
The update filter omits Spent=true, so any rows in the queue range (even if not spent) get marked NullifiedInTree=true. That corrupts state.
Apply:
let query = accounts::Entity::update_many()
.col_expr(accounts::Column::NullifiedInTree, Expr::value(true))
.filter(
accounts::Column::NullifierQueueIndex
.gte(queue_start)
.and(accounts::Column::NullifierQueueIndex.lt(queue_end))
- .and(accounts::Column::Tree.eq(batch_nullify_event.merkle_tree_pubkey.to_vec())),
+ .and(accounts::Column::Tree.eq(batch_nullify_event.merkle_tree_pubkey.to_vec()))
+ .and(accounts::Column::Spent.eq(true)),
)
.build(txn.get_database_backend());
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
// 2. Mark elements as nullified in tree.
// We keep the NullifierQueueIndex to support re-indexing scenarios.
let query = accounts::Entity::update_many()
    .col_expr(accounts::Column::NullifiedInTree, Expr::value(true))
    .filter(
        accounts::Column::NullifierQueueIndex
            .gte(queue_start)
            .and(accounts::Column::NullifierQueueIndex.lt(queue_end))
            .and(accounts::Column::Tree.eq(batch_nullify_event.merkle_tree_pubkey.to_vec()))
            .and(accounts::Column::Spent.eq(true)),
    )
    .build(txn.get_database_backend());
```
🤖 Prompt for AI Agents
In src/ingester/persist/persisted_batch_event.rs around lines 296 to 307, the
update query marks every account in the NullifierQueueIndex range as
NullifiedInTree because the filter is missing a Spent=true predicate; add
accounts::Column::Spent.eq(true) into the .and(...) chain (combined with the
existing gte/lt and Tree checks) so only spent accounts are updated, then
rebuild the query against txn.get_database_backend().
Key changes:
- Gap Detection & Rewind Controller (src/ingester/detect_gaps.rs, src/ingester/rewind_controller.rs)
- Block Fetcher Integration (src/ingester/fetchers/)
- Tree Filtering (src/ingester/, src/main.rs)
- Unified Sequence Tracking (src/ingester/persist/persisted_indexed_merkle_tree.rs)
- Snapshot Analysis Tool (analyze_snapshot.rs)

Summary by CodeRabbit
- New Features
- Improvements
- Bug Fixes
- Tests
- Chores