Add das-grpc-ingest crate#240
Nagaprasadvr wants to merge 83 commits into metaplex-foundation:main from
Conversation
…the main thread keeps a cache of seen events so that it doesn't push to ingest if already seen by another connection
…dual validators writing to ingest stream
…flow. also pull in backfill to be used in ingest
…ingester using dedicated exec
…est stream which handles reading and handling messages.
```diff
 cargo tree
 git checkout Cargo.lock
-cargo tree --frozen
+cargo tree
```
When we update yellowstone-grpc to later package we start getting dependency conflicts. We will be looking at resolving this but would like to do in a follow up.
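For anyone reproducing the conflict locally, Cargo's `[patch]` section is one generic way to force every transitive copy of a git dependency onto a single pinned source while a proper fix lands. This is a hypothetical sketch, not taken from this PR; the crate name is illustrative and the revision placeholder is intentionally left unfilled:

```toml
# Hypothetical: pin all uses of a conflicting git dependency to one revision
# so `cargo tree` resolves to a single copy of the crate.
[patch."https://github.com/rpcpool/yellowstone-grpc.git"]
yellowstone-grpc-proto = { git = "https://github.com/rpcpool/yellowstone-grpc.git", rev = "<commit-hash>" }
```

Running `cargo tree --duplicates` afterwards is a quick way to confirm whether more than one version of a crate is still being pulled in.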
```toml
vergen = "8.2.1"
wasi = "0.7.0"
wasm-bindgen = "0.2.83"
yellowstone-grpc-client = { git = "https://github.com/rpcpool/yellowstone-grpc.git", tag = "v1.15.1+solana.1.18.22" } # tag is geyser plugin
```
@kespinola told me these were being moved to https://crates.io/crates/yellowstone-grpc-kafka so could we update these to use that crate?
It didn't work out with the dependencies, so we are doing a new release which resolves this dep issue and will update the PR soon
danenbm
left a comment
LGTM except for one comment on the crate imports
Summary by CodeRabbit
Walkthrough

This change introduces a new gRPC-based indexer component (…)

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant YellowstoneGRPC as Yellowstone gRPC Endpoint
    participant GrpcIngest as grpc-ingest (grpc2redis)
    participant Redis as Redis
    participant Ingester as grpc-ingest (ingester)
    participant Postgres as Postgres
    YellowstoneGRPC->>GrpcIngest: Stream account/tx updates via gRPC
    GrpcIngest->>Redis: Push updates to Redis streams
    Ingester->>Redis: Read updates from Redis streams
    Ingester->>Postgres: Process and store data (accounts, tx, snapshots)
    Ingester->>YellowstoneGRPC: (optional) Fetch snapshots or missing blocks
    loop Metadata Download
        Ingester->>External: Download metadata JSON from URI
        Ingester->>Postgres: Update asset metadata in DB
    end
```
Actionable comments posted: 36
🔭 Outside diff range comments (2)
grpc-ingest/config-monitor.example.yml (1)

1-21: 🧹 Nitpick (assertive) Configuration file looks good; consider adding documentation for the `only_trees` parameter.

The configuration file is well-structured with clear comments for each section. However, the `only_trees: null` parameter lacks explanation about its purpose and valid values.

```diff
 # bubblegum merkle tree configuration
 bubblegum:
-  only_trees: null
+  # set to an array of tree IDs to monitor only specific trees, or null to monitor all trees
+  only_trees: null
```

grpc-ingest/config-ingester.example.yml (1)
74-75: 🧹 Nitpick (assertive) Remove excessive blank lines

Remove the excessive blank line at the end of the file to fix the YAML linting error.

```diff
 # pipeline max idle time in milliseconds
 pipeline_max_idle_ms: 150
-
```

🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 74-74: too many blank lines
(1 > 0) (empty-lines)
♻️ Duplicate comments (4)
.github/workflows/test.yml (1)
38-42: Removed `--frozen` flag from cargo tree command

Removing the `--frozen` flag allows the workflow to update and download missing dependencies, which aligns with handling the new dependency requirements.

When we update yellowstone-grpc to a later package we start getting dependency conflicts. We will be looking at resolving this but would like to do it in a follow-up.
README.md (1)
18-20: 🧹 Nitpick (assertive) Update section to fix grammar and formatting for consistency.
The GRPC-INGEST section has some formatting and grammatical issues:
- Inconsistent capitalization: "GRPC-INGEST" vs "grpc ingester"
- Missing comma in the second sentence
- "are able to" can be simplified to "can"
```diff
-#### GRPC-INGEST [/grpc-ingest/README.md](/grpc-ingest/README.md)
-
-The grpc ingester allows the DAS index stack to receive solana change events from a Yellowstone GRPC endpoint. All the largest RPC providers offer this API so you are able to use a shared event stream instead of running a dedicated validator like you have to with the plerkle geyser plugin.
+#### gRPC-INGEST [/grpc-ingest/README.md](/grpc-ingest/README.md)
+
+The gRPC ingester allows the DAS index stack to receive Solana change events from a Yellowstone gRPC endpoint. All the largest RPC providers offer this API, so you can use a shared event stream instead of running a dedicated validator like you have to with the plerkle geyser plugin.
```

Note: According to previous review comments, consider moving this content to the gRPC-ingest README instead of keeping it in the main README.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~20-~20: A comma might be missing here.
Context: ...ll the largest RPC providers offer this API so you are able to use a shared event s...(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
[style] ~20-~20: As a shorter alternative for ‘able to’, consider using “can”.
Context: ...est RPC providers offer this API so you are able to use a shared event stream instead of ru...(BE_ABLE_TO)
docker-compose.yaml (1)
1-133: Missing grpc-ingest in Dockerfiles

It appears that grpc-ingest needs to be added to some Dockerfiles as mentioned in a previous review.

```shell
#!/bin/bash
# Check if grpc-ingest is properly included in the Dockerfiles
echo "Checking Dockerfiles for grpc-ingest inclusion:"
grep -l "COPY grpc-ingest" *.Dockerfile || echo "grpc-ingest not found in any Dockerfile"
```

grpc-ingest/src/version.rs (1)
14-22: Consider consistent naming for environment variable fields.

Based on previous review comments, the field names `rustc` and `buildts` might have been intended to be named differently, but keep them as is if renaming causes errors.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (36)
- .github/workflows/test.yml (2 hunks)
- Builder.Dockerfile (1 hunks)
- Cargo.toml (7 hunks)
- Proxy.Dockerfile (1 hunks)
- README.md (1 hunks)
- core/Cargo.toml (1 hunks)
- core/src/lib.rs (1 hunks)
- core/src/metadata_json.rs (1 hunks)
- docker-compose.yaml (1 hunks)
- grpc-ingest/.gitignore (1 hunks)
- grpc-ingest/Cargo.toml (1 hunks)
- grpc-ingest/README.md (1 hunks)
- grpc-ingest/build.rs (1 hunks)
- grpc-ingest/config-grpc2redis.example.yml (1 hunks)
- grpc-ingest/config-ingester.example.yml (1 hunks)
- grpc-ingest/config-monitor.example.yml (1 hunks)
- grpc-ingest/src/config.rs (1 hunks)
- grpc-ingest/src/grpc.rs (1 hunks)
- grpc-ingest/src/ingester.rs (1 hunks)
- grpc-ingest/src/main.rs (1 hunks)
- grpc-ingest/src/postgres.rs (1 hunks)
- grpc-ingest/src/prom.rs (1 hunks)
- grpc-ingest/src/redis.rs (1 hunks)
- grpc-ingest/src/tracing.rs (1 hunks)
- grpc-ingest/src/util.rs (1 hunks)
- grpc-ingest/src/version.rs (1 hunks)
- nft_ingester/Cargo.toml (1 hunks)
- nft_ingester/src/tasks/common/mod.rs (1 hunks)
- ops/src/bubblegum/README.md (1 hunks)
- program_transformers/Cargo.toml (1 hunks)
- program_transformers/src/bubblegum/db.rs (3 hunks)
- program_transformers/src/bubblegum/mint_v1.rs (1 hunks)
- program_transformers/src/bubblegum/update_metadata.rs (1 hunks)
- program_transformers/src/lib.rs (2 hunks)
- program_transformers/src/token_extensions/mod.rs (1 hunks)
- prometheus-config.yaml (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
program_transformers/src/bubblegum/db.rs (1)
core/src/metadata_json.rs (3)
new (43-49), new (53-58), new (276-278)
grpc-ingest/src/grpc.rs (4)
grpc-ingest/src/ingester.rs (5)
config (67-70), futures (263-271), connection (72-75), new (37-45), start (77-178)

grpc-ingest/src/redis.rs (18)
config (376-379), redis (325-329), tokio (424-424), tokio (425-425), tokio (428-428), tokio (429-429), connection (381-384), new (231-236), new (248-250), new (278-280), new (313-315), clone (240-242), clone (270-272), clone (301-303), default (649-654), start (399-639), flush (668-678), xadd_maxlen (658-666)

grpc-ingest/src/prom.rs (3)
grpc_tasks_total_dec (265-267), grpc_tasks_total_inc (261-263), redis_xadd_status_inc (208-216)

grpc-ingest/src/util.rs (1)
create_shutdown (7-19)
🪛 markdownlint-cli2 (0.17.2)
ops/src/bubblegum/README.md
86-86: Files should end with a single newline character
null
(MD047, single-trailing-newline)
grpc-ingest/README.md
1-1: First line in a file should be a top-level heading
null
(MD041, first-line-heading, first-line-h1)
15-15: Bare URL used
null
(MD034, no-bare-urls)
36-36: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
38-38: Bare URL used
null
(MD034, no-bare-urls)
54-54: Emphasis used instead of a heading
null
(MD036, no-emphasis-as-heading)
🪛 LanguageTool
README.md
[uncategorized] ~20-~20: A comma might be missing here.
Context: ...ll the largest RPC providers offer this API so you are able to use a shared event s...
(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
[style] ~20-~20: As a shorter alternative for ‘able to’, consider using “can”.
Context: ...est RPC providers offer this API so you are able to use a shared event stream instead of ru...
(BE_ABLE_TO)
grpc-ingest/README.md
[grammar] ~24-~24: The word “setup” is a noun. The verb is spelled with a space.
Context: ...hese files and modify them as needed to setup the project. ### Run grpc2redis serv...
(NOUN_VERB_CONFUSION)
🪛 YAMLlint (1.35.1)
grpc-ingest/config-ingester.example.yml
[error] 74-74: too many blank lines
(1 > 0) (empty-lines)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build-api (ubuntu-22.04)
🔇 Additional comments (40)
prometheus-config.yaml (2)
1-4: Validate aggressive global scrape/evaluation intervals
You've set `scrape_interval: 1s` and `evaluation_interval: 5s`, which can be very resource-intensive. Confirm that your endpoints and network can handle the 1s scrape cadence without dropping metrics or overwhelming Prometheus.
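For comparison, a more conservative global block looks like the sketch below; the interval values are illustrative defaults commonly seen in Prometheus examples, not values from this PR:

```yaml
global:
  scrape_interval: 15s      # commonly used default; this PR sets 1s
  evaluation_interval: 30s  # illustrative; this PR sets 5s
```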
5-14: Verify scrape targets and hostname resolution

The static targets use `host.docker.internal`, which relies on Docker DNS resolution. Ensure this will resolve correctly in all deployment environments (e.g., Linux hosts vs. CI runners), or consider using network aliases or service names defined in Docker Compose.

nft_ingester/src/tasks/common/mod.rs (1)
5-5:Details
✅ Verification successful
Verify updated import and workspace dependencies
You've switched `DownloadMetadataInfo` and `DownloadMetadataNotifier` from `program_transformers` to `das_core`. Please ensure that `das-core` is added to `nft_ingester`'s `Cargo.toml` and that all old references to `program_transformers` are removed to prevent build errors.
🏁 Script executed:
```shell
#!/bin/bash
# Check for das-core in nft_ingester Cargo.toml
rg "das-core" -n nft_ingester/Cargo.toml
# Ensure no program_transformers imports remain
rg "program_transformers" -n nft_ingester/src/tasks/common/mod.rs
```

Length of output: 141
Dependencies Verified and Cleaned Up
- Confirmed `das-core = { workspace = true }` is present at line 17 in `nft_ingester/Cargo.toml`.
- No remaining `program_transformers` imports in `nft_ingester/src/tasks/common/mod.rs`.

All set — no further changes needed.
Builder.Dockerfile (1)
12-12:Details
✅ Verification successful
Ensure new crate is included in the workspace

You've added `COPY grpc-ingest /rust/grpc-ingest`, but the build will fail if the root `Cargo.toml` doesn't list `grpc-ingest` as a workspace member. Please verify that `grpc-ingest` is present under `[workspace].members`.
🏁 Script executed:
```shell
#!/bin/bash
# Confirm grpc-ingest is a workspace member
rg -n "members\s*=\s*\[.*grpc-ingest" Cargo.toml
```

Length of output: 53
I couldn't confirm `grpc-ingest` in the workspace with the previous query. Let's search all `Cargo.toml` files and inspect the workspace section:
🏁 Script executed:
```shell
#!/bin/bash
# Search for grpc-ingest references in all Cargo.toml
rg -n "grpc-ingest" --glob "Cargo.toml" .
# Show the workspace section (members list) from the root Cargo.toml for context
rg -n "^\[workspace\]" -A10 --glob "Cargo.toml" .
```

Length of output: 519
Verified grpc-ingest in workspace
`grpc-ingest` is already listed under `[workspace].members` in `Cargo.toml` (line 12). No further action needed.

core/src/lib.rs (2)
1-3: Module declaration looks good

The addition of the `metadata_json` module follows the consistent pattern of module declarations in this file.

7-9: Appropriate re-export pattern

The export of the `metadata_json` module's public items follows the same pattern used for other modules.

Proxy.Dockerfile (1)
9-11: Dockerfile update is correctly implemented

The addition of the new `grpc-ingest` directory to the Docker build context correctly aligns with the PR's goal of introducing a new gRPC-based indexer component.

nft_ingester/Cargo.toml (1)
16-18: Dependency addition looks good

Adding the `das-core` workspace dependency is appropriate for enabling the use of shared types that were moved from `program_transformers` to `das-core`.

.github/workflows/test.yml (1)
10-12: String quoting style change

This is a minor style change from double-quotes to single-quotes that doesn't affect functionality.
program_transformers/src/bubblegum/update_metadata.rs (3)
26-27: Sea-orm import simplified

Removal of the `JsonValue` import is consistent with the refactoring of the `upsert_asset_data` function, which now handles metadata field setting internally.

117-129: Updated function call signature

The `upsert_asset_data` call no longer includes the metadata and reindex parameters, which is consistent with the function signature changes. The function now internally sets the metadata field to "processing" and reindex to true.

190-191: Proper use of the new DownloadMetadataInfo constructor

The code correctly uses the new `DownloadMetadataInfo::new` constructor method to create the returned value, consistent with the type being moved to the `das-core` crate.

program_transformers/Cargo.toml (2)
13-17: Dependencies look good.

The new dependencies and multiline formatting of features align well with the broader workspace dependency expansions in the project.

23-23: Added serde dependency.

Adding serde as a workspace dependency is appropriate for supporting serialization needs across components.
grpc-ingest/src/util.rs (1)
1-19: Well-implemented shutdown signal handler.

The `create_shutdown` function provides a clean, idiomatic way to handle UNIX termination signals in an asynchronous context using Tokio. This implementation will allow graceful shutdown across the various services within the gRPC ingestion pipeline.

program_transformers/src/token_extensions/mod.rs (1)
276-279: Constructor method usage looks good

Changed from direct struct initialization to using the `DownloadMetadataInfo::new()` constructor method, which aligns with the refactoring where this type was moved to the `das_core` crate.

grpc-ingest/src/postgres.rs (2)
12-22: Pool creation function implements good practices

The PostgreSQL connection pool creation is well-implemented with proper error handling and configuration parameter usage.

24-27: Metric reporting is straightforward

Good implementation of Prometheus metrics for monitoring pool health. Tracking both total and idle connections provides valuable operational insights.
core/Cargo.toml (1)
11-39: Dependencies appropriately updated for new functionality

The added dependencies support the new gRPC ingestion, Redis stream handling, and metadata downloading capabilities introduced in this PR.
docker-compose.yaml (1)
124-133: Prometheus service added for metrics collection

The Prometheus service is properly configured to collect metrics from the new gRPC ingestion components. The host gateway configuration enables it to communicate with services on the host machine.

grpc-ingest/build.rs (4)
grpc-ingest/build.rs (4)
1-7: Well structured build script with proper error handling.

The build script correctly uses `vergen` to emit build and Rust compiler metadata, with proper error handling via anyhow.

8-13: Good approach for cleaner Git version formatting.

Overriding the default vergen git version with `git_version!()` provides a cleaner and more readable version string.

28-38: Efficient version extraction implementation.

The `get_pkg_version` function efficiently handles potential duplicate versions by using a HashSet for deduplication before joining them as a comma-separated string.
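The dedup-and-join step described above can be sketched in isolation. This is a hypothetical re-creation of just that step (the function name and input are illustrative; the real code parses versions out of Cargo.lock first):

```rust
use std::collections::HashSet;

// Keep the first occurrence of each version string, then join the unique
// values into a comma-separated list, mirroring the HashSet dedup step.
fn join_unique_versions(versions: &[&str]) -> String {
    let mut seen = HashSet::new();
    let mut unique = Vec::new();
    for v in versions {
        if seen.insert(*v) {
            unique.push(*v);
        }
    }
    unique.join(",")
}

fn main() {
    let joined = join_unique_versions(&["1.15.1", "1.15.1", "1.16.0"]);
    println!("{joined}"); // prints "1.15.1,1.16.0"
    assert_eq!(joined, "1.15.1,1.16.0");
}
```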
14-26:Details
✅ Verification successful
Verify the relative path assumption for Cargo.lock.
The script loads the Cargo.lock file from a fixed relative path "../Cargo.lock", which assumes this script is always run from within the grpc-ingest directory.
🏁 Script executed:
```shell
#!/bin/bash
# Check if Cargo.lock exists at the expected relative path
if [ -f "../Cargo.lock" ]; then
  echo "Cargo.lock exists at expected path"
else
  echo "ERROR: Cargo.lock not found at ../Cargo.lock"
  # Check where Cargo.lock actually exists
  find .. -name "Cargo.lock" -maxdepth 2
fi
```

Length of output: 182
To accurately verify this, let's simulate the build script's working directory. We need to `cd` into the `grpc-ingest` crate folder before checking for `../Cargo.lock`.
🏁 Script executed:
```shell
#!/bin/bash
# Move into the crate directory
cd grpc-ingest
# Now check if Cargo.lock exists one level up
if [ -f "../Cargo.lock" ]; then
  echo "✅ Cargo.lock found at ../Cargo.lock relative to grpc-ingest"
else
  echo "❌ Cargo.lock not found at ../Cargo.lock relative to grpc-ingest"
  echo "Searching for any Cargo.lock within two levels up:"
  find .. -maxdepth 2 -type f -name "Cargo.lock"
fi
```

Length of output: 172
No changes needed — Cargo.lock path is valid
We verified by running the build script from the grpc-ingest directory that `../Cargo.lock` correctly points to the root Cargo.lock. The relative-path assumption is safe.

grpc-ingest/src/version.rs (1)
1-13: Clean Version struct definition with appropriate traits.

The Version struct is well-structured with relevant fields and properly derives Debug and Serialize traits.
grpc-ingest/Cargo.toml (3)
1-7: Good use of workspace inheritance for package metadata.

The package metadata appropriately uses workspace inheritance for shared configuration values.

8-57: Comprehensive dependency list with appropriate features.

The dependencies are well-specified with necessary features, particularly for sqlx, tokio, and redis.

58-62: Appropriate build dependencies for version information.

The build dependencies align well with the functionality in build.rs for version extraction and information generation.
grpc-ingest/config-grpc2redis.example.yml (4)
3-4: Consider security implications of Prometheus endpoint.

The Prometheus endpoint is configured to listen on all interfaces (0.0.0.0), which could expose metrics publicly. In production, consider restricting this to localhost or using authentication.

7-17: Update placeholder token value before deployment.

The example configuration contains a placeholder null value for x_token that needs to be replaced with an actual token for production use.

19-67: Well-structured subscription configurations with appropriate filters.

The subscription configurations are well-organized with appropriate stream settings and filters for different data types.

69-71: Update Redis URL for production deployments.

The example uses a local Redis URL that should be updated for production deployments to point to the actual Redis instance.
program_transformers/src/lib.rs (2)
23-23: LGTM! Good refactoring of shared components

The import of `DownloadMetadataInfo` and `DownloadMetadataNotifier` from the new `das_core` crate effectively centralizes these types, promoting code reuse across the codebase.

52-58: LGTM! Adding Deserialize derivation enables Redis ingestion

Adding the `Deserialize` trait to `AccountInfo` and `TransactionInfo` is a good addition that enables these structures to be deserialized from Redis streams in the new gRPC ingestion pipeline.

Also applies to: 60-67
program_transformers/src/bubblegum/db.rs (1)
497-506: `rows_affected()` can report `1` even when data is unchanged → duplicate download jobs

Postgres counts an `ON CONFLICT … DO UPDATE` as "1 row affected" even if the `UPDATE` sets every column to its current value.
The current logic therefore enqueues `DownloadMetadataInfo` repeatedly, causing unnecessary re-processing.

Two common mitigations:

- Add a `WHERE` clause that checks `reindex IS NOT TRUE` (or compares hashes) so the update is a true no-op.
- Use `RETURNING xmax` and treat `xmax = 0` as "row was untouched".

```sql
ON CONFLICT (...) DO UPDATE
SET reindex = TRUE
WHERE excluded.base_info_seq > asset_data.base_info_seq
RETURNING (xmax = 0) AS updated;
```

Then decide whether to push to the queue based on the `updated` flag.

Cargo.toml (2)
12-13: Workspace member addition looks good
`grpc-ingest` is properly declared as a workspace member, so the crate will participate in unified version resolution and the cargo workspace build graph.
144-146:Details
❓ Verification inconclusive
Long-term stability risk when depending on git tags for `yellowstone-grpc-*`

Using git dependencies ties reproducible builds to the availability of the remote repository and prevents publishing to crates.io.
If possible:
- Depend on the crates once they are published, or
- Mirror the exact commit hash instead of a mutable tag to guard against force-pushes.
This will improve supply-chain security and build reproducibility.
Avoid Git tags for `yellowstone-grpc-*` in Cargo.toml

File: Cargo.toml (Lines 144–146)

Currently you're pulling these crates directly from a Git tag:

```toml
yellowstone-grpc-client = { git = "https://github.com/rpcpool/yellowstone-grpc.git", tag = "v1.15.1+solana.1.18.22" }
yellowstone-grpc-proto = { git = "https://github.com/rpcpool/yellowstone-grpc.git", tag = "v1.15.1+solana.1.18.22" }
yellowstone-grpc-tools = { git = "https://github.com/rpcpool/yellowstone-grpc.git", tag = "v1.15.1+solana.1.18.22", default-features = false }
```

Long-term builds that point to a mutable Git tag risk breakage if the tag is moved or the repo becomes unavailable, and you cannot publish a crate that has Git dependencies. To improve reproducibility and supply-chain security:
- If the crates are published on crates.io, depend on their published versions instead of the Git repo.
- Otherwise, pin to a specific commit SHA rather than a named tag:
```toml
yellowstone-grpc-client = { git = "https://github.com/rpcpool/yellowstone-grpc.git", rev = "abcdef1234567890abcdef" }
```

Please verify whether `yellowstone-grpc-client`, `yellowstone-grpc-proto`, and `yellowstone-grpc-tools` are available on crates.io; if not, switch to `rev = "<commit-hash>"` to guard against force-pushes.

program_transformers/src/bubblegum/mint_v1.rs (1)
200-206: Missing `return Ok(None)` path keeps the DB transaction but leaks the task counter

Early-returning when `uri.is_empty()` skips the final logging but keeps the previously incremented metrics in upstream code unchanged.
Verify that the caller properly decrements task counters in this branch; otherwise add an explicit decrement (or remove the earlier increment) to keep Prometheus gauges balanced.

grpc-ingest/src/redis.rs (1)
438-457: 🛠️ Refactor suggestion Concurrency cap can be silently exceeded

`if tasks.len() >= config.max_concurrency { tasks.join_next().await; }` removes only one finished task, so if `max_concurrency` is `N` and `N` new messages are delivered at once, the loop will schedule `N+N-1` tasks before `tasks.len()` drops below the limit.

Use a `while` loop (i.e. call `tasks.join_next().await` in a loop) to keep the cap strict:

```diff
 if tasks.len() >= config.max_concurrency {
-    tasks.join_next().await;
+    while tasks.len() >= config.max_concurrency {
+        tasks.join_next().await;
+    }
 }
```

Likely an incorrect or invalid review comment.
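The invariant the suggested `while` loop enforces can be illustrated with plain std threads; this is a sketch of the pattern only (the PR's code uses tokio's `JoinSet` and async tasks):

```rust
use std::thread;
use std::time::Duration;

fn main() {
    let max_concurrency = 2;
    let mut handles: Vec<thread::JoinHandle<()>> = Vec::new();
    for _ in 0..6 {
        // Strict cap: keep reaping finished work until we are under the
        // limit, mirroring the `while` form suggested above.
        while handles.len() >= max_concurrency {
            handles.remove(0).join().unwrap();
        }
        handles.push(thread::spawn(|| thread::sleep(Duration::from_millis(5))));
        // The cap now holds even when many messages arrive back-to-back.
        assert!(handles.len() <= max_concurrency);
    }
    for h in handles {
        h.join().unwrap();
    }
    println!("cap respected");
}
```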
grpc-ingest/src/config.rs (1)
22-28: `json5` parser selected only for `.json` extension

The loader always feeds JSON files through `json5::from_str`, even when the extension is plain `.json`. While JSON5 is backward-compatible most of the time, subtle incompatibilities (e.g. single quotes, trailing commas) might slip into a file that is intended to be strict JSON.

Consider:

- `.json5` → `json5::from_str`
- `.json` → `serde_json::from_str`

to avoid surprising behaviour for users expecting standard JSON.
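The suggested split can be sketched as a small dispatch on the file extension; the enum and function here are hypothetical illustrations, not the PR's actual loader:

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum ConfigFormat {
    Json,  // strict JSON via serde_json
    Json5, // relaxed JSON5 via the json5 crate
    Yaml,  // serde_yaml
}

// Pick a parser based purely on the file extension, as proposed above.
fn format_for(path: &str) -> Option<ConfigFormat> {
    match Path::new(path).extension()?.to_str()? {
        "json" => Some(ConfigFormat::Json),
        "json5" => Some(ConfigFormat::Json5),
        "yml" | "yaml" => Some(ConfigFormat::Yaml),
        _ => None,
    }
}

fn main() {
    assert_eq!(format_for("config.json"), Some(ConfigFormat::Json));
    assert_eq!(format_for("config.json5"), Some(ConfigFormat::Json5));
    assert_eq!(format_for("config-ingester.yaml"), Some(ConfigFormat::Yaml));
    assert_eq!(format_for("notes.txt"), None);
    println!("dispatch ok");
}
```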
| tree.succeeded | Count of completed tree crawl |
| tree.crawled | Time to crawl a tree |
| job.completed | Time to complete the job |
🧹 Nitpick (assertive)
Add a single trailing newline
Static analysis (MD047) indicates the file should end with exactly one newline. Please restore the trailing newline after line 86 to satisfy the markdown lint rule.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
86-86: Files should end with a single newline character
null
(MD047, single-trailing-newline)
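The fix itself is mechanical; a hypothetical helper (not from the PR) that normalizes a file's tail to exactly one newline, satisfying MD047, could look like:

```rust
// Strip all trailing newlines, then append exactly one.
fn normalize_trailing_newline(s: &str) -> String {
    format!("{}\n", s.trim_end_matches('\n'))
}

fn main() {
    let fixed = normalize_trailing_newline("| job.completed | Time to complete the job");
    assert!(fixed.ends_with('\n') && !fixed.ends_with("\n\n"));
    // Already-correct input is left unchanged.
    assert_eq!(normalize_trailing_newline("done\n"), "done\n");
    println!("ok");
}
```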
```
config-grpc2redis.yaml
config-ingester.yaml
config-monitor.yaml
```
🧹 Nitpick (assertive)
Consider using glob patterns for config files
Instead of listing each config file explicitly, you can simplify maintenance by using a glob, e.g.:
```diff
-config-grpc2redis.yaml
-config-ingester.yaml
-config-monitor.yaml
+config-*.yaml
```

This will automatically ignore any future config-*.yaml files without manual updates.
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```toml
serde = { workspace = true }
serde_json = { workspace = true }
serde_yaml = { workspace = true }
solana-sdk = { workspace = true } # only prom rn
```
🧹 Nitpick (assertive)
Clarify the solana-sdk dependency usage comment.
The comment "# only prom rn" suggests that solana-sdk is only used for Prometheus metrics currently. Consider documenting this more clearly or removing the comment if the dependency is fully utilized.
```diff
-solana-sdk = { workspace = true } # only prom rn
+solana-sdk = { workspace = true } # Currently only used for Prometheus metrics
```
```
### Metrics

Both grpc2redis and ingester services expose prometheus metrics and can be accessed at `http://localhost:9090/metrics`
```
🧹 Nitpick (assertive)
Use proper heading format for the metrics endpoint information
Use a proper heading format instead of emphasis for consistency.
```diff
-Both grpc2redis and ingester services expose prometheus metrics and can be accessed at `http://localhost:9090/metrics`
+Both grpc2redis and ingester services expose prometheus metrics and can be accessed at `http://localhost:9090/metrics`.
```

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
54-54: Emphasis used instead of a heading
null
(MD036, no-emphasis-as-heading)
```
- Fetch account updates from redis and process them using the `program_transformer` crate
- Fetch transaction updates from redis and processe them
- Fetch snapshots from redis and process them
- download token metedata json and store them in postgres db
```
🧹 Nitpick (assertive)
Fix typo in "metadata"
There's a spelling error in "metedata".
```diff
-- download token metedata json and store them in postgres db
+- download token metadata json and store them in postgres db
```
```rust
lazy_static::lazy_static! {
    static ref REGISTRY: Registry = Registry::new();
```
🧹 Nitpick (assertive)
Consider switching from lazy_static to once_cell
The rest of the codebase already relies on `Once` from std; adopting `once_cell::sync::Lazy` would remove an extra dependency and give the same semantics with a lighter macro-free API.
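A third option, if the minimum supported Rust version allows it, is the standard library's `OnceLock` (stable since Rust 1.70), which needs no extra crate at all. A minimal sketch with placeholder registry contents (the real code stores a `prometheus::Registry`):

```rust
use std::sync::OnceLock;

// Placeholder contents; stands in for the lazily-built metrics registry.
static REGISTRY: OnceLock<Vec<&'static str>> = OnceLock::new();

fn registry() -> &'static Vec<&'static str> {
    // Initialized exactly once, on first access, like lazy_static/once_cell.
    REGISTRY.get_or_init(|| vec!["version_info", "redis_stream_length"])
}

fn main() {
    assert_eq!(registry().len(), 2);
    // Later calls return the same initialized value.
    assert!(std::ptr::eq(registry(), registry()));
    println!("{}", registry().len()); // prints "2"
}
```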
```rust
pub fn run_server(address: SocketAddr) -> anyhow::Result<()> {
    static REGISTER: Once = Once::new();
    REGISTER.call_once(|| {
        macro_rules! register {
            ($collector:ident) => {
                REGISTRY
                    .register(Box::new($collector.clone()))
                    .expect("collector can't be registered");
            };
        }

        register!(VERSION_INFO_METRIC);
        register!(REDIS_STREAM_LENGTH);
        register!(REDIS_XADD_STATUS_COUNT);
        register!(REDIS_XREAD_COUNT);
        register!(REDIS_XACK_COUNT);
        register!(PGPOOL_CONNECTIONS);
        register!(PROGRAM_TRANSFORMER_TASK_STATUS_COUNT);
        register!(INGEST_JOB_TIME);
        register!(DOWNLOAD_METADATA_FETCHED_COUNT);
        register!(INGEST_TASKS);
        register!(ACK_TASKS);
        register!(GRPC_TASKS);
        register!(BUBBLEGUM_TREE_TOTAL_LEAVES);
        register!(BUBBLEGUM_TREE_INCORRECT_PROOFS);
        register!(BUBBLEGUM_TREE_NOT_FOUND_PROOFS);
        register!(BUBBLEGUM_TREE_CORRECT_PROOFS);
        register!(BUBBLEGUM_TREE_CORRUPT_PROOFS);
        register!(DOWNLOAD_METADATA_PUBLISH_TIME);
```
🧹 Nitpick (assertive)
Register default process & build-info collectors for richer telemetry
You manually register every custom collector but miss the standard `prometheus::process_collector` and `prometheus::build_info` collectors that expose CPU, memory and Cargo metadata. Adding them costs nothing and improves out-of-the-box dashboards:

```rust
register!(prometheus::process_collector::ProcessCollector::new());
register!(prometheus::build_info::BuildInfoCollector::new());
```

(They can be conditionally compiled to avoid duplication in other binaries.)
```rust
static ref DOWNLOAD_METADATA_FETCHED_COUNT: IntGaugeVec = IntGaugeVec::new(
    Opts::new("download_metadata_fetched_count", "Status of download metadata task"),
    &["status"]
).unwrap();
```
🛠️ Refactor suggestion
Use IntCounterVec for monotonically-increasing “*_count” metric
`DOWNLOAD_METADATA_FETCHED_COUNT` is defined as an `IntGaugeVec`, yet in `download_metadata_json_task_status_count_inc` (lines 269-273) you only ever increment it.

According to Prometheus best practices, counters must be used for values that should only go up. Gauges are intended for values that may also go down or be set arbitrarily, and mixing semantics hurts alerting & rate calculations.
```diff
-    static ref DOWNLOAD_METADATA_FETCHED_COUNT: IntGaugeVec = IntGaugeVec::new(
+    static ref DOWNLOAD_METADATA_FETCHED_COUNT: IntCounterVec = IntCounterVec::new(
         Opts::new("download_metadata_fetched_count", "Status of download metadata task"),
         &["status"]
     ).unwrap();
```

No other code changes are needed because both `inc()` and `with_label_values` are available on `IntCounterVec`.
| let make_service = make_service_fn(move |_: &AddrStream| async move { |
| Ok::<_, hyper::Error>(service_fn(move |req: Request<Body>| async move { |
| let response = match req.uri().path() { |
| "/metrics" => metrics_handler(), |
| _ => not_found_handler(), |
| }; |
| Ok::<_, hyper::Error>(response) |
| })) |
| }); |
|  |
| let server = Server::try_bind(&address)?.serve(make_service); |
| info!("prometheus server started: http://{address:?}/metrics"); |
|  |
| tokio::spawn(async move { |
| if let Err(error) = server.await { |
| error!("prometheus server failed: {error:?}"); |
| } |
| }); |
🛠️ Refactor suggestion
Return a JoinHandle (or await) instead of fire-and-forget spawning
run_server starts the Hyper server with tokio::spawn and immediately
returns Ok(()). Callers therefore get no handle to await, cancel or listen
for shutdown, and unit/integration tests cannot deterministically wait for the
server to finish initialising.
Consider returning the JoinHandle<()> (or impl Future) so that the caller
can decide whether to detach or await:
-pub fn run_server(address: SocketAddr) -> anyhow::Result<()> {
+pub fn run_server(address: SocketAddr) -> anyhow::Result<tokio::task::JoinHandle<()>> {
…
- tokio::spawn(async move {
+ let handle = tokio::spawn(async move {
if let Err(error) = server.await {
error!("prometheus server failed: {error:?}");
}
});
- Ok(())
+ Ok(handle)
+ }

That keeps the convenience of background execution while giving downstream code proper lifecycle control.
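The caller side of that change, sketched with std threads so the snippet stays self-contained (the real code would return a tokio::task::JoinHandle and the caller would .await it instead of calling .join()):

```rust
use std::net::SocketAddr;
use std::thread::JoinHandle;

// Hypothetical stand-in for run_server: spawn the work, but hand the
// JoinHandle back so the caller owns the lifecycle decision.
fn run_server(address: SocketAddr) -> JoinHandle<SocketAddr> {
    std::thread::spawn(move || {
        // the real server loop would run here
        address
    })
}

fn main() {
    let addr: SocketAddr = "127.0.0.1:9090".parse().unwrap();
    let handle = run_server(addr);
    // the caller can now detach, or join and deterministically observe shutdown
    let served_on = handle.join().unwrap();
    assert_eq!(served_on, addr);
    println!("ok");
}
```

Tests get the same benefit: they can join the handle instead of sleeping and hoping the server has started.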
| fn metrics_handler() -> Response<Body> { |
| let metrics = TextEncoder::new() |
| .encode_to_string(&REGISTRY.gather()) |
| .unwrap_or_else(|error| { |
| error!("could not encode custom metrics: {}", error); |
| String::new() |
| }); |
| Response::builder() |
| .header("content-type", "text/plain") |
| .body(Body::from(metrics)) |
| .unwrap() |
| } |
🧹 Nitpick (assertive)
Reuse a single TextEncoder to avoid tiny per-request allocations
metrics_handler creates a new TextEncoder on every scrape even though the
type is stateless. You can store one in a lazy_static or once_cell and
reuse it:
-fn metrics_handler() -> Response<Body> {
- let metrics = TextEncoder::new()
+lazy_static::lazy_static! {
+ static ref TEXT_ENCODER: prometheus::TextEncoder = prometheus::TextEncoder::new();
+}
+
+fn metrics_handler() -> Response<Body> {
+ let metrics = TEXT_ENCODER
.encode_to_string(&REGISTRY.gather())

The saving is small but free and eliminates needless allocations on every Prometheus scrape.
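The same idea works without adding a dependency: std::sync::OnceLock (stable since Rust 1.70) can hold the shared encoder. The sketch below uses a stand-in Encoder struct in place of prometheus::TextEncoder so it stays self-contained:

```rust
use std::sync::OnceLock;

// Stand-in for the stateless prometheus::TextEncoder.
struct Encoder;

impl Encoder {
    fn new() -> Self {
        Encoder
    }
    fn encode_to_string(&self, gathered: &str) -> String {
        format!("# TYPE demo counter\n{gathered}")
    }
}

// Initialised once on the first scrape, then reused for every later request.
static ENCODER: OnceLock<Encoder> = OnceLock::new();

fn metrics_body(gathered: &str) -> String {
    ENCODER.get_or_init(Encoder::new).encode_to_string(gathered)
}

fn main() {
    assert_eq!(metrics_body("demo 1"), "# TYPE demo counter\ndemo 1");
    println!("ok");
}
```

OnceLock gives the same one-time, thread-safe initialisation as lazy_static with no macro and no extra crate.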
…nfrastructure into grpc-cleanup
Goal
Reduce the operational cost and complexity of running DAS by switching to the Dragonmouth gRPC stream for account and transaction updates.
Upgrade the ingestion engine by refactoring configuration, workers, and metrics.
Approach
Create a new crate named das-grpc-ingest, which has 3 primary subcommands:
grpc2redis - connects to Dragonmouth and pushes relevant events onto Redis streams
ingest - subscribes to the Redis streams, spawns workers, and processes events using the program_transformers crate.
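The split between the two subcommands can be pictured with a hypothetical config sketch (every key name here is illustrative only; see the crate's own config files for the real schema):

```yaml
# Hypothetical layout only — key names are illustrative, not the crate's schema.
grpc2redis:
  geyser_endpoint: http://127.0.0.1:10000   # Dragonmouth / yellowstone-grpc
  redis_url: redis://127.0.0.1:6379
  accounts_stream: ACCOUNTS
  transactions_stream: TRANSACTIONS

ingest:
  redis_url: redis://127.0.0.1:6379
  postgres_url: postgres://user:pass@127.0.0.1/das
  num_workers: 16   # worker tasks feeding program_transformers
```

Keeping the two halves decoupled through Redis lets either side be restarted or scaled independently.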
closes #83
Next Steps
Rename nft_ingester to das-plerkle-ingest
Diagram