Subnet management script and node init config fixes #1464
base: main
This commit addresses a critical bug in `ipc-cli node init` that prevented libp2p from binding to network interfaces on cloud VMs (GCP, AWS, Azure). The fix ensures that `listen_addr` is set to `0.0.0.0` for proper binding, while `external_addresses` correctly advertises the public IP. This change restores functionality for parent finality voting and top-down message execution.

Changes include:
- Updated `ConnectionOverrideConfig` to include `external_addresses`.
- Modified port configuration logic to use `0.0.0.0` for `listen_addr`.
- Enhanced documentation in `CHANGELOG.md` and `node-init.md` to reflect these changes.
- Added tests to verify the correct configuration behavior.

Existing deployments may need to reinitialize or manually update their configurations to apply this fix.
This commit introduces a new `listen-ip` field in the P2P configuration, allowing advanced users to specify a custom IP address for binding services, while maintaining the default of `0.0.0.0` for maximum compatibility. This enhancement addresses previous limitations in binding on cloud VMs and improves flexibility for complex network setups.

Changes include:
- Updated `P2pConfig` structure to include the `listen-ip` field.
- Adjusted port configuration logic to utilize the `listen-ip` for binding.
- Enhanced documentation in `CHANGELOG.md` and `node-init.md` to reflect the new configuration options and usage examples.
- Added tests to ensure correct behavior of the new `listen-ip` functionality.

This update is fully backward compatible and does not require changes to existing configurations.
…ality issue

This commit updates the subnet configuration by changing the validator power from 1 to 3 and modifying the subnet ID to ensure compatibility with the latest deployment requirements. Additionally, a new markdown file is introduced to document the 16-hour lookback issue affecting parent finality on the Glif Calibration testnet, outlining the problem, root cause, and proposed solutions.

Changes include:
- Updated `ipc-subnet-config.yml` with new subnet ID and validator power.
- Added `PARENT-FINALITY-16H-LOOKBACK-ISSUE.md` to provide detailed insights into the parent finality issue and potential workarounds.

These updates aim to enhance the reliability and documentation of the IPC subnet management process.
…inality progress

This commit introduces a new `watch-finality` command to the IPC subnet manager, enabling users to monitor parent finality progress in real-time. The command supports continuous monitoring, target epoch tracking, and customizable refresh intervals.

Changes include:
- Added `cmd_watch_finality()` function in `ipc-subnet-manager.sh`.
- Updated usage documentation to include examples for the new command.
- Implemented `watch_parent_finality()` function in `lib/health.sh` for monitoring logic.
- Created `WATCH-FINALITY-FEATURE.md` to document usage, output, and potential use cases.

These enhancements improve the monitoring capabilities of the IPC subnet manager, facilitating better tracking of parent finality and subnet health.
…onitoring

This commit adds a new `watch-blocks` command to the IPC subnet manager, enabling users to monitor block production in real-time. The command supports continuous monitoring, target height tracking, and customizable refresh intervals.

Changes include:
- Implemented `cmd_watch_blocks()` function in `ipc-subnet-manager.sh`.
- Added `watch_block_production()` function in `lib/health.sh` for monitoring logic.
- Updated usage documentation with examples for the new command.
- Created `WATCH-BLOCKS-FEATURE.md` to document usage, output, and potential use cases.
- Adjusted `ipc-subnet-config.yml` to optimize block production settings.

These enhancements improve the monitoring capabilities of the IPC subnet manager, facilitating better tracking of block production and overall subnet health.
This commit introduces an extensive "Advanced Performance Tuning Guide" to optimize IPC subnet performance, detailing configuration changes and expected impacts on consensus timeouts, block production, and network performance. Additionally, a new script, `apply-advanced-tuning.sh`, is added to automate the application of these optimizations to existing nodes without reinitialization.

Changes include:
- Created `ADVANCED-TUNING-GUIDE.md` with detailed tuning parameters and expected performance improvements.
- Added `apply-advanced-tuning.sh` script for seamless configuration updates across validators.
- Updated `ipc-subnet-config.yml` with optimized settings for faster block production and parent finality.
- Introduced `OPTIMIZATION-SUMMARY.md` and `PERFORMANCE-OPTIMIZATION-RESULTS.md` to document performance improvements and configurations.
- Enhanced `TUNING-QUICK-REF.md` for quick access to tuning actions and parameters.

These enhancements significantly improve the performance and reliability of the IPC subnet, making it competitive with leading blockchain networks.
This commit introduces a comprehensive solution to address the broadcasting error encountered by validators due to incorrect address configuration.

The changes include:
- Added `BOTTOMUP-CHECKPOINT-FIX.md` to document the problem, root cause, and the necessary fix for validator configurations.
- Created `fix-bottomup-checkpoint.sh` script to automate the process of disabling bottom-up checkpointing for federated subnets and updating validator configurations.
- Updated `lib/config.sh` to set the default validator key kind to "ethereum" for EVM-based subnets, preventing future issues.

These enhancements ensure that bottom-up checkpointing is operational and that validators are correctly configured for EVM compatibility, improving overall subnet reliability.
This commit adds a comprehensive live monitoring dashboard to the IPC subnet manager, enabling real-time tracking of various metrics and error categorization.

Key changes include:
- Created `lib/dashboard.sh` for core dashboard functionality, including metrics collection and UI rendering.
- Added `cmd_dashboard()` function to `ipc-subnet-manager.sh` for command integration.
- Developed multiple documentation files detailing dashboard features, implementation, and quick reference guides.
- Enhanced error handling and formatting in the dashboard display for improved user experience.

These enhancements significantly improve the monitoring capabilities of the IPC subnet manager, providing users with a unified view of subnet health and activity.
This commit introduces a new `BottomUpSettings` struct to manage bottom-up checkpointing configurations, including an option to enable or disable the feature.

Key changes include:
- Added `BottomUpSettings` struct with a default enabled state.
- Updated `IpcSettings` to include a configuration for bottom-up checkpointing.
- Enhanced `BottomUpManager` to accept a flag indicating whether bottom-up checkpointing is enabled.
- Implemented logic to conditionally execute bottom-up checkpointing based on the new settings.

These enhancements provide greater flexibility in managing checkpointing behavior within the IPC subnet, improving overall system reliability.
…t management

This commit introduces a comprehensive "Consensus Recovery Guide" and a "Diagnostic Tools Summary" to assist users in diagnosing and recovering from consensus issues within IPC subnets.

Key changes include:
- Added `CONSENSUS-RECOVERY-GUIDE.md` detailing steps for diagnosing and resolving consensus problems, including commands for checking consensus and voting status.
- Introduced `DIAGNOSTIC-TOOLS-SUMMARY.md` outlining new commands like `consensus-status` and `voting-status`, enhancing the ability to monitor validator health and participation.
- Updated `ipc-subnet-manager.sh` to integrate new diagnostic commands.
- Enhanced `lib/health.sh` with functions to display consensus and voting statuses, improving operational visibility.

These enhancements significantly improve the operational capabilities of the IPC subnet manager, enabling targeted recovery actions without data loss and fostering better understanding of consensus dynamics.
…sting

This commit introduces several new scripts to enhance the IPC subnet manager's functionality.

Key changes include:
- Added `enable-gateway-ports.sh` to enable GatewayPorts on remote VMs for SSH reverse tunneling.
- Introduced `setup-anvil-tunnels.sh` to establish SSH tunnels from local Anvil to remote validator nodes, allowing access to Anvil running on localhost.
- Created `test-anvil-connection.sh` to verify Anvil connectivity from remote VMs through the established SSH tunnels.
- Updated `ipc-subnet-config.yml` with new configuration settings for improved local and remote RPC endpoints.

These enhancements significantly improve the operational capabilities of the IPC subnet manager, facilitating better connectivity and management of validator nodes.
This commit introduces a new script, `debug-relayer-error.sh`, designed to assist in diagnosing issues related to checkpoint submission failures in the IPC subnet manager.

Key features include:
- A series of connectivity checks to ensure the Anvil RPC is accessible.
- Validation of the existence of the Gateway and Subnet Actor contracts.
- Checks for the last bottom-up checkpoint height and subnet activity status.
- Recommendations for common issues encountered during relayer operations.

Additionally, new documentation files, including `FIXES-SUMMARY.md`, `IPC-CONFIG-ORDER-FIX.md`, and `RELAYER-UPDATE-SUMMARY.md`, have been added to summarize recent fixes and updates related to relayer connectivity and configuration management. These enhancements significantly improve the operational capabilities of the IPC subnet manager, providing users with tools to effectively troubleshoot and resolve relayer-related issues.
This commit introduces a new documentation file, `INSTALL-SYSTEMD-FIX.md`, detailing fixes for common issues encountered during the installation of systemd services in the IPC subnet manager.

Key changes include:
- Resolved installation issues where services were only installed on the first validator due to arithmetic expansion errors.
- Ensured the relayer service is installed correctly when requested.
- Added initialization for the `SCRIPT_DIR` variable in service generation functions to prevent template file access issues.
- Included steps to unmask services on affected validators before installation.

Additionally, improvements were made to the `ipc-subnet-manager.sh` and `lib/health.sh` scripts to enhance error handling and logging during the installation process. These enhancements significantly improve the reliability and usability of the IPC subnet manager's systemd service installation process.
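The commit text does not show the failing line, so the following is only one plausible reproduction of the "first validator only" symptom. In recent bash versions, `((n++))` evaluates to 0 when `n` is 0, which gives it exit status 1 and aborts a script running under `set -e`. The validator names and loop bodies below are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical reproduction; the validator names and loops are illustrative only.

# Post-increment from 0: ((n++)) evaluates to 0, so its exit status is 1,
# and `set -e` aborts the script after the first validator.
broken_loop='set -e; n=0
for v in validator-1 validator-2 validator-3; do
  echo "installing service on $v"
  ((n++))
done'

# An arithmetic assignment exits 0, so the loop reaches every validator.
fixed_loop='set -e; n=0
for v in validator-1 validator-2 validator-3; do
  echo "installing service on $v"
  n=$((n + 1))
done'

# Run each variant in a fresh shell and count how many validators were reached.
broken_count=$(bash -c "$broken_loop" 2>/dev/null | grep -c 'installing' || true)
fixed_count=$(bash -c "$fixed_loop" 2>/dev/null | grep -c 'installing' || true)

echo "broken loop reached $broken_count validator(s)"
echo "fixed loop reached $fixed_count validator(s)"
```

Replacing the post-increment with an assignment (or appending `|| true`) keeps the counter without tripping `set -e`.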
This commit updates the `ipc-subnet-config.yml` with new subnet IDs and contract addresses for improved configuration accuracy. Additionally, it introduces a `--debug` option in the `ipc-subnet-manager.sh` script to enable verbose logging during initialization and error handling, enhancing the debugging process. A new `RELAYER-AND-RESOLVER-FIX.md` documentation file is added, detailing fixes for relayer configuration issues and invalid resolver paths, ensuring better operational reliability.
… configuration improvements This commit introduces a new command, `update-binaries`, to the `ipc-subnet-manager.sh` script, allowing users to pull the latest code, build, and install binaries on all validators. The command supports specifying a git branch for updates. Additionally, the `ipc-subnet-config.yml` file has been updated with new paths for the IPC repository, and several contract addresses have been modified for improved configuration accuracy. These enhancements streamline the process of maintaining validator binaries and ensure better operational reliability.
This commit adds functionality to convert the validator key to an Ethereum address using fendermint within the `show_subnet_info` function of `lib/health.sh`. It logs the converted address if successful, or warns if the conversion fails. This enhancement improves the visibility of validator information and aids in debugging by providing relevant Ethereum addresses alongside public keys.
This commit introduces a new script, `estimate-gas.sh`, designed to estimate gas usage for transactions between Ethereum addresses. The script utilizes JSON RPC to fetch gas estimates and provides a breakdown of costs at various gas prices. It also includes a recommendation for gas with a 20% buffer, enhancing the operational capabilities of the IPC subnet manager by aiding users in transaction cost planning.
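The 20% buffer comes down to integer arithmetic on the estimate. A minimal sketch, assuming the estimate arrives as a hex quantity of the kind `eth_estimateGas` returns; the helper names are hypothetical and not taken from `estimate-gas.sh`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a hex quantity (e.g. an eth_estimateGas result) to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

# Hypothetical helper: add a 20% safety margin using integer arithmetic.
gas_with_buffer() {
  local estimate="$1"
  echo $(( estimate * 120 / 100 ))
}

estimate=$(hex_to_dec 0x5208)   # 21000, the gas cost of a plain value transfer
echo "estimated: $estimate, recommended: $(gas_with_buffer "$estimate")"
```

Multiplying before dividing avoids losing precision to integer truncation.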
This commit adds a newline at the end of the `estimate-gas.sh` script to ensure consistency with coding standards and improve readability. This minor adjustment helps maintain a clean file structure in the project.
This commit introduces a complete ELK (Elasticsearch, Logstash, Kibana) stack for aggregating logs from IPC validator nodes.

Key components include:
- Docker Compose configuration for orchestrating the ELK stack.
- Elasticsearch for log storage and search capabilities.
- Logstash for processing and parsing logs from validators.
- Kibana for visualizing logs and creating dashboards.
- Grafana for alternative visualization options.

Additionally, comprehensive documentation is provided, including setup guides, troubleshooting tips, and monitoring instructions, ensuring a robust logging infrastructure for IPC validators.
    # If we couldn't get it from logs, assume it's stuck at the known value
    if [ -z "$SUBNET_FINALITY" ] || [ "$SUBNET_FINALITY" = "0" ]; then
        SUBNET_FINALITY="3135524" # Known stuck value
    fi
Bug: Hardcoded Value Skews Cross-Environment Monitoring
When SUBNET_FINALITY cannot be retrieved from logs, the script falls back to a hardcoded value 3135524 labeled as "known stuck value". This fallback produces incorrect lag calculations for any deployment other than the specific one this was developed on, making the monitoring script unreliable across different environments.
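One way to avoid a deployment-specific fallback is to report the lag as unknown whenever no finality height can be parsed. A sketch under that assumption; `compute_lag` and the parent height argument are hypothetical, and only `SUBNET_FINALITY` comes from the reviewed script:

```shell
#!/usr/bin/env bash
# compute_lag PARENT_HEIGHT SUBNET_FINALITY
# Prints the lag in epochs, or "unknown" when the finality height is missing or zero.
compute_lag() {
  local parent_height="$1" subnet_finality="$2"
  if [ -z "$subnet_finality" ] || [ "$subnet_finality" = "0" ]; then
    echo "unknown"
    return 0
  fi
  echo $(( parent_height - subnet_finality ))
}

SUBNET_FINALITY=""   # e.g. nothing could be parsed from the logs
echo "lag: $(compute_lag 3135600 "$SUBNET_FINALITY")"
```

Printing `unknown` keeps the monitor honest on any deployment, at the cost of not showing a number when log parsing fails.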
    ssh_user: "philip"
    ipc_user: "ipc"
    role: "secondary"
    private_key: "0xc1099a062e296366a2ac3b26ac80a409833e6a74edbf677a0bd14580d2c68ea2"
Bug: Private Keys: Repository Exposure Risk
The configuration file contains three plaintext private keys committed to the repository. These appear to be actual validator private keys rather than example placeholders, given the presence of real IP addresses, subnet IDs, and personal usernames throughout the file. Committing private keys exposes validator control and any associated funds to compromise.
This commit introduces a new local deployment mode for the IPC subnet manager, allowing multiple validators to run on a single machine.

Key features include:
- A new configuration file, `ipc-subnet-config-local.yml`, for local mode settings.
- Automatic management of Anvil, including starting and stopping it as needed.
- Systematic port allocation for validators to avoid conflicts.
- CLI enhancements to support local mode operations, including a `--mode` flag.
- Comprehensive documentation detailing the local mode implementation and usage instructions.

These changes enhance the flexibility and usability of the IPC subnet manager for local development and testing environments.
This commit introduces a new feature in the IPC subnet manager that automates the deployment of subnets before initializing validator nodes.

Key changes include:
- A new `deploy_subnet()` function in `lib/health.sh` that handles the creation of subnets and deployment of gateway contracts.
- Updates to the `ipc-subnet-manager.sh` script to incorporate subnet deployment as a prerequisite for node initialization.
- Modifications to the `ipc-subnet-config-local.yml` to include a `deploy_subnet` flag for enabling automatic deployment.
- Enhanced error handling and logging to ensure successful subnet creation and configuration updates.

These improvements streamline the setup process for local development environments, reducing the likelihood of initialization errors related to missing subnets.
This commit updates the `ipc-subnet-config-local.yml` to change the subnet ID and adjust the Ethereum API port to avoid conflicts with Anvil. It also modifies the `ipc-subnet-manager.sh` script to streamline the genesis creation process, ensuring it works for both activated and non-activated subnets. Additionally, the `create_bootstrap_genesis` function in `lib/health.sh` is enhanced to utilize the `ipc-cli subnet create-genesis` command, improving error handling and logging for better visibility during subnet initialization. These changes enhance the reliability and usability of the IPC subnet manager for local development environments.
faucet/.env
Outdated
    RPC_URL=http://node-1.test.ipc.space:8545
    FAUCET_AMOUNT=10
    RATE_LIMIT_WINDOW=86400000
    RATE_LIMIT_MAX=3
Bug: Sensitive Credentials Exposed in Repository
A .env file containing actual private keys has been committed to the repository. The file includes PRIVATE_KEY=0x5eda872ee2da7bc9d7e0af4507f7d5060aed54d43fd1a72e1283622400c7cb85 and a commented alternative key. Environment files with credentials should never be committed to version control; they belong in .gitignore, and users should create them from a template (like .env.example). This exposes private keys that could control real accounts with funds.
This commit refactors the `fetch_metrics` function in `dashboard.sh` to improve the process of gathering metrics from validator nodes.

Key changes include:
- Replaced SSH commands with a new `exec_on_host` function for executing remote commands, enhancing consistency and reducing timeout complexity.
- Updated the method for fetching block height, network info, mempool status, and error logs to utilize local node paths for better compatibility with local deployments.
- Improved the extraction of parent height from logs to ensure accurate reporting.
- Added a note in the dashboard output to indicate when F3 is disabled for local development.

These enhancements improve the reliability and clarity of metrics reporting in the IPC subnet manager.
This commit refactors the `get_chain_id` function in `lib/health.sh` to replace SSH commands with the `exec_on_host` function for executing remote commands. This change enhances consistency and simplifies the process of querying the Ethereum chain ID via JSON-RPC, improving the overall reliability of the health check functionality in the IPC subnet manager.
The changes to the current files look good!
…onality This commit modifies the `ipc-subnet-config-local.yml` to update the subnet ID and parent contract addresses for better alignment with local deployment requirements. Additionally, it refactors the `check_validator_health` function in `lib/health.sh` to enhance the process of checking validator health by replacing SSH commands with the `exec_on_host` function, improving consistency and reliability in health checks. These changes streamline the configuration and monitoring of validators in the IPC subnet manager.
This commit updates the Logstash configuration in `ipc-logs.conf` to extract the hostname before cleanup, allowing for the use of a new field `validator_hostname` in the index name. This change improves the organization of logs by ensuring that the index is named consistently based on the validator's hostname, enhancing log management and retrieval.
This commit updates the `draw_dashboard` function in `dashboard.sh` to calculate the expected number of peers based on the count of validators, excluding the self-validator. This change enhances the accuracy of the network health status displayed in the dashboard, improving overall monitoring capabilities.
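The peer expectation described here is the validator count minus one. A minimal sketch; the function names and the `ok`/`degraded` labels are hypothetical and not taken from `dashboard.sh`:

```shell
#!/usr/bin/env bash
# expected_peers VALIDATOR_COUNT
# Each validator should see every other validator, but not itself.
expected_peers() {
  local count="$1"
  if [ "$count" -le 1 ]; then
    echo 0
  else
    echo $(( count - 1 ))
  fi
}

# peer_status ACTUAL EXPECTED: a hypothetical two-level health label.
peer_status() {
  if [ "$1" -ge "$2" ]; then
    echo "ok"
  else
    echo "degraded"
  fi
}

validators=(validator-1 validator-2 validator-3)
expected=$(expected_peers "${#validators[@]}")
echo "expected peers: $expected, status with 1 peer: $(peer_status 1 "$expected")"
```

The guard for a count of 1 or less keeps a single-validator subnet from reporting a negative expectation.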
This commit updates the `fetch_metrics` function in `dashboard.sh` to include the fetching of the mempool maximum size from the CometBFT configuration. The maximum size is now dynamically set if not already defined, improving the accuracy of mempool metrics displayed in the dashboard. Additionally, the default value for `mempool_max` is adjusted to align with this change, enhancing overall monitoring capabilities.
This commit updates the `monitor-parent-finality-simple.sh` script to enhance the method of extracting finality information from logs. The previous use of `grep -P` has been replaced with a combination of `grep` and `sed` for better portability. This change ensures more reliable parsing of log entries, improving the accuracy of finality reporting in the monitoring process.
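`grep -P` (PCRE) is unavailable in the BSD grep that ships with macOS, which is the usual reason for this kind of change. A sketch of the `grep` plus `sed` pattern, assuming a hypothetical log format with a `parent_height=<number>` field; the real log lines and field names may differ:

```shell
#!/usr/bin/env bash
# extract_height: read log lines on stdin and print the last parent height seen.
# With PCRE this could be: grep -oP 'parent_height=\K[0-9]+'
# Portable form: match the whole key=value pair, then strip the key with sed.
extract_height() {
  grep -o 'parent_height=[0-9][0-9]*' | sed 's/^parent_height=//' | tail -n 1
}

# Hypothetical sample log lines, for illustration only.
printf '%s\n' \
  'INFO finality committed parent_height=3135524 block_hash=abc' \
  'INFO finality committed parent_height=3135530 block_hash=def' \
  | extract_height
```

`grep -o` is not in POSIX but is supported by GNU, BSD, and BusyBox grep, which covers the common targets.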
…er function This commit updates the `set_federated_power` function in `lib/health.sh` to dynamically determine the `--from` address for transactions based on the primary validator's private key. If the address is not specified in the configuration, it derives the address from known Anvil accounts, improving flexibility and reducing configuration errors. Additionally, it logs the address being used for transactions, enhancing visibility during execution.
This commit modifies the `bottomup_enabled` method in `lib.rs` to return true by default when the bottom-up configuration is not specified. This change aligns with the intended default behavior of enabling bottom-up checkpointing, enhancing the clarity and consistency of the settings implementation.
This commit adds a new example environment file `.env.example` for the IPC faucet, providing a template for users to configure their environment variables. It also updates the `.gitignore` to exclude `.env` files containing sensitive credentials and removes the existing `.env` file to enhance security. Additionally, a README.md file is introduced to guide users on setting up and running the faucet application.
    - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD:-admin}
    - GF_INSTALL_PLUGINS=grafana-elasticsearch-datasource
    - GF_SERVER_ROOT_URL=http://localhost:3000
    - GF_USERS_ALLOW_SIGN_UP=false
Grafana missing ELASTIC_PASSWORD environment variable for datasource
Medium Severity
The Grafana Elasticsearch datasource provisioning file references ${ELASTIC_PASSWORD} for basicAuthPassword, but the Grafana container's environment section in docker-compose.yml does not include ELASTIC_PASSWORD. Grafana only expands environment variables that are available to its process. Since this variable is missing, the datasource authentication will fail, and Grafana won't be able to connect to Elasticsearch.
…amic prompts This commit updates the `clear-mempool.sh` script to accept command-line parameters for the validator IP and SSH user, defaulting to prompts if not provided. It improves user experience by ensuring required inputs are validated and dynamically retrieves the script directory for better usability when referencing the subnet manager. These changes streamline the process of diagnosing and clearing stuck transactions in the IPC subnet mempool.
    console.log(` To: ${tx.to}`)
    console.log(` Value: ${ethers.formatEther(tx.value || 0)} tFIL`)
    console.log(` Nonce: ${parseInt(tx.nonce)}`)
    console.log(` Status: ${receipt.status === 1 ? '✅ Success' : '❌ Failed'}`)
Potential null reference when accessing transaction receipt status
Low Severity
The getTransactionReceipt call can return null in ethers.js v6, but the code accesses receipt.status directly without a null check. If the receipt is unavailable (due to timing issues or RPC inconsistencies), this will throw a TypeError: Cannot read properties of null (reading 'status'). The error is caught by the outer try-catch, but results in a misleading "Could not fetch recent transactions" message instead of properly handling the null receipt case.
    if let Some(resolver_port) = ports.resolver {
        log::info!("Configuring Fendermint resolver port: {}", resolver_port);

        // Use listen_ip (defaults to 0.0.0.0) for listen_addr to allow binding on any interface.
CometBFT ignores listen_ip configuration option
Medium Severity
The CometBFT P2P laddr is hardcoded to 0.0.0.0 instead of using the new listen_ip configuration option. The documentation states that listen-ip is for "IP address to bind services to", implying it applies to all services. However, the implementation only applies listen_ip to the Fendermint resolver (line 102) while CometBFT remains hardcoded. If a user sets a custom listen_ip to restrict binding to a specific interface, CometBFT will still bind to all interfaces, contradicting the documented behavior and potentially creating a security concern.
This commit updates the `elk-manager.sh` script to introduce a new command for deleting entire Elasticsearch indices older than a specified number of days, alongside improvements to the existing delete-old-logs command. The script now provides clearer warnings about the destructive nature of the new command and enhances user guidance with examples. Additionally, it refines log messages for better clarity during operations, improving overall usability and safety in managing ELK stack logs.
…t manager path This commit modifies the `elk-manager.sh` script to allow the IPC subnet manager configuration path to be set via an environment variable, enhancing flexibility. It updates the filebeat status check to use this variable, providing clearer error messages and guidance for users. Additionally, it improves logging to indicate the configuration file being used, streamlining the management of IPC subnet configurations.
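An environment-variable override with a default is typically written with `${VAR:-default}` expansion. A sketch only: `IPC_CONFIG_FILE`, the default path, and both function names are hypothetical, since the commit message does not name the variable `elk-manager.sh` actually reads:

```shell
#!/usr/bin/env bash
# IPC_CONFIG_FILE is a hypothetical variable name; elk-manager.sh may use a different one.
DEFAULT_CONFIG="./ipc-subnet-config.yml"

# Print the config path: the environment override if set and non-empty, else the default.
config_path() {
  echo "${IPC_CONFIG_FILE:-$DEFAULT_CONFIG}"
}

# Log which file is in use, or explain how to override it when it is missing.
check_config() {
  local path
  path="$(config_path)"
  if [ -f "$path" ]; then
    echo "Using config: $path"
  else
    echo "Config not found: $path (set IPC_CONFIG_FILE to override)" >&2
    return 1
  fi
}

echo "resolved config path: $(config_path)"
```

A caller could then run something like `IPC_CONFIG_FILE=/path/to/config.yml ./elk-manager.sh <command>` to point the script at a different file.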
Note
Major infra and networking improvements, plus configurability and docs.
- `infra/elk-logging/` stack (Elasticsearch, Logstash, Kibana, Grafana) with templates, pipelines, dashboards, and management scripts; health checks and provisioning included
- `faucet/` (env template, README, diagnostics script) for test tokens and tx troubleshooting
- `listen-ip` (default `0.0.0.0`) and sets `external_addresses`; fixes resolver/libp2p binding on cloud VMs; updates node init to include `listen-ip`; adds unit tests for address selection
- `bottomup` settings and gates creation/signing in `BottomUpManager` and service wiring
- `docs/ipc/node-init.md` P2P guidance (cloud/local), corrects ports; updates `CHANGELOG.md` with feature and cloud binding fix
- `.gitignore`; adds monitoring and subnet manager scripts and guides

Written by Cursor Bugbot for commit 4b98cf4.