New metrics automation #118
base: main
Conversation
✅ Deploy Preview for docs-extensions-and-macros ready!
📝 Walkthrough
This change introduces a new, source-based workflow for extracting and generating Redpanda metrics documentation. It adds a Python-based metrics extractor tool that uses Tree-sitter to parse C++ source files, a Makefile to automate environment setup and extraction, and CLI integration via new commands in bin/doc-tools.js.
Sequence Diagram(s)
sequenceDiagram
participant User
participant DocToolsCLI
participant Makefile
participant MetricsExtractor (Python)
participant TreeSitter
participant RedpandaRepo
User->>DocToolsCLI: Run "generate metrics-docs" or "source-metrics-docs"
DocToolsCLI->>Makefile: Invoke with tag/branch or local path
Makefile->>RedpandaRepo: Clone or use local source
Makefile->>TreeSitter: Build C++ grammar
Makefile->>MetricsExtractor: Run metrics_extractor.py
MetricsExtractor->>TreeSitter: Parse C++ files for metrics
MetricsExtractor->>RedpandaRepo: Read C++ source files
MetricsExtractor-->>Makefile: Output JSON and AsciiDoc files
Makefile-->>DocToolsCLI: Return generated documentation paths
DocToolsCLI-->>User: Print completion and output file locations
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~150 minutes
Possibly related PRs
Suggested reviewers
@coderabbitai review |
@JakeSCahill I'll conduct a comprehensive review of this substantial PR, which introduces a new source-based metrics extraction workflow. Let me analyze the key components systematically. Based on my analysis, here's my detailed review:
🔍 Overall Assessment
This is an ambitious and well-structured PR that introduces a modern, source-based approach to metrics documentation generation. The implementation demonstrates solid software engineering practices with proper separation of concerns, comprehensive tooling, and good documentation.
✅ Strengths
1. Architecture & Design
2. Code Quality
3. User Experience
|
@@ -429,7 +460,51 @@ function diffDirs(kind, oldTag, newTag) { | |||
|
|||
automation | |||
.command('metrics-docs') | |||
.description('Generate JSON and AsciiDoc documentation for Redpanda metrics') | |||
.description('Generate JSON and AsciiDoc documentation for Redpanda metrics from source code') | |||
.requiredOption('-r, --redpanda-repo <path>', 'Path to the Redpanda repository root directory') |
Why is this required? For the property-docs
command, we require a tag and use that to clone and check out the repo.
Please give this command the same signature as property-docs
. It should support the diff
option.
automation
.command('property-docs')
.description('Generate JSON and AsciiDoc documentation for Redpanda configuration properties')
.option('--tag <tag>', 'Git tag or branch to extract from', 'dev')
.option('--diff <oldTag>', 'Also diff autogenerated properties from <oldTag> → <tag>')
.action((options) => {
verifyPropertyDependencies();
const newTag = options.tag;
const oldTag = options.diff;
const cwd = path.resolve(__dirname, '../tools/property-extractor');
const make = (tag) => {
console.log(`⏳ Building property docs for ${tag}…`);
const r = spawnSync('make', ['build', `TAG=${tag}`], { cwd, stdio: 'inherit' });
if (r.error) {
console.error(`❌ ${r.error.message}`);
process.exit(1);
}
if (r.status !== 0) process.exit(r.status);
};
if (oldTag) {
const oldDir = path.join('autogenerated', oldTag, 'properties');
if (!fs.existsSync(oldDir)) make(oldTag);
}
make(newTag);
if (oldTag) {
diffDirs('properties', oldTag, newTag);
}
process.exit(0);
});
|
||
const startTime = Date.now(); | ||
const result = spawnSync('python3', [ | ||
path.join(__dirname, '../tools/metrics-extractor/metrics_extractor.py'), |
Since you have a Makefile, we should execute make build
instead like we do for property-docs
.
.option('--internal-asciidoc <path>', 'Custom path for internal metrics AsciiDoc file', 'autogenerated/internal_metrics_reference.adoc') | ||
.option('--external-asciidoc <path>', 'Custom path for external/public metrics AsciiDoc file', 'autogenerated/public_metrics_reference.adoc') | ||
.action((options) => { | ||
console.log(`🎯 Starting enhanced metrics extraction from source code`); |
console.log(`🎯 Starting enhanced metrics extraction from source code`); |
.description('Generate JSON and AsciiDoc documentation for Redpanda metrics from source code') | ||
.requiredOption('-r, --redpanda-repo <path>', 'Path to the Redpanda repository root directory') | ||
.option('--json-output <path>', 'Custom path for JSON output file', 'autogenerated/metrics.json') | ||
.option('--internal-asciidoc <path>', 'Custom path for internal metrics AsciiDoc file', 'autogenerated/internal_metrics_reference.adoc') |
I like these new options for specifying paths.
We don't typically use underscores in Asciidoc filenames. Please replace with hyphens.
@@ -713,6 +792,41 @@ automation | |||
process.exit(0); | |||
}); | |||
|
|||
automation | |||
.command('source-metrics-docs') |
Not sure why this is a separate command. This should be called metrics-docs
and should replace the other one in this file.
Please also keep support for specifying paths for the output
|
||
All commands generate three files in the `autogenerated/` directory: | ||
|
||
* `internal_metrics_reference.adoc` - Internal metrics for engineering documentation |
Update these docs when you fix the filenames from using underscores to hyphens.
filtered = {} | ||
seen_names = set() # Track metric names to detect duplicates | ||
|
||
for name, data in metrics.items(): |
Nice!
|
||
# Create more meaningful category names | ||
category_mapping = { | ||
'cluster': 'Cluster metrics', |
So nice
}); | ||
|
||
automation | ||
.command('metrics-docs-legacy') |
I would just remove the legacy implementation.
We're using semver versioning, so we can always install a previous version to try it again.
@@ -21,7 +21,9 @@ | |||
"get-console-version": "doc-tools get-console-version", | |||
"build": "antora --to-dir docs --fetch local-antora-playbook.yml", | |||
"serve": "wds --node-resolve --open preview/test/ --watch --root-dir docs", | |||
"test": "jest" | |||
"test": "jest", |
Please bump the version in this file. Since this is quite a large change, bump the minor version to 4.8.0 and then npm install
to update the lock file.
|
||
SHELL := /bin/bash | ||
TAG ?= dev | ||
OUTPUT_DIR := autogenerated/$(TAG)/source-metrics |
Since the CLI is going to be installed inside our content repos, we'll want the output saved to the root of the repo rather than relative to this package in node_modules
so it's easy to find.
This is what we did for the property-docs
automation:
REPO_ROOT := $(shell git rev-parse --show-toplevel)
MODULE_ROOT := $(shell cd "$(dir $(realpath $(lastword $(MAKEFILE_LIST))))"/../.. && pwd)
TOOL_ROOT := $(MODULE_ROOT)/tools/property-extractor
TMP_ROOT := $(TOOL_ROOT)/tmp
REDPANDA_SRC := $(TMP_ROOT)/redpanda
TREESITTER_DIR:= $(TOOL_ROOT)/tree-sitter/tree-sitter-cpp
VENV := $(TOOL_ROOT)/tmp/redpanda-property-extractor-venv
PYTHON := $(VENV)/bin/python
OUTPUT_DIR := $(REPO_ROOT)/autogenerated/$(TAG)/properties
Looking really nice @paulohtb6 thanks for working on this! When it's ready, can you update our wiki: https://redpandadata.atlassian.net/wiki/spaces/DOC/pages/1185054748/Doc+Tools+CLI |
Some nice suggestions and praise from 🐰! Take a look at some of the suggestions: #118 (comment) |
Actionable comments posted: 12
♻️ Duplicate comments (5)
package.json (1)
24-27
: Version bump still missing & lock-file not updated
Jake already requested raising the package version (→ 4.8.0) and regenerating the lock file. That hasn't happened yet. Please bump the version and run npm install
to commit the updated package-lock.json.
tools/metrics-extractor/README.adoc (1)
35-36
: Address the filename convention feedback
The filenames still use underscores instead of hyphens, as previously mentioned by JakeSCahill. Please update to use hyphens for consistency with project conventions.
bin/doc-tools.js (2)
466-467
: Use hyphens instead of underscores in filenames
As previously mentioned by JakeSCahill, filenames should use hyphens instead of underscores for consistency.
483-488
: Use make build instead of direct Python execution
As previously mentioned by JakeSCahill, this should execute make build instead of calling the Python script directly, similar to how property-docs works.
tools/metrics-extractor/Makefile (1)
6-6
: Output should be saved to repo root, not relative to this package
As previously mentioned by JakeSCahill, the output should be saved to the repository root rather than relative to this package location. This makes it easier to find when installed in content repos.
Consider updating to use the same pattern as property-extractor:
REPO_ROOT := $(shell git rev-parse --show-toplevel)
OUTPUT_DIR := $(REPO_ROOT)/autogenerated/$(TAG)/source-metrics
🧹 Nitpick comments (13)
cli-utils/python-venv.sh (1)
10-10
: Trim the trailing whitespace to avoid noisy diffs
The extra space after the closing quote ("..."␠
) is harmless at runtime but unnecessarily clutters futuregit
blame/diffs.-echo "Recreating Python venv at $VENV_DIR..." +echo "Recreating Python venv at $VENV_DIR..."tools/metrics-extractor/requirements.txt (1)
1-2
: Pinning strategy & supply-chain hygiene
tree_sitter==0.21.1
– hard-pinning implies you’ll miss upstream security/bug fixes. Prefer~=0.21
unless reproducibility absolutely mandates a full pin.- Consider adding
--require-hashes
support (hash-checked requirements) to mitigate dependency-confusion or typosquatting attacks.cli-utils/install-test-dependencies.sh (1)
16-16
: Minor whitespace nit
Same trailing-space issue as the venv script; clean-up keeps history tidy.- echo "Node.js version: $(node -v)" + echo "Node.js version: $(node -v)"package.json (1)
24-27
: Script names OK but undocumented
The newmetrics:extract
andmetrics:legacy
targets look good; ensure README/CONTRIB docs mention them so users don’t rely on the rawnpx
incantation only.tools/metrics-extractor/tests/test_extraction.py (1)
76-79
: Consider making tree-sitter paths configurable.The hardcoded paths for tree-sitter components ("tree-sitter", "tree-sitter-cpp.so") could be made configurable to improve test portability across different environments.
Consider using environment variables or configuration parameters:
- parser, language = get_treesitter_cpp_parser_and_language("tree-sitter", "tree-sitter-cpp.so") + parser, language = get_treesitter_cpp_parser_and_language( + os.environ.get("TREESITTER_PATH", "tree-sitter"), + os.environ.get("TREESITTER_CPP_LIB", "tree-sitter-cpp.so") + )tools/metrics-extractor/validate.py (1)
12-33
: Useimportlib.util.find_spec
for cleaner dependency checkingConsider using
importlib.util.find_spec
instead of try/except imports to check module availability without actually importing them.+import importlib.util + def check_dependencies(): """Check if all required dependencies are available""" print("🔧 Checking dependencies...") - try: - import tree_sitter + if importlib.util.find_spec('tree_sitter') is not None: print(" ✓ tree-sitter is available") - except ImportError: + else: print(" ❌ tree-sitter not found. Install with: pip install tree-sitter") return False # Check if we can import our modules - try: - from metrics_parser import build_treesitter_cpp_library, extract_metrics_from_files - from metrics_bag import MetricsBag + modules_to_check = ['metrics_parser', 'metrics_bag'] + for module in modules_to_check: + if importlib.util.find_spec(module) is None: + print(f" ❌ Module '{module}' not found") + return False + + try: + # Verify specific functions exist by importing them + from metrics_parser import build_treesitter_cpp_library, extract_metrics_from_files + from metrics_bag import MetricsBag print(" ✓ All custom modules are available") except ImportError as e: print(f" ❌ Import error: {e}") return False return Truetools/metrics-extractor/metrics_extractor.py (3)
20-27
: Enhance path validation for better user experience.The current validation only checks if the path exists. Consider adding more comprehensive checks to ensure the path is a valid Redpanda repository.
def validate_paths(options): path = options.redpanda_repo if not os.path.exists(path): logger.error(f'Path does not exist: "{path}".') + logger.error(f'Please ensure you have cloned the Redpanda repository or specify a valid path.') sys.exit(1) + + if not os.path.isdir(path): + logger.error(f'Path is not a directory: "{path}".') + sys.exit(1) + + # Check for expected Redpanda source structure + expected_markers = ['src', 'CMakeLists.txt'] + if not any(os.path.exists(os.path.join(path, marker)) for marker in expected_markers): + logger.warning(f'Path may not be a valid Redpanda repository: "{path}".') + logger.warning('Expected to find "src" directory or "CMakeLists.txt" file.')
158-372
: Consider refactoring this large function for better maintainability.The function is over 200 lines and handles multiple responsibilities. The extensive hardcoded category mappings could be externalized.
Extract the category mapping logic into a separate configuration:
# Define at module level or in a separate config file METRIC_CATEGORY_MAPPINGS = { 'cluster': 'Cluster metrics', 'kafka': 'Kafka metrics', 'raft': 'Raft metrics', # ... rest of the mappings } # Keywords that map to application metrics APPLICATION_METRIC_KEYWORDS = { 'active', 'adjacent', 'anomalies', 'available', 'backlog', # ... rest of the keywords } def get_metric_category(metric_name): """Determine the category for a metric based on its name.""" # Remove prefixes clean_name = metric_name for prefix in ['redpanda_', 'vectorized_']: if clean_name.startswith(prefix): clean_name = clean_name[len(prefix):] break # Get first part parts = clean_name.split('_') category = parts[0] if parts else 'other' # Map to category name if category in METRIC_CATEGORY_MAPPINGS: return METRIC_CATEGORY_MAPPINGS[category] elif category in APPLICATION_METRIC_KEYWORDS: return 'Application metrics' else: return 'Other metrics'Also consider splitting the function into smaller functions:
separate_metrics_by_type()
group_metrics_by_category()
write_metrics_documentation()
455-455
: Remove unnecessary f-string prefix.The string doesn't contain any placeholders.
- print(f"📊 Metrics by type:") + print("📊 Metrics by type:")tools/metrics/compare_metrics.py (1)
308-317
: Good error handling, but consider file size limits.While the error handling is good, consider adding file size validation to prevent issues with very large files.
try: + # Check file sizes first + for filepath in [args.file1, args.file2]: + file_size = os.path.getsize(filepath) + if file_size > 10 * 1024 * 1024: # 10MB limit + print(f"Warning: {filepath} is large ({file_size / 1024 / 1024:.1f}MB). This may take a while.") + with open(args.file1, 'r', encoding='utf-8') as f: content1 = f.read()tools/metrics-extractor/metrics_parser.py (3)
101-129
: Consider compiling regex patterns for better performance.The regex patterns are compiled on every function call. For better performance, especially when processing many files, compile them once at module level.
# At module level LABEL_PATTERNS = [ re.compile(r'\.aggregate\s*\(\s*([^)]+)\s*\)'), # .aggregate(aggregate_labels) re.compile(r'auto\s+(\w*labels\w*)\s*='), # auto aggregate_labels = re.compile(r'std::vector<[^>]*>\s*{([^}]+)}'), # std::vector<sm::label>{sm::shard_label} re.compile(r'sm::([a-z_]*label[a-z_]*)'), # sm::shard_label, sm::topic_label, etc. re.compile(r'"([^"]+)"\s*:\s*[^,}]+'), # key-value pairs ] def extract_labels_from_code(code_context): """Extract potential label names from code context around metrics""" labels = set() for pattern in LABEL_PATTERNS: matches = pattern.findall(code_context) # ... rest of the function
231-255
: Simplify nested conditions where possible.Some nested if statements can be combined for better readability.
- if inner_function and '::metrics_name' in inner_function.text.decode('utf-8'): - # Found it. Extract the string literal from its arguments. - if inner_args and inner_args.named_child_count > 0: + if (inner_function and '::metrics_name' in inner_function.text.decode('utf-8') and + inner_args and inner_args.named_child_count > 0): group_name_node = inner_args.named_children[0] if group_name_node.type == 'string_literal':
1115-1128
: Consider parallel processing for better performance.The current implementation processes files sequentially. For large codebases, parallel processing could significantly improve performance.
from concurrent.futures import ProcessPoolExecutor, as_completed import multiprocessing def extract_metrics_from_files(cpp_files, treesitter_parser, cpp_language, filter_namespace=None): """Extract metrics from multiple C++ files""" all_metrics = MetricsBag() # Use process pool for parallel parsing max_workers = min(multiprocessing.cpu_count(), len(cpp_files)) with ProcessPoolExecutor(max_workers=max_workers) as executor: # Submit all files for processing future_to_file = { executor.submit(parse_cpp_file, file_path, treesitter_parser, cpp_language, filter_namespace): file_path for file_path in cpp_files } # Collect results as they complete for future in as_completed(future_to_file): file_path = future_to_file[future] try: file_metrics = future.result() all_metrics.merge(file_metrics) except Exception as e: logger.warning(f"Failed to process {file_path}: {e}") return all_metricsNote: This would require making the parser objects pickleable or recreating them in each process.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
tools/metrics-extractor/tree-sitter-cpp.so
is excluded by!**/*.so
📒 Files selected for processing (18)
README.adoc
(1 hunks)bin/doc-tools.js
(6 hunks)cli-utils/generate-cluster-docs.sh
(3 hunks)cli-utils/install-test-dependencies.sh
(1 hunks)cli-utils/python-venv.sh
(1 hunks)docker-compose/25.1/docker-compose.yml
(4 hunks)docker-compose/docker-compose.yml
(1 hunks)package.json
(1 hunks)tools/metrics-extractor/Makefile
(1 hunks)tools/metrics-extractor/README.adoc
(1 hunks)tools/metrics-extractor/metrics_bag.py
(1 hunks)tools/metrics-extractor/metrics_extractor.py
(1 hunks)tools/metrics-extractor/metrics_parser.py
(1 hunks)tools/metrics-extractor/requirements.txt
(1 hunks)tools/metrics-extractor/tests/test_extraction.py
(1 hunks)tools/metrics-extractor/validate.py
(1 hunks)tools/metrics/compare_metrics.py
(1 hunks)tools/metrics/metrics.py
(4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tools/metrics-extractor/validate.py (4)
bin/doc-tools.js (4)
os
(4-4)result
(305-305)result
(482-488)result
(619-627)tools/metrics-extractor/metrics_parser.py (2)
build_treesitter_cpp_library
(61-64)extract_metrics_from_files
(1115-1127)tools/metrics-extractor/metrics_bag.py (1)
MetricsBag
(9-211)tools/metrics-extractor/metrics_extractor.py (1)
main
(374-459)
🪛 checkmake (0.2.2)
tools/metrics-extractor/Makefile
[warning] 19-19: Target body for "help" exceeds allowed length of 5 (13).
(maxbodylength)
[warning] 69-69: Target body for "treesitter" exceeds allowed length of 5 (8).
(maxbodylength)
[warning] 97-97: Target body for "clean" exceeds allowed length of 5 (7).
(maxbodylength)
[warning] 15-15: Missing required phony target "test"
(minphony)
🪛 Ruff (0.12.2)
tools/metrics-extractor/validate.py
7-7: json
imported but unused
Remove unused import: json
(F401)
17-17: tree_sitter
imported but unused; consider using importlib.util.find_spec
to test for availability
(F401)
25-25: metrics_parser.build_treesitter_cpp_library
imported but unused; consider using importlib.util.find_spec
to test for availability
(F401)
25-25: metrics_parser.extract_metrics_from_files
imported but unused; consider using importlib.util.find_spec
to test for availability
(F401)
26-26: metrics_bag.MetricsBag
imported but unused; consider using importlib.util.find_spec
to test for availability
(F401)
49-49: f-string without any placeholders
Remove extraneous f
prefix
(F541)
tools/metrics-extractor/metrics_extractor.py
6-6: re
imported but unused
Remove unused import: re
(F401)
12-12: metrics_bag.MetricsBag
imported but unused
Remove unused import: metrics_bag.MetricsBag
(F401)
236-239: Combine if
branches using logical or
operator
Combine if
branches
(SIM114)
238-241: Combine if
branches using logical or
operator
Combine if
branches
(SIM114)
240-243: Combine if
branches using logical or
operator
Combine if
branches
(SIM114)
455-455: f-string without any placeholders
Remove extraneous f
prefix
(F541)
tools/metrics/compare_metrics.py
12-12: typing.Tuple
imported but unused
Remove unused import
(F401)
12-12: typing.Optional
imported but unused
Remove unused import
(F401)
231-231: f-string without any placeholders
Remove extraneous f
prefix
(F541)
234-234: f-string without any placeholders
Remove extraneous f
prefix
(F541)
243-243: f-string without any placeholders
Remove extraneous f
prefix
(F541)
251-251: f-string without any placeholders
Remove extraneous f
prefix
(F541)
272-272: f-string without any placeholders
Remove extraneous f
prefix
(F541)
280-280: f-string without any placeholders
Remove extraneous f
prefix
(F541)
287-287: f-string without any placeholders
Remove extraneous f
prefix
(F541)
tools/metrics-extractor/metrics_parser.py
1-1: os
imported but unused
Remove unused import: os
(F401)
3-3: subprocess
imported but unused
Remove unused import: subprocess
(F401)
231-233: Use a single if
statement instead of nested if
statements
(SIM102)
309-309: f-string without any placeholders
Remove extraneous f
prefix
(F541)
426-430: Use a single if
statement instead of nested if
statements
(SIM102)
428-430: Use a single if
statement instead of nested if
statements
(SIM102)
489-490: Use a single if
statement instead of nested if
statements
(SIM102)
553-554: Use a single if
statement instead of nested if
statements
(SIM102)
619-621: Use a single if
statement instead of nested if
statements
(SIM102)
637-638: Use a single if
statement instead of nested if
statements
(SIM102)
681-682: Use a single if
statement instead of nested if
statements
(SIM102)
698-699: Use a single if
statement instead of nested if
statements
(SIM102)
tools/metrics-extractor/metrics_bag.py
3-3: uuid
imported but unused
Remove unused import: uuid
(F401)
125-125: Loop control variable name
not used within loop body
Rename unused name
to _name
(B007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Redirect rules - docs-extensions-and-macros
- GitHub Check: Header rules - docs-extensions-and-macros
- GitHub Check: Pages changed - docs-extensions-and-macros
🔇 Additional comments (33)
tools/metrics/metrics.py (4)
123-142
: Well-implemented filtering function for documentation cleanup.The
filter_metrics_for_docs
function effectively addresses the need to exclude histogram suffixes and handle duplicates for documentation while preserving complete data for JSON output. The logic is sound and the logging provides good visibility into filtering operations.
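For readers following along, here is a minimal sketch of the filtering idea described above. The histogram suffix list and the exact function shape are assumptions for illustration; the actual implementation in tools/metrics/metrics.py may differ.

```python
import logging

logger = logging.getLogger(__name__)

# Assumed histogram sub-series suffixes; the real function may use a different list.
HISTOGRAM_SUFFIXES = ("_bucket", "_sum", "_count")

def filter_metrics_for_docs(metrics):
    """Drop histogram sub-series and duplicate names before AsciiDoc rendering."""
    filtered = {}
    seen_names = set()  # Track metric names to detect duplicates
    for name, data in metrics.items():
        if name.endswith(HISTOGRAM_SUFFIXES):
            logger.info("Skipping histogram sub-series %s", name)
            continue
        if name in seen_names:
            logger.info("Skipping duplicate metric %s", name)
            continue
        seen_names.add(name)
        filtered[name] = data
    return filtered
```

The JSON output would still be written from the unfiltered dictionary, which is what keeps the documentation clean without losing data.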
183-184
: Improved error handling for directory creation.Good change from logging an error and exiting to actually creating the directory when it doesn't exist. This makes the script more robust and user-friendly.
207-209
: Proper separation of filtered vs unfiltered metrics.The approach of filtering metrics specifically for documentation (AsciiDoc) while keeping the original unfiltered data for JSON output is well-designed. This ensures documentation clarity without losing data completeness.
223-224
: Consistent application of filtering to both metric types.Both public and internal metrics are properly filtered using the new function before AsciiDoc generation, ensuring consistent documentation output across both metric types.
README.adoc (1)
969-1008
: Excellent documentation for the new CLI tools.The new CLI Tools section provides clear and comprehensive documentation for the metrics automation features introduced in this PR. The documentation:
- Clearly explains the three different workflows (source-based, local repo, legacy Docker)
- Documents the output files users can expect
- Provides practical usage examples with appropriate command syntax
- Includes helpful context about when to use each approach
This aligns perfectly with the PR objectives for introducing reliable automated metrics documentation generation.
tools/metrics-extractor/tests/test_extraction.py (2)
1-61
: Well-structured test with comprehensive sample C++ code.The test provides good coverage of the metrics extraction functionality with a realistic sample that includes various metric constructor types. The C++ code structure accurately represents what would be found in the Redpanda codebase.
88-104
: Robust test validation with good error reporting.The test validation logic effectively checks both the presence and type correctness of extracted metrics. The visual feedback with checkmarks and error messages makes test results easy to interpret.
docker-compose/25.1/docker-compose.yml (3)
54-54
: Improved healthcheck simplification.Changing the healthcheck from
rpk cluster info
torpk version
is a good improvement. Therpk version
command is faster, more reliable, and doesn't require cluster connectivity, making the healthcheck more robust during startup.
59-62
: Enhanced service dependency management.The explicit
depends_on
conditions withservice_started
,service_healthy
, andservice_completed_successfully
ensure proper startup sequencing. This is particularly important for the metrics extraction workflow where stable cluster initialization is critical.Also applies to: 93-98, 129-134
402-407
: Modernized MinIO client configuration.The update from
mc config host add
tomc alias set
aligns with current MinIO client best practices. The explicit exit with status 0 provides cleaner completion signaling compared to the previous approach of tailing/dev/null
.cli-utils/generate-cluster-docs.sh (5)
5-8
: Excellent logging enhancement for better user experience.The
log_step
function with timestamps significantly improves the user experience by providing clear progress indicators and timing information. This is particularly valuable for long-running operations like cluster setup.
25-28
: Good pre-flight check addition.Adding the curl dependency check is a smart improvement since the script uses curl for metrics endpoint polling. This provides early failure detection with clear error messaging.
78-92
: Much improved readiness checking with polling.Replacing the fixed 300-second sleep with an intelligent polling loop is a significant improvement. The script now:
- Actively checks endpoint readiness instead of blindly waiting
- Provides faster execution when the endpoint becomes ready early
- Includes proper timeout handling with clear error messaging
- Gives users better feedback about the waiting process
103-103
: Correct path update for new metrics extractor.The path change from
metrics/requirements.txt
tometrics-extractor/requirements.txt
properly aligns with the new metrics extraction tooling structure introduced in this PR.
113-116
: Updated command reflects new metrics extraction workflow.The command update from the old
metrics.py
script to the newmetrics_extractor.py
with explicit JSON and separate internal/external AsciiDoc outputs correctly implements the new source-based metrics documentation generation approach described in the PR objectives.tools/metrics-extractor/README.adoc (1)
1-73
: Well-documented tool with clear usage instructionsThe README provides comprehensive documentation for the metrics extractor tool, including quick start guides, technical details, prerequisites, and development commands. The structure and content are excellent.
tools/metrics-extractor/validate.py (1)
58-169
: Well-structured validation scriptThe validation functions and main orchestration logic are well-implemented. The script provides clear feedback at each step and handles errors appropriately.
bin/doc-tools.js (3)
267-286
: Dependencies check looks goodThe new
verifyMetricsExtractorDependencies
function properly checks for all required dependencies including make, Python 3.10+, Git, and a C++ compiler.
418-431
: Good improvements to cluster docs executionThe enhanced logging, timing, and error reporting provide much better user feedback during the cluster documentation generation process.
505-551
: Legacy command properly renamed with enhanced loggingGood job renaming the legacy command and adding enhanced logging for better user feedback. This appropriately separates the legacy Docker-based approach from the new source-based extraction.
tools/metrics-extractor/metrics_bag.py (1)
9-211
: Well-designed metrics container classThe
MetricsBag
class is well-implemented with comprehensive functionality for storing, merging, and managing metrics. The deterministic ID generation, proper merge logic, and support for multiple output formats make this a solid foundation for the metrics extraction tooling.tools/metrics-extractor/Makefile (1)
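To make the deterministic-ID point concrete, here is a small sketch of the general idea; the field choice, hash function, and ID length are assumptions, not the actual MetricsBag implementation.

```python
import hashlib

def deterministic_metric_id(name: str, metric_type: str, source_file: str) -> str:
    """Derive a stable ID from metric fields so repeated runs produce identical output."""
    key = f"{name}|{metric_type}|{source_file}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:12]

# The same inputs always yield the same ID, so diffs between autogenerated runs stay small.
print(deterministic_metric_id("redpanda_kafka_requests_total", "counter", "src/v/kafka/probe.cc"))
```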
1-155
: Well-structured Makefile with comprehensive automationThe Makefile provides excellent automation for the metrics extraction workflow with proper dependency management, error handling, and development convenience targets. The pinned tree-sitter version (v0.20.5) ensures consistent parsing behavior.
tools/metrics-extractor/metrics_extractor.py (5)
28-47
: Well-structured file discovery implementation.The function properly handles both individual files and directories, with appropriate C++ file extensions and recursive search support.
49-59
: Correct Tree-sitter initialization.The function properly initializes the C++ parser and handles library building when needed.
61-107
: Well-designed command-line interface.The argument parser provides a comprehensive set of options with clear help messages and sensible defaults. Good backward compatibility handling for the deprecated
--asciidoc
option.
109-156
: Excellent data normalization functions.Both
clean_description
andclean_labels
implement thoughtful normalization:
- Ensuring descriptions end with proper punctuation
- Smart handling of braced vs. unbraced labels
- Proper deduplication and sorting
374-463
: Well-orchestrated main function with good user feedback.The main function properly handles:
- Logging configuration based on verbosity
- Path validation and parser initialization
- Clear error messages for missing dependencies
- Informative output with emoji indicators
- Comprehensive metrics breakdown
The only minor issue is the unnecessary f-string on line 455.
tools/metrics/compare_metrics.py (1)
30-152
: Well-implemented AsciiDoc parser with robust metric extraction.The parser correctly handles:
- Multiple heading levels (== and ===)
- Various metric properties (type, labels, usage, etc.)
- Error recovery with warning messages
One minor suggestion: Consider using instance variables instead of modifying
locals()
in_process_section
for better code clarity.tools/metrics-extractor/metrics_parser.py (5)
196-317
: Robust variable resolution with appropriate fallbacks.The function implements a comprehensive strategy for resolving group names:
- Local scope search
- Broader scope search
- Function scope search
- Forward declaration search
- File-wide search
- Emergency fallbacks
This multi-layered approach ensures metrics are rarely lost due to unresolved variables.
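To make the layered fallback concrete, here is a simplified sketch of the idea; the scope representation and function name are illustrative only, not the actual metrics_parser.py API.

```python
def resolve_group_name(variable: str, scopes: list[dict]) -> str:
    """Try progressively wider scopes until the group-name variable resolves.

    `scopes` is ordered from narrowest to widest (local block, enclosing function,
    forward declarations, whole file). A placeholder is returned as the emergency
    fallback so the metric is never silently dropped.
    """
    for scope in scopes:
        if variable in scope:
            return scope[variable]
    return f"<unresolved:{variable}>"

# Hypothetical usage: only the file-wide scope defines the variable.
local_scope = {}
function_scope = {"other_group": "raft"}
file_scope = {"group_name": "kafka"}
print(resolve_group_name("group_name", [local_scope, function_scope, file_scope]))  # -> kafka
```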
761-792
: Well-implemented metric name construction.The function properly:
- Sanitizes group names by replacing special characters
- Applies correct prefixes based on metric type (vectorized_ for internal, redpanda_ for external)
- Handles edge cases with unknown group names
794-905
: Good handling of Seastar-specific metric patterns.Both functions properly extract Seastar metrics with appropriate:
- Pattern matching for different Seastar APIs
- Recursive AST traversal
- Consistent metric metadata extraction
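For readers unfamiliar with Tree-sitter, a minimal sketch of the recursive-traversal pattern being described; the grammar library path, the sample C++ snippet, and the `make_` filter are assumptions for illustration, and the real parser does considerably more.

```python
from tree_sitter import Language, Parser

# Assumes the C++ grammar has already been compiled into this shared library,
# as the Makefile in this PR does; the path is illustrative.
CPP = Language("tree-sitter-cpp.so", "cpp")
parser = Parser()
parser.set_language(CPP)

SOURCE = b"""
void probe::setup() {
    _metrics.add_group("kafka", { sm::make_counter("requests_total",
        [] { return 1; }, sm::description("Total number of requests")) });
}
"""

def find_metric_calls(node, matches):
    """Recursively collect call expressions that look like Seastar metric constructors."""
    if node.type == "call_expression":
        callee = node.child_by_field_name("function")
        if callee is not None and b"make_" in callee.text:
            matches.append(node.text.decode("utf-8", errors="replace"))
    for child in node.children:
        find_metric_calls(child, matches)

tree = parser.parse(SOURCE)
calls = []
find_metric_calls(tree.root_node, calls)
print(calls)
```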
907-1016
: Excellent orchestration with critical safeguards.The function demonstrates:
- Proper sequencing of different parsing strategies (Seastar then Prometheus)
- Critical null-check safeguard for group names with multiple fallback strategies
- Comprehensive error handling
- Good logging for debugging
The emergency fallback logic (lines 981-998) is particularly well-designed to ensure metrics are never lost due to unresolved group names.
1018-1113
: Sophisticated description extraction with C++ concatenation support.The function excellently handles:
- Multi-part descriptions from C++ automatic string concatenation
- Proper filtering of comments and whitespace between strings
- Position-based string literal tracking
- Filtering of incomplete format strings
automation | ||
.command('source-metrics-docs') | ||
.description('Generate metrics documentation from Redpanda source code using tree-sitter') | ||
.option('--tag <tag>', 'Git tag or branch to extract from', 'dev') | ||
.option('--diff <oldTag>', 'Also diff autogenerated metrics from <oldTag> → <tag>') | ||
.action((options) => { | ||
verifyMetricsExtractorDependencies(); | ||
|
||
const newTag = options.tag; | ||
const oldTag = options.diff; | ||
const cwd = path.resolve(__dirname, '../tools/metrics-extractor'); | ||
const make = (tag) => { | ||
console.log(`⏳ Building source-based metrics docs for ${tag}…`); | ||
const r = spawnSync('make', ['build', `TAG=${tag}`], { cwd, stdio: 'inherit' }); | ||
if (r.error) { | ||
console.error(`❌ ${r.error.message}`); | ||
process.exit(1); | ||
} | ||
if (r.status !== 0) process.exit(r.status); | ||
}; | ||
|
||
if (oldTag) { | ||
const oldDir = path.join('autogenerated', oldTag, 'source-metrics'); | ||
if (!fs.existsSync(oldDir)) make(oldTag); | ||
} | ||
|
||
make(newTag); | ||
|
||
if (oldTag) { | ||
diffDirs('source-metrics', oldTag, newTag); | ||
} | ||
|
||
process.exit(0); | ||
}); |
🛠️ Refactor suggestion
Well-implemented command following established patterns
This source-metrics-docs
command properly addresses all previous review feedback by using make build
and supporting the same options as property-docs
.
However, having both metrics-docs
and source-metrics-docs
commands may confuse users. Consider removing the metrics-docs
command (lines 461-503) and renaming this to just metrics-docs
as suggested in the previous review.
🤖 Prompt for AI Agents
In bin/doc-tools.js between lines 461 and 503, remove the existing
'metrics-docs' command to avoid user confusion. Then rename the
'source-metrics-docs' command defined around lines 795 to 828 to 'metrics-docs'
to unify the command naming. Ensure all references and descriptions are updated
accordingly to reflect this change.
test: ["CMD", "rpk", "version"] | ||
interval: 10s | ||
timeout: 15s | ||
retries: 10 |
Health-check no longer validates broker health
rpk version
succeeds even when Redpanda isn’t running; the command only prints the CLI binary’s version and never contacts the broker. The previous rpk cluster info …
actually hit the admin API.
This makes depends_on: condition: service_healthy
meaningless and may cause race conditions for services relying on a ready broker.
- test: ["CMD", "rpk", "version"]
+ # Probe the broker over the Admin API; keeps zero external creds.
+ test: ["CMD", "rpk", "cluster", "info", "-X", "brokers=localhost:9644"]
🤖 Prompt for AI Agents
In docker-compose/docker-compose.yml around lines 54 to 57, the health-check
uses `rpk version`, which only checks the CLI version and does not verify broker
health. Replace the health-check command with one that queries the broker's
admin API, such as `rpk cluster info`, to ensure the health-check accurately
reflects the broker's readiness and prevents race conditions for dependent
services.
$(PYTHON) metrics_extractor.py \ | ||
--recursive \ | ||
--output $(OUTPUT_DIR)/metrics.json \ | ||
--internal-asciidoc $(OUTPUT_DIR)/internal_metrics_reference.adoc \ | ||
--external-asciidoc $(OUTPUT_DIR)/public_metrics_reference.adoc \ | ||
--filter-namespace redpanda \ | ||
--verbose \ | ||
$(REDPANDA_PATH)/src |
Fix incorrect argument name in extract-local target
The extract-local
target uses --output
but should use --json-output
to match the expected argument name in metrics_extractor.py
.
$(PYTHON) metrics_extractor.py \
--redpanda-repo $(REDPANDA_PATH) \
- --output $(OUTPUT_DIR)/metrics.json \
+ --json-output $(OUTPUT_DIR)/metrics.json \
--internal-asciidoc $(OUTPUT_DIR)/internal_metrics_reference.adoc \
--external-asciidoc $(OUTPUT_DIR)/public_metrics_reference.adoc \
- --filter-namespace redpanda \
--verbose
- $(REDPANDA_PATH)/src
Also note that the command seems to have extra arguments that may not be supported:
--filter-namespace redpanda
$(REDPANDA_PATH)/src
as a positional argument
Please verify these arguments are supported by the script.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In tools/metrics-extractor/Makefile around lines 147 to 154, the extract-local
target uses the argument --output which should be changed to --json-output to
match the expected argument in metrics_extractor.py. Additionally, verify if the
arguments --filter-namespace redpanda and the positional argument
$(REDPANDA_PATH)/src are supported by the script; if not, remove or correct them
accordingly to ensure compatibility with metrics_extractor.py.
@@ -0,0 +1,211 @@ | |||
import logging | |||
import hashlib | |||
import uuid |
Remove unused import
The uuid
module is imported but never used in this file.
import logging
import hashlib
-import uuid
from collections import defaultdict
import uuid | |
import logging | |
import hashlib | |
from collections import defaultdict |
🧰 Tools
🪛 Ruff (0.12.2)
3-3: uuid
imported but unused
Remove unused import: uuid
(F401)
🤖 Prompt for AI Agents
In tools/metrics-extractor/metrics_bag.py at line 3, the uuid module is imported
but not used anywhere in the file. Remove the import statement for uuid to clean
up unused imports and improve code clarity.
@@ -0,0 +1,1127 @@ | |||
import os | |||
import re | |||
import subprocess |
Remove unused import.
subprocess
is imported but not used in the code.
-import subprocess
import subprocess |
🧰 Tools
🪛 Ruff (0.12.2)
3-3: subprocess
imported but unused
Remove unused import: subprocess
(F401)
🤖 Prompt for AI Agents
In tools/metrics-extractor/metrics_parser.py at line 3, the subprocess module is
imported but never used. Remove the import statement for subprocess to clean up
the code and avoid unnecessary imports.
""" | ||
import os | ||
import sys | ||
import json |
Remove unused import
The json
module is imported but never used in this file.
-import json
import json |
🧰 Tools
🪛 Ruff (0.12.2)
7-7: json
imported but unused
Remove unused import: json
(F401)
🤖 Prompt for AI Agents
In tools/metrics-extractor/validate.py at line 7, the json module is imported
but not used anywhere in the file. Remove the import statement for json to clean
up the code and avoid unnecessary imports.
print(f" ❌ Example script failed:") | ||
print(f" stdout: {result.stdout}") | ||
print(f" stderr: {result.stderr}") |
Remove unnecessary f-string prefix
The f-string prefix is not needed as there are no placeholders in the string.
- print(f" ❌ Example script failed:")
+ print(" ❌ Example script failed:")
print(f" stdout: {result.stdout}")
print(f" stderr: {result.stderr}")
print(f" ❌ Example script failed:") | |
print(f" stdout: {result.stdout}") | |
print(f" stderr: {result.stderr}") | |
print(" ❌ Example script failed:") | |
print(f" stdout: {result.stdout}") | |
print(f" stderr: {result.stderr}") |
🧰 Tools
🪛 Ruff (0.12.2)
49-49: f-string without any placeholders
Remove extraneous f
prefix
(F541)
🤖 Prompt for AI Agents
In tools/metrics-extractor/validate.py around lines 49 to 51, the print
statements use f-string prefixes without any placeholders. Remove the
unnecessary f-string prefix from these print statements to simplify the code.
|
||
import re | ||
import argparse | ||
from typing import Dict, List, Tuple, Optional |
Remove unused imports.
Tuple
and Optional
are imported but not used in the code.
-from typing import Dict, List, Tuple, Optional
+from typing import Dict, List
from typing import Dict, List, Tuple, Optional | |
-from typing import Dict, List, Tuple, Optional | |
+from typing import Dict, List |
🧰 Tools
🪛 Ruff (0.12.2)
12-12: typing.Tuple
imported but unused
Remove unused import
(F401)
12-12: typing.Optional
imported but unused
Remove unused import
(F401)
🤖 Prompt for AI Agents
In tools/metrics/compare_metrics.py at line 12, the imports Tuple and Optional
are not used anywhere in the code. Remove Tuple and Optional from the import
statement to clean up unused imports, leaving only Dict and List.
differences.append(f"Labels differ") | ||
|
||
if metric1.usage != metric2.usage: | ||
differences.append(f"Usage differs") |
Remove unnecessary f-string prefixes.
These strings don't contain any placeholders.
- differences.append(f"Labels differ")
+ differences.append("Labels differ")
if metric1.usage != metric2.usage:
- differences.append(f"Usage differs")
+ differences.append("Usage differs")
differences.append(f"Labels differ") | |
if metric1.usage != metric2.usage: | |
differences.append(f"Usage differs") | |
differences.append("Labels differ") | |
if metric1.usage != metric2.usage: | |
differences.append("Usage differs") |
🧰 Tools
🪛 Ruff (0.12.2)
231-231: f-string without any placeholders
Remove extraneous f
prefix
(F541)
234-234: f-string without any placeholders
Remove extraneous f
prefix
(F541)
🤖 Prompt for AI Agents
In tools/metrics/compare_metrics.py around lines 231 to 234, remove the
unnecessary f-string prefixes from the strings "Labels differ" and "Usage
differs" since they do not contain any placeholders. Replace the f-strings with
regular string literals to simplify the code.
print(f"METRICS COMPARISON REPORT") | ||
print(f"{'='*60}") | ||
print(f"File 1 ({file1_name}): {results['total_file1']} metrics") | ||
print(f"File 2 ({file2_name}): {results['total_file2']} metrics") | ||
print(f"Common metrics: {results['common_count']}") | ||
|
||
# Heading level differences | ||
if results['heading_level_differences']: | ||
print(f"\n📏 HEADING LEVEL DIFFERENCES:") | ||
print(f" Count: {len(results['heading_level_differences'])}") | ||
for item in results['heading_level_differences']: | ||
print(f" - {item['name']}: {item['file1_level']} vs {item['file2_level']}") | ||
|
||
# Metrics only in file 1 (should be removed) | ||
if results['file1_unique']: | ||
print(f"\n🗑️ METRICS TO REMOVE (only in {file1_name}):") | ||
print(f" Count: {len(results['file1_unique'])}") | ||
for metric in results['file1_unique']: | ||
print(f" - {metric}") | ||
|
||
# Metrics only in file 2 (missing from file 1) | ||
if results['file2_unique']: | ||
print(f"\n📝 METRICS MISSING FROM {file1_name}:") | ||
print(f" Count: {len(results['file2_unique'])}") | ||
for metric in results['file2_unique']: | ||
print(f" - {metric}") | ||
|
||
# Description improvements | ||
if results['improved_descriptions']: | ||
print(f"\n✨ POTENTIAL DESCRIPTION IMPROVEMENTS:") | ||
print(f" Count: {len(results['improved_descriptions'])}") | ||
|
||
for item in results['improved_descriptions']: | ||
print(f"\n 📊 {item['name']}:") | ||
print(f" Similarity: {item['similarity']:.2f}") | ||
|
||
if item['likely_improvement']: | ||
print(f" 🔍 LIKELY IMPROVEMENT (File 1 has longer description)") | ||
|
||
print(f" File 1: {item['file1_desc'][:100]}{'...' if len(item['file1_desc']) > 100 else ''}") | ||
print(f" File 2: {item['file2_desc'][:100]}{'...' if len(item['file2_desc']) > 100 else ''}") | ||
|
||
# Other property differences | ||
if results['different_properties']: | ||
print(f"\n🔧 OTHER PROPERTY DIFFERENCES:") |
Remove unnecessary f-string prefixes throughout the function.
Multiple strings don't contain placeholders and don't need the f-prefix.
- print(f"METRICS COMPARISON REPORT")
+ print("METRICS COMPARISON REPORT")
- print(f"\n📏 HEADING LEVEL DIFFERENCES:")
+ print("\n📏 HEADING LEVEL DIFFERENCES:")
- print(f"\n✨ POTENTIAL DESCRIPTION IMPROVEMENTS:")
+ print("\n✨ POTENTIAL DESCRIPTION IMPROVEMENTS:")
- print(f" 🔍 LIKELY IMPROVEMENT (File 1 has longer description)")
+ print(" 🔍 LIKELY IMPROVEMENT (File 1 has longer description)")
- print(f"\n🔧 OTHER PROPERTY DIFFERENCES:")
+ print("\n🔧 OTHER PROPERTY DIFFERENCES:")
print(f"METRICS COMPARISON REPORT") | |
print(f"{'='*60}") | |
print(f"File 1 ({file1_name}): {results['total_file1']} metrics") | |
print(f"File 2 ({file2_name}): {results['total_file2']} metrics") | |
print(f"Common metrics: {results['common_count']}") | |
# Heading level differences | |
if results['heading_level_differences']: | |
print(f"\n📏 HEADING LEVEL DIFFERENCES:") | |
print(f" Count: {len(results['heading_level_differences'])}") | |
for item in results['heading_level_differences']: | |
print(f" - {item['name']}: {item['file1_level']} vs {item['file2_level']}") | |
# Metrics only in file 1 (should be removed) | |
if results['file1_unique']: | |
print(f"\n🗑️ METRICS TO REMOVE (only in {file1_name}):") | |
print(f" Count: {len(results['file1_unique'])}") | |
for metric in results['file1_unique']: | |
print(f" - {metric}") | |
# Metrics only in file 2 (missing from file 1) | |
if results['file2_unique']: | |
print(f"\n📝 METRICS MISSING FROM {file1_name}:") | |
print(f" Count: {len(results['file2_unique'])}") | |
for metric in results['file2_unique']: | |
print(f" - {metric}") | |
# Description improvements | |
if results['improved_descriptions']: | |
print(f"\n✨ POTENTIAL DESCRIPTION IMPROVEMENTS:") | |
print(f" Count: {len(results['improved_descriptions'])}") | |
for item in results['improved_descriptions']: | |
print(f"\n 📊 {item['name']}:") | |
print(f" Similarity: {item['similarity']:.2f}") | |
if item['likely_improvement']: | |
print(f" 🔍 LIKELY IMPROVEMENT (File 1 has longer description)") | |
print(f" File 1: {item['file1_desc'][:100]}{'...' if len(item['file1_desc']) > 100 else ''}") | |
print(f" File 2: {item['file2_desc'][:100]}{'...' if len(item['file2_desc']) > 100 else ''}") | |
# Other property differences | |
if results['different_properties']: | |
print(f"\n🔧 OTHER PROPERTY DIFFERENCES:") | |
print("METRICS COMPARISON REPORT") | |
print(f"{'='*60}") | |
print(f"File 1 ({file1_name}): {results['total_file1']} metrics") | |
print(f"File 2 ({file2_name}): {results['total_file2']} metrics") | |
print(f"Common metrics: {results['common_count']}") | |
# Heading level differences | |
if results['heading_level_differences']: | |
print("\n📏 HEADING LEVEL DIFFERENCES:") | |
print(f" Count: {len(results['heading_level_differences'])}") | |
for item in results['heading_level_differences']: | |
print(f" - {item['name']}: {item['file1_level']} vs {item['file2_level']}") | |
# Metrics only in file 1 (should be removed) | |
if results['file1_unique']: | |
print(f"\n🗑️ METRICS TO REMOVE (only in {file1_name}):") | |
print(f" Count: {len(results['file1_unique'])}") | |
for metric in results['file1_unique']: | |
print(f" - {metric}") | |
# Metrics only in file 2 (missing from file 1) | |
if results['file2_unique']: | |
print(f"\n📝 METRICS MISSING FROM {file1_name}:") | |
print(f" Count: {len(results['file2_unique'])}") | |
for metric in results['file2_unique']: | |
print(f" - {metric}") | |
# Description improvements | |
if results['improved_descriptions']: | |
print("\n✨ POTENTIAL DESCRIPTION IMPROVEMENTS:") | |
print(f" Count: {len(results['improved_descriptions'])}") | |
for item in results['improved_descriptions']: | |
print(f"\n 📊 {item['name']}:") | |
print(f" Similarity: {item['similarity']:.2f}") | |
if item['likely_improvement']: | |
print(" 🔍 LIKELY IMPROVEMENT (File 1 has longer description)") | |
print(f" File 1: {item['file1_desc'][:100]}{'...' if len(item['file1_desc']) > 100 else ''}") | |
print(f" File 2: {item['file2_desc'][:100]}{'...' if len(item['file2_desc']) > 100 else ''}") | |
# Other property differences | |
if results['different_properties']: | |
print("\n🔧 OTHER PROPERTY DIFFERENCES:") |
🧰 Tools
🪛 Ruff (0.12.2)
243-243: f-string without any placeholders
Remove extraneous f
prefix
(F541)
251-251: f-string without any placeholders
Remove extraneous f
prefix
(F541)
272-272: f-string without any placeholders
Remove extraneous f
prefix
(F541)
280-280: f-string without any placeholders
Remove extraneous f
prefix
(F541)
287-287: f-string without any placeholders
Remove extraneous f
prefix
(F541)
🤖 Prompt for AI Agents
In tools/metrics/compare_metrics.py from lines 243 to 287, many print statements
use f-string prefixes unnecessarily where no variable interpolation occurs.
Remove the f-prefix from all string literals that do not contain placeholders to
simplify the code and improve readability.
During testing I identified several false positives in the generated output. Most of the problems were because the parsing was failing to resolve group_names. Some are extremely hard to identify via this method, even using the AST; one in particular I consider impossible, because it spans multiple files. I'm trying to reduce the number of false positives, and the latest commit introduces an aggressive check to resolve the group_names. There are still several debug prints in the code because of this, so the automation must not be merged yet. |
Just noticed that some labels are also misclassified. In these two metrics, the autogenerated labels look like this:
=== redpanda_trust_file_crc32c
CRC32C checksum calculated from the contents of the trust file. This value is calculated when a valid certificate is loaded and a trust store is present. Otherwise, the value is zero.
*Type*: gauge
*Labels*:
- `aggregates`
- `label`
- `shard`
---
=== redpanda_truststore_expires_at_timestamp_seconds
Expiry time of the shortest-lived CA in the truststore(seconds since epoch).
*Type*: gauge
*Labels*:
- `aggregates`
- `label`
- `label_instance`
---
In the source code, available here https://github.com/redpanda-data/redpanda/blob/dev/src/v/net/probes.cc, we can tell that it's actually |
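To illustrate where spurious entries like `aggregates` and `label` can come from, here is a small, self-contained demo of how regex-based label extraction over-matches C++ identifiers; the patterns are quoted from the LABEL_PATTERNS suggestion earlier in this review, and the C++ fragment is invented rather than copied from probes.cc.

```python
import re

patterns = {
    "aggregate(...) argument": r'\.aggregate\s*\(\s*([^)]+)\s*\)',
    "sm::*label* identifier": r'sm::([a-z_]*label[a-z_]*)',
}

# Invented fragment in the style of src/v/net/probes.cc.
cpp = ('sm::make_gauge("truststore_expires_at_timestamp_seconds", read_expiry)'
       '.aggregate(aggregate_labels); auto area = sm::label_instance("area", "tls");')

for name, pattern in patterns.items():
    print(name, "->", re.findall(pattern, cpp))
# Captures identifiers such as 'aggregate_labels' and 'label_instance' rather than the
# actual label keys, which is one way entries like `aggregates`, `label`, and
# `label_instance` can leak into the generated AsciiDoc.
```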
New metrics automation
Introducing a new metrics automation tool that traverses the Redpanda source code to generate reliable, automated metrics documentation.
Run it:
Changes
Moved the old metrics command to
metrics-docs-legacy
JSON output schema
More info in the provided READMEs.