
New metrics automation #118


Open
wants to merge 21 commits into main
Conversation

paulohtb6
Contributor

@paulohtb6 paulohtb6 commented Jul 21, 2025

New metrics automation

Introducing a new metrics automation tool that traverses the Redpanda source code to generate reliable metrics documentation.

Run it:

npx generate metrics-docs --tag <tag>

where <tag> is, for example, dev or v25.2.4-rc4.

Changes

Moved the old metrics command to metrics-docs-legacy

JSON output schema

{
  "redpanda_kafka_requests_total": {
    "unique_id": "a1b2c3d4e5f6...",
    "name": "redpanda_kafka_requests_total",
    "full_name": "redpanda_kafka_requests_total",
    "metric_type": "external",
    "type": "counter",
    "description": "Total number of Kafka requests.",
    "labels": ["request_type", "status"],
    "constructor": "make_counter",
    "files": [{"file": "src/v/kafka/server/handlers/handler.cc", "line": 45}],
    "group": "kafka"
  }
}
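
As a quick illustration of how this JSON might be consumed downstream, the sketch below loads the file and lists the external counters with their labels. The path autogenerated/metrics.json is an assumption for the example; adjust it to wherever the extractor writes its JSON.

import json

# Hypothetical output path; adjust to the extractor's actual JSON location.
with open("autogenerated/metrics.json", "r", encoding="utf-8") as f:
    metrics = json.load(f)

# List every external counter together with its labels.
for name, data in metrics.items():
    if data.get("metric_type") == "external" and data.get("type") == "counter":
        print(f"{name}: labels={data.get('labels', [])}")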

More info

See the provided READMEs for details.

@paulohtb6 paulohtb6 requested a review from JakeSCahill July 21, 2025 16:15

netlify bot commented Jul 21, 2025

Deploy Preview for docs-extensions-and-macros ready!

🔨 Latest commit: 9c382b1
🔍 Latest deploy log: https://app.netlify.com/projects/docs-extensions-and-macros/deploys/688295e0da3eb5000844ca2c
😎 Deploy Preview: https://deploy-preview-118--docs-extensions-and-macros.netlify.app

Contributor

coderabbitai bot commented Jul 21, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

This change introduces a new, source-based workflow for extracting and generating Redpanda metrics documentation. It adds a Python-based metrics extractor tool using Tree-sitter to parse C++ source files, a Makefile to automate environment setup and extraction, and CLI integration via new commands in bin/doc-tools.js. The workflow supports generating metrics docs from a specific Redpanda tag/branch or local source, outputs both AsciiDoc and JSON files, and includes improved logging, dependency checks, and readiness polling. Supporting scripts, documentation, and tests are also added or updated. Legacy Docker-based extraction is retained with enhanced logging. Additional improvements include refined Docker Compose service dependencies and healthchecks, updated CLI scripts for better logging and readiness checks, and a new metrics comparison tool for AsciiDoc files.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant DocToolsCLI
    participant Makefile
    participant MetricsExtractor (Python)
    participant TreeSitter
    participant RedpandaRepo

    User->>DocToolsCLI: Run "generate metrics-docs" or "source-metrics-docs"
    DocToolsCLI->>Makefile: Invoke with tag/branch or local path
    Makefile->>RedpandaRepo: Clone or use local source
    Makefile->>TreeSitter: Build C++ grammar
    Makefile->>MetricsExtractor: Run metrics_extractor.py
    MetricsExtractor->>TreeSitter: Parse C++ files for metrics
    MetricsExtractor->>RedpandaRepo: Read C++ source files
    MetricsExtractor-->>Makefile: Output JSON and AsciiDoc files
    Makefile-->>DocToolsCLI: Return generated documentation paths
    DocToolsCLI-->>User: Print completion and output file locations

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~150 minutes

Possibly related PRs

Suggested reviewers

  • kbatuigas
  • Feediver1
@paulohtb6 paulohtb6 marked this pull request as ready for review July 22, 2025 22:45
@paulohtb6 paulohtb6 changed the title WIP: New metrics automation New metrics automation Jul 22, 2025
@JakeSCahill
Contributor

@coderabbitai review

Contributor

coderabbitai bot commented Jul 23, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@JakeSCahill I'll conduct a comprehensive review of this substantial PR that introduces a new source-based metrics extraction workflow. Let me analyze the key components systematically.

Based on my comprehensive analysis, here's my detailed review:

🔍 Overall Assessment

This is an ambitious and well-structured PR that introduces a modern, source-based approach to metrics documentation generation. The implementation demonstrates solid software engineering practices with proper separation of concerns, comprehensive tooling, and good documentation.

Strengths

1. Architecture & Design

  • Clean separation of concerns: Parser, bag container, extractor, and CLI layers are well-defined
  • Flexible output formats: Supports both JSON and AsciiDoc with internal/external classification
  • Comprehensive tooling: Makefile automation, CLI integration, and validation scripts
  • Tree-sitter integration: Modern AST-based parsing is more reliable than regex-based approaches

2. Code Quality

  • Robust error handling: Good use of try-catch blocks and logging
  • Input validation: Path validation, dependency checks, and sanitization functions
  • Documentation: Well-documented modules with clear docstrings
  • Testing: Includes test infrastructure and validation scripts

3. User Experience

  • Multiple workflows: Supports GitHub tags, local repos, and legacy Docker method
  • Progress feedback: Enhanced logging with timestamps and status updates
  • Dependency management: Automated dependency verification and setup

⚠️ Areas for Improvement

1. Security Considerations

# In metrics_parser.py - potential security risk
def get_file_contents(path):
    with open(path, 'r') as f:  # No encoding specified
        return f.read()

Recommendation: Specify encoding explicitly and add file size limits:

def get_file_contents(path, max_size=10*1024*1024):  # 10MB limit
    if os.path.getsize(path) > max_size:
        raise ValueError(f"File {path} too large")
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()

2. Resource Management

The Makefile performs git clones and builds without cleanup on failure. Consider adding:

.ONESHELL:
clean-on-error:
	@if [ -d "$(REDPANDA_REPO_DIR)" ]; then rm -rf "$(REDPANDA_REPO_DIR)"; fi

3. Error Handling Enhancement

In metrics_extractor.py, some error paths don't provide enough context:

# Current
if not os.path.exists(options.redpanda_source_path):
    print(f"Error: Path {options.redpanda_source_path} does not exist")
    sys.exit(1)

# Suggested improvement
if not os.path.exists(options.redpanda_source_path):
    print(f"Error: Redpanda source path '{options.redpanda_source_path}' does not exist.")
    print("Please ensure the path is correct and accessible.")
    print("For remote repos, use: git clone <repo_url> <local_path>")
    sys.exit(1)

4. Performance Optimization

The metrics extraction processes all C++ files without parallelization. For large codebases, consider:

from concurrent.futures import ProcessPoolExecutor
import multiprocessing

def extract_metrics_parallel(cpp_files, parser, language, filter_namespace=None):
    num_processes = min(multiprocessing.cpu_count(), len(cpp_files))
    with ProcessPoolExecutor(max_workers=num_processes) as executor:
        # Implementation details...

5. Configuration Management

Hard-coded values should be configurable:

# In metrics_extractor.py
DEFAULT_TREESITTER_VERSION = "v0.20.5"  # Should be configurable
MAX_FILE_SIZE = 10 * 1024 * 1024  # Should be in config

📋 Specific Code Issues

1. Tree-sitter Version Pinning

# In Makefile - version should be configurable
TREESITTER_VERSION ?= v0.20.5

2. Memory Usage

The MetricsBag class stores all metrics in memory. For very large codebases, consider implementing a streaming approach or pagination.

3. Regex Complexity

In metrics_parser.py, some regex patterns are complex and could benefit from compilation:

# Compile regex patterns at module level
LABEL_PATTERN = re.compile(r'["\']([^"\']+)["\']')

🧪 Testing Recommendations

  1. Add integration tests for the complete workflow
  2. Test error conditions (missing dependencies, malformed C++ files)
  3. Performance benchmarks for large codebases
  4. Cross-platform testing (Windows/macOS compatibility)

📚 Documentation Enhancements

The documentation is comprehensive, but consider adding:

  • Troubleshooting section for common issues
  • Performance tuning guide for large repositories
  • Contributing guidelines for extending the extractor

🎯 Final Verdict

This is a high-quality PR that significantly improves the metrics documentation workflow. The code demonstrates strong engineering practices, and the architecture is well-designed for maintainability and extensibility.

Recommendation: Approve with minor revisions focusing on the security and error handling improvements mentioned above.

🚀 Next Steps

  1. Address the security considerations around file handling
  2. Add resource cleanup for failed operations
  3. Consider performance optimizations for large codebases
  4. Enhance error messages with actionable guidance

The new source-based approach is a significant improvement over the Docker-based legacy method, providing better reliability, performance, and maintainability. Excellent work! 🎉


🏁 Script executed:

#!/bin/bash
# First, let's get an overview of the changes and file structure
echo "=== PR Overview ==="
echo "Files changed: $(git diff --name-only HEAD~1 | wc -l)"
echo "Lines added/removed:"
git diff --stat HEAD~1

echo -e "\n=== New Python modules structure ==="
find tools/metrics-extractor -name "*.py" | head -10

echo -e "\n=== Key configuration files ==="
ls -la tools/metrics-extractor/Makefile 2>/dev/null || echo "Makefile not found"
ls -la tools/metrics-extractor/requirements.txt 2>/dev/null || echo "requirements.txt not found"

echo -e "\n=== CLI integration ==="
grep -n "metrics-docs" bin/doc-tools.js | head -5

Length of output: 1497


🏁 Script executed:

#!/bin/bash
# Analyze the main Python modules for code quality and structure
echo "=== Analyzing metrics_extractor.py ==="
python -c "
import ast
with open('tools/metrics-extractor/metrics_extractor.py', 'r') as f:
    tree = ast.parse(f.read())
    
functions = [node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)]
print(f'Functions defined: {len(functions)}')
print('Function names:', functions[:10])

classes = [node.name for node in ast.walk(tree) if isinstance(node, ast.ClassDef)]
print(f'Classes defined: {len(classes)}')
if classes:
    print('Class names:', classes)
"

echo -e "\n=== Analyzing metrics_parser.py ==="
python -c "
import ast
with open('tools/metrics-extractor/metrics_parser.py', 'r') as f:
    tree = ast.parse(f.read())
    
functions = [node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)]
print(f'Functions defined: {len(functions)}')
print('Function names:', functions[:10])
"

echo -e "\n=== Checking error handling patterns ==="
grep -n "except\|try\|raise\|logger\|logging" tools/metrics-extractor/metrics_extractor.py | head -10

Length of output: 2430


🏁 Script executed:

#!/bin/bash
# Check the CLI integration and dependency management
echo "=== CLI Commands Added ==="
grep -A 5 -B 2 "metrics-docs" bin/doc-tools.js

echo -e "\n=== Dependency verification function ==="
grep -A 10 "verifyMetricsExtractorDependencies" bin/doc-tools.js

echo -e "\n=== Package.json scripts ==="
grep -A 5 -B 2 "metrics" package.json

echo -e "\n=== Requirements analysis ==="
cat tools/metrics-extractor/requirements.txt

Length of output: 3437


🏁 Script executed:

#!/bin/bash
# Analyze the Makefile structure and targets
echo "=== Makefile targets ==="
grep "^[a-zA-Z].*:" tools/metrics-extractor/Makefile | head -15

echo -e "\n=== Environment variables and configuration ==="
grep "^[A-Z_]*=" tools/metrics-extractor/Makefile | head -10

echo -e "\n=== Testing infrastructure ==="
find tools/metrics-extractor -name "*test*" -type f
ls -la tools/metrics-extractor/tests/ 2>/dev/null || echo "No tests directory found"

Length of output: 1171


🏁 Script executed:

#!/bin/bash
# Check for security and best practices issues
echo "=== Security analysis ==="
echo "Checking for subprocess calls without shell=False:"
grep -rn "subprocess\|os\.system\|shell=True" tools/metrics-extractor/ | head -10

echo -e "\n=== File permissions and executable checks ==="
find tools/metrics-extractor -name "*.py" -executable | head -5

echo -e "\n=== Hardcoded paths or secrets ==="
grep -rn "localhost\|127\.0\.0\.1\|password\|secret\|token" tools/metrics-extractor/ | head -5

echo -e "\n=== Input validation patterns ==="
grep -rn "validate\|sanitize\|clean" tools/metrics-extractor/ | head -10

Length of output: 1830


🏁 Script executed:

#!/bin/bash
# Analyze documentation and usability
echo "=== Documentation files ==="
find . -name "README*" -newer tools/metrics-extractor/README.adoc 2>/dev/null || find . -name "README*"

echo -e "\n=== Help text and usage ==="
grep -rn "help\|usage\|--help" bin/doc-tools.js | grep -i metrics

echo -e "\n=== Error messages and user feedback ==="
grep -rn "console\.log\|console\.error\|print" tools/metrics-extractor/ | head -10

Length of output: 1515


Based on my comprehensive analysis of this substantial PR, here's my detailed review:

🔍 Overall Assessment

This is an excellent, well-architected PR that introduces a modern source-based metrics extraction workflow. The implementation demonstrates strong software engineering practices with proper separation of concerns, comprehensive tooling, and thoughtful user experience design.

Major Strengths

1. Architecture Excellence

  • Clean modular design: MetricsBag (container) → metrics_parser (AST analysis) → metrics_extractor (orchestration)
  • Tree-sitter integration: Modern AST-based parsing is far superior to regex approaches
  • Flexible output: JSON + AsciiDoc with internal/external classification
  • Multi-workflow support: GitHub tags, local repos, and legacy Docker fallback

2. Robust Implementation

  • Proper dependency management: verifyMetricsExtractorDependencies() checks for make, Python 3.10+, Git, and C++ compiler
  • Good error handling: Comprehensive try-catch blocks and informative logging
  • Safe subprocess usage: No shell=True usage found - security conscious
  • User-friendly feedback: Progress indicators, timestamps, and clear success/error messages

3. Comprehensive Tooling

  • Makefile automation: 15+ targets covering build, test, clean, and development workflows
  • CLI integration: 3 new commands seamlessly integrated into existing doc-tools
  • Validation scripts: End-to-end validation and dependency checking

⚠️ Areas for Improvement

1. Enhanced Error Handling & File Operations

Current issue in metrics_parser.py:

def get_file_contents(path):
    with open(path, 'r') as f:  # No encoding, no size limits
        return f.read()

Recommended improvement:

def get_file_contents(path, max_size=10*1024*1024, encoding='utf-8'):
    """Read file contents with safety checks."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"File not found: {path}")
    
    file_size = os.path.getsize(path)
    if file_size > max_size:
        raise ValueError(f"File too large ({file_size} bytes): {path}")
    
    try:
        with open(path, 'r', encoding=encoding, errors='replace') as f:
            return f.read()
    except UnicodeDecodeError as e:
        raise ValueError(f"Cannot decode file {path}: {e}")

2. Performance Optimization for Large Codebases

The current implementation processes files sequentially. For large repositories like Redpanda, consider parallel processing:

from concurrent.futures import ThreadPoolExecutor
import multiprocessing

def extract_metrics_from_files_parallel(cpp_files, parser, language, filter_namespace=None):
    """Extract metrics using parallel processing."""
    max_workers = min(multiprocessing.cpu_count(), len(cpp_files), 8)  # Cap at 8
    
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(parse_cpp_file, file_path, parser, language, filter_namespace)
            for file_path in cpp_files
        ]
        
        results = MetricsBag()
        for future in concurrent.futures.as_completed(futures):
            try:
                file_results = future.result()
                results.merge(file_results)
            except Exception as e:
                logger.warning(f"Failed to process file: {e}")
        
        return results

3. Memory Management for Large Datasets

The MetricsBag class stores everything in memory. For very large codebases, consider streaming:

class MetricsBag:
    def __init__(self, max_memory_items=50000):
        self._metrics = {}
        self._max_memory_items = max_memory_items
        self._temp_file = None
    
    def add_metric(self, *args, **kwargs):
        if len(self._metrics) > self._max_memory_items:
            self._spill_to_disk()
        # ... existing logic
    
    def _spill_to_disk(self):
        """Spill excess metrics to temporary file."""
        # Implementation for memory management

4. Enhanced Input Validation

Current validation in metrics_extractor.py:

def validate_paths(options):
    path = options.redpanda_source_path
    if not os.path.exists(path):
        logger.error(f'Path does not exist: "{path}".')
        return False
    return True

Enhanced validation:

def validate_paths(options):
    """Comprehensive path validation with helpful error messages."""
    path = options.redpanda_source_path
    
    if not path:
        logger.error("Redpanda source path is required.")
        return False
    
    if not os.path.exists(path):
        logger.error(f'Path does not exist: "{path}"')
        logger.error("Suggestions:")
        logger.error("  • Check the path spelling")
        logger.error("  • For remote repos: git clone <repo_url> <local_path>")
        logger.error("  • Ensure you have read permissions")
        return False
    
    if not os.path.isdir(path):
        logger.error(f'Path is not a directory: "{path}"')
        return False
    
    # Check for key Redpanda files/directories
    expected_paths = ['src/', 'CMakeLists.txt', 'tools/']
    missing_paths = [p for p in expected_paths if not os.path.exists(os.path.join(path, p))]
    
    if missing_paths:
        logger.warning(f"This may not be a Redpanda repository. Missing: {missing_paths}")
    
    return True

5. Expanded Testing Coverage

Currently only one test file exists. Recommend adding:

# tests/test_metrics_bag.py
def test_metrics_bag_operations():
    """Test MetricsBag core functionality."""
    
# tests/test_parser_edge_cases.py  
def test_malformed_cpp_files():
    """Test parser robustness with malformed C++ files."""
    
# tests/test_cli_integration.py
def test_cli_commands():
    """Test CLI command integration."""
    
# tests/test_performance.py
def test_large_codebase_performance():
    """Performance benchmarks for large repositories."""

📊 Code Quality Metrics

Excellent:

  • No security anti-patterns (shell=True, hardcoded secrets)
  • Proper logging infrastructure
  • Good separation of concerns
  • Comprehensive CLI integration

Good:

  • Error handling coverage
  • User feedback and progress indicators
  • Documentation completeness

⚠️ Needs attention:

  • Test coverage (only 1 test file for 5 modules)
  • Memory usage optimization
  • File encoding handling

🧪 Testing Recommendations

  1. Integration tests for complete workflows
  2. Edge case testing for malformed C++ files
  3. Performance benchmarks with large codebases
  4. Cross-platform compatibility testing
  5. Error condition testing (missing dependencies, network failures)

📚 Documentation Enhancements

The documentation is comprehensive, but consider adding:

# Troubleshooting Guide
## Common Issues
- Tree-sitter compilation failures
- Memory issues with large repositories
- C++ parsing edge cases

# Performance Tuning
- Parallel processing configuration
- Memory management for large codebases
- Filtering strategies for faster processing

🎯 Final Recommendation

APPROVE - This is exemplary work that significantly improves the metrics documentation workflow. The architecture is sound, implementation is robust, and the user experience is well-designed.

🚀 Priority Action Items

  1. High Priority: Add file encoding safety to get_file_contents()
  2. Medium Priority: Expand test coverage with integration tests
  3. Medium Priority: Add parallel processing for large repositories
  4. Low Priority: Enhanced error messages with troubleshooting hints

🌟 Standout Features

  • Tree-sitter integration - Modern, reliable parsing approach
  • Dependency verification - Comprehensive pre-flight checks
  • Multi-output formats - JSON + categorized AsciiDoc
  • Makefile automation - Production-ready build system
  • Legacy compatibility - Maintains existing Docker workflow

This PR represents a significant architectural improvement that will make metrics documentation generation more reliable, maintainable, and performant. Excellent engineering work! 🎉

🐰✨
Impressed by the thoughtful architecture and comprehensive implementation!

@@ -429,7 +460,51 @@ function diffDirs(kind, oldTag, newTag) {

automation
.command('metrics-docs')
.description('Generate JSON and AsciiDoc documentation for Redpanda metrics')
.description('Generate JSON and AsciiDoc documentation for Redpanda metrics from source code')
.requiredOption('-r, --redpanda-repo <path>', 'Path to the Redpanda repository root directory')
Contributor

Why is this required? For the property-docs command, we require a tag and use that to clone and check out the repo.

Please give this command the same signature as property-docs. It should support the diff option.

automation
  .command('property-docs')
  .description('Generate JSON and AsciiDoc documentation for Redpanda configuration properties')
  .option('--tag <tag>', 'Git tag or branch to extract from', 'dev')
  .option('--diff <oldTag>', 'Also diff autogenerated properties from <oldTag> → <tag>')
  .action((options) => {
    verifyPropertyDependencies();

    const newTag = options.tag;
    const oldTag = options.diff;
    const cwd = path.resolve(__dirname, '../tools/property-extractor');
    const make = (tag) => {
      console.log(`⏳ Building property docs for ${tag}…`);
      const r = spawnSync('make', ['build', `TAG=${tag}`], { cwd, stdio: 'inherit' });
      if (r.error) {
        console.error(`❌ ${r.error.message}`);
        process.exit(1);
      }
      if (r.status !== 0) process.exit(r.status);
    };

    if (oldTag) {
      const oldDir = path.join('autogenerated', oldTag, 'properties');
      if (!fs.existsSync(oldDir)) make(oldTag);
    }

    make(newTag);

    if (oldTag) {
      diffDirs('properties', oldTag, newTag);
    }

    process.exit(0);
  });


const startTime = Date.now();
const result = spawnSync('python3', [
path.join(__dirname, '../tools/metrics-extractor/metrics_extractor.py'),
Contributor

Since you have a Makefile, we should execute make build instead like we do for property-docs.

.option('--internal-asciidoc <path>', 'Custom path for internal metrics AsciiDoc file', 'autogenerated/internal_metrics_reference.adoc')
.option('--external-asciidoc <path>', 'Custom path for external/public metrics AsciiDoc file', 'autogenerated/public_metrics_reference.adoc')
.action((options) => {
console.log(`🎯 Starting enhanced metrics extraction from source code`);
Contributor

Suggested change
console.log(`🎯 Starting enhanced metrics extraction from source code`);

.description('Generate JSON and AsciiDoc documentation for Redpanda metrics from source code')
.requiredOption('-r, --redpanda-repo <path>', 'Path to the Redpanda repository root directory')
.option('--json-output <path>', 'Custom path for JSON output file', 'autogenerated/metrics.json')
.option('--internal-asciidoc <path>', 'Custom path for internal metrics AsciiDoc file', 'autogenerated/internal_metrics_reference.adoc')
Contributor

I like these new options for specifying paths.

We don't typically use underscores in Asciidoc filenames. Please replace with hyphens.

@@ -713,6 +792,41 @@ automation
process.exit(0);
});

automation
.command('source-metrics-docs')
Contributor

Not sure why this is a separate command. This should be called metrics-docs and should replace the other one in this file.

Contributor

Please also keep support for specifying paths for the output


All commands generate three files in the `autogenerated/` directory:

* `internal_metrics_reference.adoc` - Internal metrics for engineering documentation
Contributor

Update these docs when you fix the filenames from using underscores to hyphens.

filtered = {}
seen_names = set() # Track metric names to detect duplicates

for name, data in metrics.items():
Contributor

Nice!


# Create more meaningful category names
category_mapping = {
'cluster': 'Cluster metrics',
Contributor

So nice

});

automation
.command('metrics-docs-legacy')
Contributor

I would just remove the legacy implementation.

Contributor

We're using semver versioning, so we can always install a previous version to try it again.

@@ -21,7 +21,9 @@
"get-console-version": "doc-tools get-console-version",
"build": "antora --to-dir docs --fetch local-antora-playbook.yml",
"serve": "wds --node-resolve --open preview/test/ --watch --root-dir docs",
"test": "jest"
"test": "jest",
Contributor

Please bump the version in this file. Since this is quite a large change, bump the minor version to 4.8.0 and then npm install to update the lock file.


SHELL := /bin/bash
TAG ?= dev
OUTPUT_DIR := autogenerated/$(TAG)/source-metrics
Contributor
@JakeSCahill JakeSCahill Jul 23, 2025

Since the CLI is going to be installed inside our content repos, we'll want the output saved to the root of the repo rather than relative to this package in node_modules so it's easy to find.

This is what we did for the property-docs automation:

REPO_ROOT     := $(shell git rev-parse --show-toplevel)
MODULE_ROOT   := $(shell cd "$(dir $(realpath $(lastword $(MAKEFILE_LIST))))"/../.. && pwd)
TOOL_ROOT     := $(MODULE_ROOT)/tools/property-extractor
TMP_ROOT      := $(TOOL_ROOT)/tmp
REDPANDA_SRC  := $(TMP_ROOT)/redpanda
TREESITTER_DIR:= $(TOOL_ROOT)/tree-sitter/tree-sitter-cpp
VENV          := $(TOOL_ROOT)/tmp/redpanda-property-extractor-venv
PYTHON        := $(VENV)/bin/python
OUTPUT_DIR    := $(REPO_ROOT)/autogenerated/$(TAG)/properties

@JakeSCahill
Contributor

Looking really nice @paulohtb6 thanks for working on this!

When it's ready, can you update our wiki: https://redpandadata.atlassian.net/wiki/spaces/DOC/pages/1185054748/Doc+Tools+CLI

@JakeSCahill
Contributor

Some nice suggestions and praise from 🐰 !

take a look at some of the suggestions: #118 (comment)

Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 12

♻️ Duplicate comments (5)
package.json (1)

24-27: Version bump still missing & lock-file not updated
Jake already requested raising the package version (→ 4.8.0) and regenerating the lock file. That hasn’t happened yet. Please bump the version and run npm install to commit the updated package-lock.json.

tools/metrics-extractor/README.adoc (1)

35-36: Address the filename convention feedback

The filenames still use underscores instead of hyphens as previously mentioned by JakeSCahill. Please update to use hyphens for consistency with project conventions.

bin/doc-tools.js (2)

466-467: Use hyphens instead of underscores in filenames

As previously mentioned by JakeSCahill, filenames should use hyphens instead of underscores for consistency.


483-488: Use make build instead of direct Python execution

As previously mentioned by JakeSCahill, this should execute make build instead of calling the Python script directly, similar to how property-docs works.

tools/metrics-extractor/Makefile (1)

6-6: Output should be saved to repo root, not relative to this package

As previously mentioned by JakeSCahill, the output should be saved to the repository root rather than relative to this package location. This makes it easier to find when installed in content repos.

Consider updating to use the same pattern as property-extractor:

REPO_ROOT     := $(shell git rev-parse --show-toplevel)
OUTPUT_DIR    := $(REPO_ROOT)/autogenerated/$(TAG)/source-metrics
🧹 Nitpick comments (13)
cli-utils/python-venv.sh (1)

10-10: Trim the trailing whitespace to avoid noisy diffs
The extra space after the closing quote ("..."␠) is harmless at runtime but unnecessarily clutters future git blame/diffs.

-echo "Recreating Python venv at $VENV_DIR..." 
+echo "Recreating Python venv at $VENV_DIR..."
tools/metrics-extractor/requirements.txt (1)

1-2: Pinning strategy & supply-chain hygiene

  1. tree_sitter==0.21.1 – hard-pinning implies you’ll miss upstream security/bug fixes. Prefer ~=0.21 unless reproducibility absolutely mandates a full pin.
  2. Consider adding --require-hashes support (hash-checked requirements) to mitigate dependency-confusion or typosquatting attacks.
cli-utils/install-test-dependencies.sh (1)

16-16: Minor whitespace nit
Same trailing-space issue as the venv script; clean-up keeps history tidy.

-    echo "Node.js version: $(node -v)" 
+    echo "Node.js version: $(node -v)"
package.json (1)

24-27: Script names OK but undocumented
The new metrics:extract and metrics:legacy targets look good; ensure README/CONTRIB docs mention them so users don’t rely on the raw npx incantation only.

tools/metrics-extractor/tests/test_extraction.py (1)

76-79: Consider making tree-sitter paths configurable.

The hardcoded paths for tree-sitter components ("tree-sitter", "tree-sitter-cpp.so") could be made configurable to improve test portability across different environments.

Consider using environment variables or configuration parameters:

-        parser, language = get_treesitter_cpp_parser_and_language("tree-sitter", "tree-sitter-cpp.so")
+        parser, language = get_treesitter_cpp_parser_and_language(
+            os.environ.get("TREESITTER_PATH", "tree-sitter"),
+            os.environ.get("TREESITTER_CPP_LIB", "tree-sitter-cpp.so")
+        )
tools/metrics-extractor/validate.py (1)

12-33: Use importlib.util.find_spec for cleaner dependency checking

Consider using importlib.util.find_spec instead of try/except imports to check module availability without actually importing them.

+import importlib.util
+
 def check_dependencies():
     """Check if all required dependencies are available"""
     print("🔧 Checking dependencies...")
     
-    try:
-        import tree_sitter
+    if importlib.util.find_spec('tree_sitter') is not None:
         print("  ✓ tree-sitter is available")
-    except ImportError:
+    else:
         print("  ❌ tree-sitter not found. Install with: pip install tree-sitter")
         return False
     
     # Check if we can import our modules
-    try:
-        from metrics_parser import build_treesitter_cpp_library, extract_metrics_from_files
-        from metrics_bag import MetricsBag
+    modules_to_check = ['metrics_parser', 'metrics_bag']
+    for module in modules_to_check:
+        if importlib.util.find_spec(module) is None:
+            print(f"  ❌ Module '{module}' not found")
+            return False
+    
+    try:
+        # Verify specific functions exist by importing them
+        from metrics_parser import build_treesitter_cpp_library, extract_metrics_from_files
+        from metrics_bag import MetricsBag
         print("  ✓ All custom modules are available")
     except ImportError as e:
         print(f"  ❌ Import error: {e}")
         return False
     
     return True
tools/metrics-extractor/metrics_extractor.py (3)

20-27: Enhance path validation for better user experience.

The current validation only checks if the path exists. Consider adding more comprehensive checks to ensure the path is a valid Redpanda repository.

 def validate_paths(options):
     path = options.redpanda_repo
 
     if not os.path.exists(path):
         logger.error(f'Path does not exist: "{path}".')
+        logger.error(f'Please ensure you have cloned the Redpanda repository or specify a valid path.')
         sys.exit(1)
+    
+    if not os.path.isdir(path):
+        logger.error(f'Path is not a directory: "{path}".')
+        sys.exit(1)
+    
+    # Check for expected Redpanda source structure
+    expected_markers = ['src', 'CMakeLists.txt']
+    if not any(os.path.exists(os.path.join(path, marker)) for marker in expected_markers):
+        logger.warning(f'Path may not be a valid Redpanda repository: "{path}".')
+        logger.warning('Expected to find "src" directory or "CMakeLists.txt" file.')

158-372: Consider refactoring this large function for better maintainability.

The function is over 200 lines and handles multiple responsibilities. The extensive hardcoded category mappings could be externalized.

Extract the category mapping logic into a separate configuration:

# Define at module level or in a separate config file
METRIC_CATEGORY_MAPPINGS = {
    'cluster': 'Cluster metrics',
    'kafka': 'Kafka metrics',
    'raft': 'Raft metrics',
    # ... rest of the mappings
}

# Keywords that map to application metrics
APPLICATION_METRIC_KEYWORDS = {
    'active', 'adjacent', 'anomalies', 'available', 'backlog',
    # ... rest of the keywords
}

def get_metric_category(metric_name):
    """Determine the category for a metric based on its name."""
    # Remove prefixes
    clean_name = metric_name
    for prefix in ['redpanda_', 'vectorized_']:
        if clean_name.startswith(prefix):
            clean_name = clean_name[len(prefix):]
            break
    
    # Get first part
    parts = clean_name.split('_')
    category = parts[0] if parts else 'other'
    
    # Map to category name
    if category in METRIC_CATEGORY_MAPPINGS:
        return METRIC_CATEGORY_MAPPINGS[category]
    elif category in APPLICATION_METRIC_KEYWORDS:
        return 'Application metrics'
    else:
        return 'Other metrics'

Also consider splitting the function into smaller functions:

  • separate_metrics_by_type()
  • group_metrics_by_category()
  • write_metrics_documentation()

455-455: Remove unnecessary f-string prefix.

The string doesn't contain any placeholders.

-    print(f"📊 Metrics by type:")
+    print("📊 Metrics by type:")
tools/metrics/compare_metrics.py (1)

308-317: Good error handling, but consider file size limits.

While the error handling is good, consider adding file size validation to prevent issues with very large files.

     try:
+        # Check file sizes first
+        for filepath in [args.file1, args.file2]:
+            file_size = os.path.getsize(filepath)
+            if file_size > 10 * 1024 * 1024:  # 10MB limit
+                print(f"Warning: {filepath} is large ({file_size / 1024 / 1024:.1f}MB). This may take a while.")
+        
         with open(args.file1, 'r', encoding='utf-8') as f:
             content1 = f.read()
tools/metrics-extractor/metrics_parser.py (3)

101-129: Consider compiling regex patterns for better performance.

The regex patterns are compiled on every function call. For better performance, especially when processing many files, compile them once at module level.

# At module level
LABEL_PATTERNS = [
    re.compile(r'\.aggregate\s*\(\s*([^)]+)\s*\)'),  # .aggregate(aggregate_labels)
    re.compile(r'auto\s+(\w*labels\w*)\s*='),        # auto aggregate_labels =
    re.compile(r'std::vector<[^>]*>\s*{([^}]+)}'),   # std::vector<sm::label>{sm::shard_label}
    re.compile(r'sm::([a-z_]*label[a-z_]*)'),        # sm::shard_label, sm::topic_label, etc.
    re.compile(r'"([^"]+)"\s*:\s*[^,}]+'),           # key-value pairs
]

def extract_labels_from_code(code_context):
    """Extract potential label names from code context around metrics"""
    labels = set()
    
    for pattern in LABEL_PATTERNS:
        matches = pattern.findall(code_context)
        # ... rest of the function

231-255: Simplify nested conditions where possible.

Some nested if statements can be combined for better readability.

-                        if inner_function and '::metrics_name' in inner_function.text.decode('utf-8'):
-                            # Found it. Extract the string literal from its arguments.
-                            if inner_args and inner_args.named_child_count > 0:
+                        if (inner_function and '::metrics_name' in inner_function.text.decode('utf-8') and
+                            inner_args and inner_args.named_child_count > 0):
                                 group_name_node = inner_args.named_children[0]
                                 if group_name_node.type == 'string_literal':

1115-1128: Consider parallel processing for better performance.

The current implementation processes files sequentially. For large codebases, parallel processing could significantly improve performance.

from concurrent.futures import ProcessPoolExecutor, as_completed
import multiprocessing

def extract_metrics_from_files(cpp_files, treesitter_parser, cpp_language, filter_namespace=None):
    """Extract metrics from multiple C++ files"""
    all_metrics = MetricsBag()
    
    # Use process pool for parallel parsing
    max_workers = min(multiprocessing.cpu_count(), len(cpp_files))
    
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # Submit all files for processing
        future_to_file = {
            executor.submit(parse_cpp_file, file_path, treesitter_parser, cpp_language, filter_namespace): file_path
            for file_path in cpp_files
        }
        
        # Collect results as they complete
        for future in as_completed(future_to_file):
            file_path = future_to_file[future]
            try:
                file_metrics = future.result()
                all_metrics.merge(file_metrics)
            except Exception as e:
                logger.warning(f"Failed to process {file_path}: {e}")
    
    return all_metrics

Note: This would require making the parser objects pickleable or recreating them in each process.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 922f9fe and ab66342.

⛔ Files ignored due to path filters (1)
  • tools/metrics-extractor/tree-sitter-cpp.so is excluded by !**/*.so
📒 Files selected for processing (18)
  • README.adoc (1 hunks)
  • bin/doc-tools.js (6 hunks)
  • cli-utils/generate-cluster-docs.sh (3 hunks)
  • cli-utils/install-test-dependencies.sh (1 hunks)
  • cli-utils/python-venv.sh (1 hunks)
  • docker-compose/25.1/docker-compose.yml (4 hunks)
  • docker-compose/docker-compose.yml (1 hunks)
  • package.json (1 hunks)
  • tools/metrics-extractor/Makefile (1 hunks)
  • tools/metrics-extractor/README.adoc (1 hunks)
  • tools/metrics-extractor/metrics_bag.py (1 hunks)
  • tools/metrics-extractor/metrics_extractor.py (1 hunks)
  • tools/metrics-extractor/metrics_parser.py (1 hunks)
  • tools/metrics-extractor/requirements.txt (1 hunks)
  • tools/metrics-extractor/tests/test_extraction.py (1 hunks)
  • tools/metrics-extractor/validate.py (1 hunks)
  • tools/metrics/compare_metrics.py (1 hunks)
  • tools/metrics/metrics.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tools/metrics-extractor/validate.py (4)
bin/doc-tools.js (4)
  • os (4-4)
  • result (305-305)
  • result (482-488)
  • result (619-627)
tools/metrics-extractor/metrics_parser.py (2)
  • build_treesitter_cpp_library (61-64)
  • extract_metrics_from_files (1115-1127)
tools/metrics-extractor/metrics_bag.py (1)
  • MetricsBag (9-211)
tools/metrics-extractor/metrics_extractor.py (1)
  • main (374-459)
🪛 checkmake (0.2.2)
tools/metrics-extractor/Makefile

[warning] 19-19: Target body for "help" exceeds allowed length of 5 (13).

(maxbodylength)


[warning] 69-69: Target body for "treesitter" exceeds allowed length of 5 (8).

(maxbodylength)


[warning] 97-97: Target body for "clean" exceeds allowed length of 5 (7).

(maxbodylength)


[warning] 15-15: Missing required phony target "test"

(minphony)

🪛 Ruff (0.12.2)
tools/metrics-extractor/validate.py

7-7: json imported but unused

Remove unused import: json

(F401)


17-17: tree_sitter imported but unused; consider using importlib.util.find_spec to test for availability

(F401)


25-25: metrics_parser.build_treesitter_cpp_library imported but unused; consider using importlib.util.find_spec to test for availability

(F401)


25-25: metrics_parser.extract_metrics_from_files imported but unused; consider using importlib.util.find_spec to test for availability

(F401)


26-26: metrics_bag.MetricsBag imported but unused; consider using importlib.util.find_spec to test for availability

(F401)


49-49: f-string without any placeholders

Remove extraneous f prefix

(F541)

tools/metrics-extractor/metrics_extractor.py

6-6: re imported but unused

Remove unused import: re

(F401)


12-12: metrics_bag.MetricsBag imported but unused

Remove unused import: metrics_bag.MetricsBag

(F401)


236-239: Combine if branches using logical or operator

Combine if branches

(SIM114)


238-241: Combine if branches using logical or operator

Combine if branches

(SIM114)


240-243: Combine if branches using logical or operator

Combine if branches

(SIM114)


455-455: f-string without any placeholders

Remove extraneous f prefix

(F541)

tools/metrics/compare_metrics.py

12-12: typing.Tuple imported but unused

Remove unused import

(F401)


12-12: typing.Optional imported but unused

Remove unused import

(F401)


231-231: f-string without any placeholders

Remove extraneous f prefix

(F541)


234-234: f-string without any placeholders

Remove extraneous f prefix

(F541)


243-243: f-string without any placeholders

Remove extraneous f prefix

(F541)


251-251: f-string without any placeholders

Remove extraneous f prefix

(F541)


272-272: f-string without any placeholders

Remove extraneous f prefix

(F541)


280-280: f-string without any placeholders

Remove extraneous f prefix

(F541)


287-287: f-string without any placeholders

Remove extraneous f prefix

(F541)

tools/metrics-extractor/metrics_parser.py

1-1: os imported but unused

Remove unused import: os

(F401)


3-3: subprocess imported but unused

Remove unused import: subprocess

(F401)


231-233: Use a single if statement instead of nested if statements

(SIM102)


309-309: f-string without any placeholders

Remove extraneous f prefix

(F541)


426-430: Use a single if statement instead of nested if statements

(SIM102)


428-430: Use a single if statement instead of nested if statements

(SIM102)


489-490: Use a single if statement instead of nested if statements

(SIM102)


553-554: Use a single if statement instead of nested if statements

(SIM102)


619-621: Use a single if statement instead of nested if statements

(SIM102)


637-638: Use a single if statement instead of nested if statements

(SIM102)


681-682: Use a single if statement instead of nested if statements

(SIM102)


698-699: Use a single if statement instead of nested if statements

(SIM102)

tools/metrics-extractor/metrics_bag.py

3-3: uuid imported but unused

Remove unused import: uuid

(F401)


125-125: Loop control variable name not used within loop body

Rename unused name to _name

(B007)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Redirect rules - docs-extensions-and-macros
  • GitHub Check: Header rules - docs-extensions-and-macros
  • GitHub Check: Pages changed - docs-extensions-and-macros
🔇 Additional comments (33)
tools/metrics/metrics.py (4)

123-142: Well-implemented filtering function for documentation cleanup.

The filter_metrics_for_docs function effectively addresses the need to exclude histogram suffixes and handle duplicates for documentation while preserving complete data for JSON output. The logic is sound and the logging provides good visibility into filtering operations.
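
For readers who want a feel for what that filtering involves, here is a minimal sketch; the suffix list and function body are assumptions for illustration, not the code under review.

# Illustrative sketch only; filter_metrics_for_docs in the PR may differ.
HISTOGRAM_SUFFIXES = ("_bucket", "_sum", "_count")  # assumed histogram suffixes

def filter_for_docs(metrics):
    """Drop histogram components and duplicate names for the AsciiDoc output."""
    filtered = {}
    seen_names = set()
    for name, data in metrics.items():
        if name.endswith(HISTOGRAM_SUFFIXES):
            continue  # keep histogram pieces out of the rendered docs
        if name in seen_names:
            continue  # skip duplicate metric names
        seen_names.add(name)
        filtered[name] = data
    return filtered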


183-184: Improved error handling for directory creation.

Good change from logging an error and exiting to actually creating the directory when it doesn't exist. This makes the script more robust and user-friendly.


207-209: Proper separation of filtered vs unfiltered metrics.

The approach of filtering metrics specifically for documentation (AsciiDoc) while keeping the original unfiltered data for JSON output is well-designed. This ensures documentation clarity without losing data completeness.


223-224: Consistent application of filtering to both metric types.

Both public and internal metrics are properly filtered using the new function before AsciiDoc generation, ensuring consistent documentation output across both metric types.

README.adoc (1)

969-1008: Excellent documentation for the new CLI tools.

The new CLI Tools section provides clear and comprehensive documentation for the metrics automation features introduced in this PR. The documentation:

  • Clearly explains the three different workflows (source-based, local repo, legacy Docker)
  • Documents the output files users can expect
  • Provides practical usage examples with appropriate command syntax
  • Includes helpful context about when to use each approach

This aligns perfectly with the PR objectives for introducing reliable automated metrics documentation generation.

tools/metrics-extractor/tests/test_extraction.py (2)

1-61: Well-structured test with comprehensive sample C++ code.

The test provides good coverage of the metrics extraction functionality with a realistic sample that includes various metric constructor types. The C++ code structure accurately represents what would be found in the Redpanda codebase.


88-104: Robust test validation with good error reporting.

The test validation logic effectively checks both the presence and type correctness of extracted metrics. The visual feedback with checkmarks and error messages makes test results easy to interpret.

docker-compose/25.1/docker-compose.yml (3)

54-54: Improved healthcheck simplification.

Changing the healthcheck from rpk cluster info to rpk version is a good improvement. The rpk version command is faster, more reliable, and doesn't require cluster connectivity, making the healthcheck more robust during startup.


59-62: Enhanced service dependency management.

The explicit depends_on conditions with service_started, service_healthy, and service_completed_successfully ensure proper startup sequencing. This is particularly important for the metrics extraction workflow where stable cluster initialization is critical.

Also applies to: 93-98, 129-134


402-407: Modernized MinIO client configuration.

The update from mc config host add to mc alias set aligns with current MinIO client best practices. The explicit exit with status 0 provides cleaner completion signaling compared to the previous approach of tailing /dev/null.

cli-utils/generate-cluster-docs.sh (5)

5-8: Excellent logging enhancement for better user experience.

The log_step function with timestamps significantly improves the user experience by providing clear progress indicators and timing information. This is particularly valuable for long-running operations like cluster setup.


25-28: Good pre-flight check addition.

Adding the curl dependency check is a smart improvement since the script uses curl for metrics endpoint polling. This provides early failure detection with clear error messaging.


78-92: Much improved readiness checking with polling.

Replacing the fixed 300-second sleep with an intelligent polling loop is a significant improvement. The script now:

  • Actively checks endpoint readiness instead of blindly waiting
  • Provides faster execution when the endpoint becomes ready early
  • Includes proper timeout handling with clear error messaging
  • Gives users better feedback about the waiting process

103-103: Correct path update for new metrics extractor.

The path change from metrics/requirements.txt to metrics-extractor/requirements.txt properly aligns with the new metrics extraction tooling structure introduced in this PR.


113-116: Updated command reflects new metrics extraction workflow.

The command update from the old metrics.py script to the new metrics_extractor.py with explicit JSON and separate internal/external AsciiDoc outputs correctly implements the new source-based metrics documentation generation approach described in the PR objectives.

tools/metrics-extractor/README.adoc (1)

1-73: Well-documented tool with clear usage instructions

The README provides comprehensive documentation for the metrics extractor tool, including quick start guides, technical details, prerequisites, and development commands. The structure and content are excellent.

tools/metrics-extractor/validate.py (1)

58-169: Well-structured validation script

The validation functions and main orchestration logic are well-implemented. The script provides clear feedback at each step and handles errors appropriately.

bin/doc-tools.js (3)

267-286: Dependencies check looks good

The new verifyMetricsExtractorDependencies function properly checks for all required dependencies including make, Python 3.10+, Git, and a C++ compiler.


418-431: Good improvements to cluster docs execution

The enhanced logging, timing, and error reporting provide much better user feedback during the cluster documentation generation process.


505-551: Legacy command properly renamed with enhanced logging

Good job renaming the legacy command and adding enhanced logging for better user feedback. This appropriately separates the legacy Docker-based approach from the new source-based extraction.

tools/metrics-extractor/metrics_bag.py (1)

9-211: Well-designed metrics container class

The MetricsBag class is well-implemented with comprehensive functionality for storing, merging, and managing metrics. The deterministic ID generation, proper merge logic, and support for multiple output formats make this a solid foundation for the metrics extraction tooling.

tools/metrics-extractor/Makefile (1)

1-155: Well-structured Makefile with comprehensive automation

The Makefile provides excellent automation for the metrics extraction workflow with proper dependency management, error handling, and development convenience targets. The pinned tree-sitter version (v0.20.5) ensures consistent parsing behavior.

tools/metrics-extractor/metrics_extractor.py (5)

28-47: Well-structured file discovery implementation.

The function properly handles both individual files and directories, with appropriate C++ file extensions and recursive search support.
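For context, a stripped-down sketch of this kind of discovery logic; the extension set and function name are assumptions, not the script's actual code:

```python
from pathlib import Path

# Assumed extension set; the real script may differ.
CPP_EXTENSIONS = {".cc", ".cpp", ".cxx", ".h", ".hpp"}

def discover_cpp_files(path: str, recursive: bool = True) -> list[Path]:
    """Return C++ sources from a single file or a directory tree."""
    p = Path(path)
    if p.is_file():
        return [p] if p.suffix in CPP_EXTENSIONS else []
    pattern = "**/*" if recursive else "*"
    return sorted(f for f in p.glob(pattern) if f.suffix in CPP_EXTENSIONS)
```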


49-59: Correct Tree-sitter initialization.

The function properly initializes the C++ parser and handles library building when needed.
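For readers new to Tree-sitter, initialization with the pre-0.22 py-tree-sitter API looks roughly like this; both paths and the grammar checkout location are illustrative:

```python
from tree_sitter import Language, Parser

# Build the shared grammar library once from a local tree-sitter-cpp checkout.
Language.build_library("build/cpp.so", ["vendor/tree-sitter-cpp"])

CPP_LANGUAGE = Language("build/cpp.so", "cpp")
parser = Parser()
parser.set_language(CPP_LANGUAGE)

tree = parser.parse(b"int main() { return 0; }")
print(tree.root_node.sexp())  # s-expression view of the parsed AST
```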


61-107: Well-designed command-line interface.

The argument parser provides a comprehensive set of options with clear help messages and sensible defaults. Good backward compatibility handling for the deprecated --asciidoc option.
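A hedged sketch of such a CLI, using only option names quoted elsewhere in this review plus the deprecated --asciidoc alias; the real script's options may differ:

```python
import argparse

def build_arg_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Extract Redpanda metrics from C++ sources")
    p.add_argument("paths", nargs="+", help="Files or directories to scan")
    p.add_argument("--recursive", action="store_true", help="Recurse into directories")
    p.add_argument("--json-output", help="Write extracted metrics as JSON")
    p.add_argument("--internal-asciidoc", help="AsciiDoc output for internal metrics")
    p.add_argument("--external-asciidoc", help="AsciiDoc output for public metrics")
    p.add_argument("--asciidoc", help=argparse.SUPPRESS)  # deprecated alias kept for compatibility
    p.add_argument("--verbose", action="store_true")
    return p
```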


109-156: Excellent data normalization functions.

Both clean_description and clean_labels implement thoughtful normalization (see the sketch after this list):

  • Ensuring descriptions end with proper punctuation
  • Smart handling of braced vs. unbraced labels
  • Proper deduplication and sorting
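A minimal sketch of what such normalization can look like; this is illustrative only, and the real functions may differ in detail:

```python
import re

def clean_description(desc: str) -> str:
    """Collapse whitespace and make sure the description ends with punctuation."""
    desc = re.sub(r"\s+", " ", desc).strip()
    if desc and desc[-1] not in ".!?":
        desc += "."
    return desc

def clean_labels(labels: list[str]) -> list[str]:
    """Strip optional braces, drop duplicates, and sort label names."""
    cleaned = {label.strip().strip("{}").strip() for label in labels if label.strip()}
    return sorted(cleaned)

# clean_description("Total bytes read")        -> "Total bytes read."
# clean_labels(["{shard}", "shard", "status"]) -> ["shard", "status"]
```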

374-463: Well-orchestrated main function with good user feedback.

The main function properly handles:

  • Logging configuration based on verbosity
  • Path validation and parser initialization
  • Clear error messages for missing dependencies
  • Informative output with emoji indicators
  • Comprehensive metrics breakdown

The only minor issue is the unnecessary f-string on line 455.

tools/metrics/compare_metrics.py (1)

30-152: Well-implemented AsciiDoc parser with robust metric extraction.

The parser correctly handles:

  • Multiple heading levels (== and ===)
  • Various metric properties (type, labels, usage, etc.)
  • Error recovery with warning messages

One minor suggestion: Consider using instance variables instead of modifying locals() in _process_section for better code clarity.
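For context, heading-and-property parsing of this kind can be sketched with a couple of regular expressions; this is a simplified illustration, not the parser's actual code:

```python
import re

HEADING_RE = re.compile(r"^(={2,3})\s+(\S+)\s*$")            # == or === metric_name
PROPERTY_RE = re.compile(r"^\*(Type|Labels|Usage)\*:\s*(.*)$")

def parse_metrics(adoc_text: str) -> dict[str, dict]:
    """Map metric name -> {level, type, ...}; heavily simplified."""
    metrics: dict[str, dict] = {}
    current = None
    for line in adoc_text.splitlines():
        if m := HEADING_RE.match(line):
            current = m.group(2)
            metrics[current] = {"level": len(m.group(1))}
        elif current and (m := PROPERTY_RE.match(line)):
            metrics[current][m.group(1).lower()] = m.group(2).strip()
    return metrics
```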

tools/metrics-extractor/metrics_parser.py (5)

196-317: Robust variable resolution with appropriate fallbacks.

The function implements a comprehensive strategy for resolving group names:

  1. Local scope search
  2. Broader scope search
  3. Function scope search
  4. Forward declaration search
  5. File-wide search
  6. Emergency fallbacks

This multi-layered approach ensures metrics are rarely lost due to unresolved variables.
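Conceptually, the fallback chain reduces to trying progressively wider scopes before giving up. A stripped-down illustration; the scope representation and names are assumptions:

```python
def resolve_group_name(var_name: str, scopes: list[dict[str, str]]) -> str:
    """Try progressively wider scopes; never drop the metric outright."""
    for scope in scopes:  # e.g. local block, enclosing function, file-wide declarations
        if var_name in scope:
            return scope[var_name]
    return "unknown"  # emergency fallback so the metric is still emitted
```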


761-792: Well-implemented metric name construction.

The function properly (see the sketch after this list):

  • Sanitizes group names by replacing special characters
  • Applies correct prefixes based on metric type (vectorized_ for internal, redpanda_ for external)
  • Handles edge cases with unknown group names
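A minimal sketch of the naming convention described above; the function name and sanitization rule are illustrative:

```python
import re

def build_metric_name(group: str, short_name: str, internal: bool) -> str:
    """Sanitize the group and apply the internal/external prefix convention."""
    group = re.sub(r"[^A-Za-z0-9_]", "_", group or "unknown")
    prefix = "vectorized" if internal else "redpanda"
    return f"{prefix}_{group}_{short_name}"

# build_metric_name("storage", "read_bytes", internal=False) -> "redpanda_storage_read_bytes"
```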

794-905: Good handling of Seastar-specific metric patterns.

Both functions properly extract Seastar metrics with appropriate:

  • Pattern matching for different Seastar APIs
  • Recursive AST traversal
  • Consistent metric metadata extraction

907-1016: Excellent orchestration with critical safeguards.

The function demonstrates:

  • Proper sequencing of different parsing strategies (Seastar then Prometheus)
  • Critical null-check safeguard for group names with multiple fallback strategies
  • Comprehensive error handling
  • Good logging for debugging

The emergency fallback logic (lines 981-998) is particularly well-designed to ensure metrics are never lost due to unresolved group names.


1018-1113: Sophisticated description extraction with C++ concatenation support.

The function excellently handles (see the sketch after this list):

  • Multi-part descriptions from C++ automatic string concatenation
  • Proper filtering of comments and whitespace between strings
  • Position-based string literal tracking
  • Filtering of incomplete format strings
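A simplified sketch of joining adjacent string-literal nodes; the node handling is illustrative, and the real function does more filtering:

```python
def join_string_literals(nodes) -> str:
    """Concatenate adjacent C++ string-literal AST nodes into one description.

    The C++ compiler joins "Total number of " "requests." into a single
    literal, so the extractor has to do the same when reading the AST.
    """
    parts = []
    for node in nodes:
        if node.type == "string_literal":            # tree-sitter-cpp node type
            parts.append(node.text.decode("utf-8").strip('"'))
        # comments and whitespace between the literals are simply skipped
    return "".join(parts)
```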

Comment on lines +795 to +828
automation
.command('source-metrics-docs')
.description('Generate metrics documentation from Redpanda source code using tree-sitter')
.option('--tag <tag>', 'Git tag or branch to extract from', 'dev')
.option('--diff <oldTag>', 'Also diff autogenerated metrics from <oldTag> → <tag>')
.action((options) => {
verifyMetricsExtractorDependencies();

const newTag = options.tag;
const oldTag = options.diff;
const cwd = path.resolve(__dirname, '../tools/metrics-extractor');
const make = (tag) => {
console.log(`⏳ Building source-based metrics docs for ${tag}…`);
const r = spawnSync('make', ['build', `TAG=${tag}`], { cwd, stdio: 'inherit' });
if (r.error) {
console.error(`❌ ${r.error.message}`);
process.exit(1);
}
if (r.status !== 0) process.exit(r.status);
};

if (oldTag) {
const oldDir = path.join('autogenerated', oldTag, 'source-metrics');
if (!fs.existsSync(oldDir)) make(oldTag);
}

make(newTag);

if (oldTag) {
diffDirs('source-metrics', oldTag, newTag);
}

process.exit(0);
});

🛠️ Refactor suggestion

Well-implemented command following established patterns

This source-metrics-docs command properly addresses all previous review feedback by using make build and supporting the same options as property-docs.

However, having both metrics-docs and source-metrics-docs commands may confuse users. Consider removing the metrics-docs command (lines 461-503) and renaming this to just metrics-docs as suggested in the previous review.

🤖 Prompt for AI Agents
In bin/doc-tools.js between lines 461 and 503, remove the existing
'metrics-docs' command to avoid user confusion. Then rename the
'source-metrics-docs' command defined around lines 795 to 828 to 'metrics-docs'
to unify the command naming. Ensure all references and descriptions are updated
accordingly to reflect this change.

Comment on lines +54 to 57
test: ["CMD", "rpk", "version"]
interval: 10s
timeout: 15s
retries: 10

⚠️ Potential issue

Health-check no longer validates broker health
rpk version succeeds even when Redpanda isn’t running; the command only prints the CLI binary’s version and never contacts the broker. The previous rpk cluster info … actually hit the admin API.

This makes depends_on: condition: service_healthy meaningless and may cause race conditions for services relying on a ready broker.

-      test: ["CMD", "rpk", "version"]
+      # Probe the broker over the Admin API; keeps zero external creds.
+      test: ["CMD", "rpk", "cluster", "info", "-X", "brokers=localhost:9644"]
🤖 Prompt for AI Agents
In docker-compose/docker-compose.yml around lines 54 to 57, the health-check
uses `rpk version`, which only checks the CLI version and does not verify broker
health. Replace the health-check command with one that queries the broker's
admin API, such as `rpk cluster info`, to ensure the health-check accurately
reflects the broker's readiness and prevents race conditions for dependent
services.

Comment on lines +147 to +154
$(PYTHON) metrics_extractor.py \
--recursive \
--output $(OUTPUT_DIR)/metrics.json \
--internal-asciidoc $(OUTPUT_DIR)/internal_metrics_reference.adoc \
--external-asciidoc $(OUTPUT_DIR)/public_metrics_reference.adoc \
--filter-namespace redpanda \
--verbose \
$(REDPANDA_PATH)/src

⚠️ Potential issue

Fix incorrect argument name in extract-local target

The extract-local target uses --output but should use --json-output to match the expected argument name in metrics_extractor.py.

 	$(PYTHON) metrics_extractor.py \
 		--redpanda-repo $(REDPANDA_PATH) \
-		--output $(OUTPUT_DIR)/metrics.json \
+		--json-output $(OUTPUT_DIR)/metrics.json \
 		--internal-asciidoc $(OUTPUT_DIR)/internal_metrics_reference.adoc \
 		--external-asciidoc $(OUTPUT_DIR)/public_metrics_reference.adoc \
-		--filter-namespace redpanda \
 		--verbose
-		$(REDPANDA_PATH)/src

Also note that the command seems to have extra arguments that may not be supported:

  • --filter-namespace redpanda
  • $(REDPANDA_PATH)/src as a positional argument

Please verify these arguments are supported by the script.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In tools/metrics-extractor/Makefile around lines 147 to 154, the extract-local
target uses the argument --output which should be changed to --json-output to
match the expected argument in metrics_extractor.py. Additionally, verify if the
arguments --filter-namespace redpanda and the positional argument
$(REDPANDA_PATH)/src are supported by the script; if not, remove or correct them
accordingly to ensure compatibility with metrics_extractor.py.

@@ -0,0 +1,211 @@
import logging
import hashlib
import uuid

⚠️ Potential issue

Remove unused import

The uuid module is imported but never used in this file.

 import logging
 import hashlib
-import uuid
 from collections import defaultdict
🧰 Tools
🪛 Ruff (0.12.2)

3-3: uuid imported but unused

Remove unused import: uuid

(F401)

🤖 Prompt for AI Agents
In tools/metrics-extractor/metrics_bag.py at line 3, the uuid module is imported
but not used anywhere in the file. Remove the import statement for uuid to clean
up unused imports and improve code clarity.

@@ -0,0 +1,1127 @@
import os
import re
import subprocess

⚠️ Potential issue

Remove unused import.

subprocess is imported but not used in the code.

-import subprocess
🧰 Tools
🪛 Ruff (0.12.2)

3-3: subprocess imported but unused

Remove unused import: subprocess

(F401)

🤖 Prompt for AI Agents
In tools/metrics-extractor/metrics_parser.py at line 3, the subprocess module is
imported but never used. Remove the import statement for subprocess to clean up
the code and avoid unnecessary imports.

"""
import os
import sys
import json

⚠️ Potential issue

Remove unused import

The json module is imported but never used in this file.

-import json
🧰 Tools
🪛 Ruff (0.12.2)

7-7: json imported but unused

Remove unused import: json

(F401)

🤖 Prompt for AI Agents
In tools/metrics-extractor/validate.py at line 7, the json module is imported
but not used anywhere in the file. Remove the import statement for json to clean
up the code and avoid unnecessary imports.

Comment on lines +49 to +51
print(f" ❌ Example script failed:")
print(f" stdout: {result.stdout}")
print(f" stderr: {result.stderr}")

⚠️ Potential issue

Remove unnecessary f-string prefix

The f-string prefix is not needed as there are no placeholders in the string.

-            print(f"  ❌ Example script failed:")
+            print("  ❌ Example script failed:")
             print(f"     stdout: {result.stdout}")
             print(f"     stderr: {result.stderr}")
🧰 Tools
🪛 Ruff (0.12.2)

49-49: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In tools/metrics-extractor/validate.py around lines 49 to 51, the print
statements use f-string prefixes without any placeholders. Remove the
unnecessary f-string prefix from these print statements to simplify the code.


import re
import argparse
from typing import Dict, List, Tuple, Optional

⚠️ Potential issue

Remove unused imports.

Tuple and Optional are imported but not used in the code.

-from typing import Dict, List, Tuple, Optional
+from typing import Dict, List
🧰 Tools
🪛 Ruff (0.12.2)

12-12: typing.Tuple imported but unused

Remove unused import

(F401)


12-12: typing.Optional imported but unused

Remove unused import

(F401)

🤖 Prompt for AI Agents
In tools/metrics/compare_metrics.py at line 12, the imports Tuple and Optional
are not used anywhere in the code. Remove Tuple and Optional from the import
statement to clean up unused imports, leaving only Dict and List.

Comment on lines +231 to +234
differences.append(f"Labels differ")

if metric1.usage != metric2.usage:
differences.append(f"Usage differs")

⚠️ Potential issue

Remove unnecessary f-string prefixes.

These strings don't contain any placeholders.

-            differences.append(f"Labels differ")
+            differences.append("Labels differ")
         
         if metric1.usage != metric2.usage:
-            differences.append(f"Usage differs")
+            differences.append("Usage differs")
🧰 Tools
🪛 Ruff (0.12.2)

231-231: f-string without any placeholders

Remove extraneous f prefix

(F541)


234-234: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In tools/metrics/compare_metrics.py around lines 231 to 234, remove the
unnecessary f-string prefixes from the strings "Labels differ" and "Usage
differs" since they do not contain any placeholders. Replace the f-strings with
regular string literals to simplify the code.

Comment on lines +243 to +287
print(f"METRICS COMPARISON REPORT")
print(f"{'='*60}")
print(f"File 1 ({file1_name}): {results['total_file1']} metrics")
print(f"File 2 ({file2_name}): {results['total_file2']} metrics")
print(f"Common metrics: {results['common_count']}")

# Heading level differences
if results['heading_level_differences']:
print(f"\n📏 HEADING LEVEL DIFFERENCES:")
print(f" Count: {len(results['heading_level_differences'])}")
for item in results['heading_level_differences']:
print(f" - {item['name']}: {item['file1_level']} vs {item['file2_level']}")

# Metrics only in file 1 (should be removed)
if results['file1_unique']:
print(f"\n🗑️ METRICS TO REMOVE (only in {file1_name}):")
print(f" Count: {len(results['file1_unique'])}")
for metric in results['file1_unique']:
print(f" - {metric}")

# Metrics only in file 2 (missing from file 1)
if results['file2_unique']:
print(f"\n📝 METRICS MISSING FROM {file1_name}:")
print(f" Count: {len(results['file2_unique'])}")
for metric in results['file2_unique']:
print(f" - {metric}")

# Description improvements
if results['improved_descriptions']:
print(f"\n✨ POTENTIAL DESCRIPTION IMPROVEMENTS:")
print(f" Count: {len(results['improved_descriptions'])}")

for item in results['improved_descriptions']:
print(f"\n 📊 {item['name']}:")
print(f" Similarity: {item['similarity']:.2f}")

if item['likely_improvement']:
print(f" 🔍 LIKELY IMPROVEMENT (File 1 has longer description)")

print(f" File 1: {item['file1_desc'][:100]}{'...' if len(item['file1_desc']) > 100 else ''}")
print(f" File 2: {item['file2_desc'][:100]}{'...' if len(item['file2_desc']) > 100 else ''}")

# Other property differences
if results['different_properties']:
print(f"\n🔧 OTHER PROPERTY DIFFERENCES:")

⚠️ Potential issue

Remove unnecessary f-string prefixes throughout the function.

Multiple strings don't contain placeholders and don't need the f-prefix.

-    print(f"METRICS COMPARISON REPORT")
+    print("METRICS COMPARISON REPORT")
-    print(f"\n📏 HEADING LEVEL DIFFERENCES:")
+    print("\n📏 HEADING LEVEL DIFFERENCES:")
-    print(f"\n✨ POTENTIAL DESCRIPTION IMPROVEMENTS:")
+    print("\n✨ POTENTIAL DESCRIPTION IMPROVEMENTS:")
-    print(f"      🔍 LIKELY IMPROVEMENT (File 1 has longer description)")
+    print("      🔍 LIKELY IMPROVEMENT (File 1 has longer description)")
-    print(f"\n🔧 OTHER PROPERTY DIFFERENCES:")
+    print("\n🔧 OTHER PROPERTY DIFFERENCES:")
🧰 Tools
🪛 Ruff (0.12.2)

243-243: f-string without any placeholders

Remove extraneous f prefix

(F541)


251-251: f-string without any placeholders

Remove extraneous f prefix

(F541)


272-272: f-string without any placeholders

Remove extraneous f prefix

(F541)


280-280: f-string without any placeholders

Remove extraneous f prefix

(F541)


287-287: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In tools/metrics/compare_metrics.py from lines 243 to 287, many print statements
use f-string prefixes unnecessarily where no variable interpolation occurs.
Remove the f-prefix from all string literals that do not contain placeholders to
simplify the code and improve readability.

@paulohtb6

paulohtb6 commented Jul 23, 2025

During testing I identified several false positives in the generated output. Most of the problems were caused by the parser failing to resolve group_names. Some are EXTREMELY hard to identify via this method, even using the AST.

One in particular I consider effectively impossible, because it spans multiple files.

I'm trying to reduce the number of false positives, and the latest commit introduces an aggressive check to resolve the group_names.

There are still several debug print statements in the code because of this, so the automation must not be merged yet.

@paulohtb6

Just noticed that some labels are also misclassified.

In these two metrics, the autogenerated labels look like this.

=== redpanda_trust_file_crc32c

CRC32C checksum calculated from the contents of the trust file. This value is calculated when a valid certificate is loaded and a trust store is present. Otherwise, the value is zero.

*Type*: gauge

*Labels*:

- `aggregates`
- `label`
- `shard`

---

=== redpanda_truststore_expires_at_timestamp_seconds

Expiry time of the shortest-lived CA in the truststore(seconds since epoch).

*Type*: gauge

*Labels*:

- `aggregates`
- `label`
- `label_instance`

---

In the source code, available at https://github.com/redpanda-data/redpanda/blob/dev/src/v/net/probes.cc, we can tell that the labels are actually area, detail, and shard.
