CRITICAL: Performance benchmarks are measuring mocks, not real parsing #73

@EffortlessSteven

Description

🚨 Critical Issue: Misleading Performance Claims

Summary

The current performance benchmarks in benchmarks/benches/glr_performance.rs do not measure actual parsing; they time simple character-counting loops. This produces false performance claims that could mislead users and developers.

Evidence

GLR Performance Benchmark (lines 48-57):

```rust
// TODO: Replace with actual Python parser once integrated
// For now, simulate parsing workload
let mut tokens = 0;
for char in source.chars() {
    if char.is_alphanumeric() || char.is_whitespace() {
        tokens += 1;
    }
}
black_box(tokens)
```
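The mock above can be reproduced standalone to see how bare character iteration yields multi-hundred-MB/sec figures regardless of any parser. A minimal sketch using only std (the synthetic source string and its size are made up for illustration; absolute numbers will vary by machine):

```rust
use std::time::Instant;

fn main() {
    // Build a synthetic "Python source" of a few MB (contents are arbitrary).
    let source: String = "def f(x):\n    return x + 1\n".repeat(300_000);
    let bytes = source.len() as f64;

    let start = Instant::now();
    // Identical workload to the mock benchmark: count chars, parse nothing.
    let mut tokens: u64 = 0;
    for c in source.chars() {
        if c.is_alphanumeric() || c.is_whitespace() {
            tokens += 1;
        }
    }
    let secs = start.elapsed().as_secs_f64();

    // This "throughput" reflects memory bandwidth plus one branch per char,
    // not lexing, parsing, or AST construction.
    println!("counted {tokens} chars, {:.0} MB/sec", bytes / secs / 1e6);
}
```

Any figure produced this way tells you nothing about parser performance, which is exactly why the published MB/sec and tokens/sec numbers are meaningless.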

Fork Operations Benchmark (lines 70-96):

```rust
// Simulate fork
let forked = stacks[0].clone();
stacks.push(forked);
```
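For contrast, a real GLR fork is not a deep clone: a graph-structured stack (GSS) shares the common tail between forks, so forking is O(1) rather than O(stack depth). A minimal sketch of tail-sharing with `Rc` (the `Node`/`push` names are hypothetical, not this repo's actual API):

```rust
use std::rc::Rc;

// Hypothetical GSS node: a parse state plus a shared link to the stack tail.
struct Node {
    state: u32,
    prev: Option<Rc<Node>>,
}

fn push(tail: &Option<Rc<Node>>, state: u32) -> Option<Rc<Node>> {
    // Cloning an Rc bumps a refcount; it does not copy the tail.
    Some(Rc::new(Node { state, prev: tail.clone() }))
}

fn main() {
    // Build a stack 0 -> 1 -> 2.
    let mut top: Option<Rc<Node>> = None;
    for s in 0..3 {
        top = push(&top, s);
    }
    // Fork: both branches share the existing tail; no nodes are copied.
    let fork_a = push(&top, 10);
    let fork_b = push(&top, 11);
    // The former top node now has three owners: `top` itself plus each fork's prev.
    assert_eq!(Rc::strong_count(top.as_ref().unwrap()), 3);
    let _ = (fork_a.unwrap().state, fork_b.unwrap().state);
}
```

Benchmarking `Vec::clone` therefore does not even exercise the data structure whose cost a GLR fork benchmark should measure.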

Current False Claims

  • "815 MB/sec throughput" - Based on character iteration, not parsing
  • "118M tokens/sec" - No actual tokenization happening
  • "100x faster than Tree-sitter" - Comparing mocks to real parsers

Impact

  • Misleading documentation: README claims production-ready performance
  • False benchmarking: Performance guide shows non-existent capabilities
  • User confusion: Developers may adopt based on false performance metrics
  • Technical debt: Benchmark infrastructure exists but doesn't measure real work

Required Actions

  1. Immediate: Add prominent disclaimers to all performance claims
  2. Short-term: Implement actual parsing benchmarks when lexer is complete
  3. Documentation: Update README/docs to reflect actual current state
  4. Benchmarks: Either remove misleading benchmarks or clearly mark as mocks

Priority

CRITICAL - This affects project credibility and user decision-making

Related Files

  • benchmarks/benches/glr_performance.rs - Mock benchmarks
  • README.md - False performance claims
  • PERFORMANCE_GUIDE.md - Misleading metrics
  • PROJECT_STATUS.md - Needs accuracy corrections

Context

Discovered during comprehensive performance analysis. The question "are we actually parsing or just processing mocks?" revealed the truth about current benchmark validity.
