Ship: "compare" command (signal-level repository comparison) #179

Description

@calchiwo

Add a "compare" command that compares two supported targets using the existing ExplainThisRepo pipeline.

No new pipeline. No duplicated logic.

Reuse:

  • target normalization
  • local vs GitHub resolution
  • file and directory support
  • signal extraction
  • output system

Add only:

  • structured signal comparator
  • comparison report renderer

Why this matters

Right now the tool explains one codebase well.

The next real step is helping users answer:

  • how are these two systems different?
  • which one is simpler?
  • which one is backend vs frontend?
  • what actually changed between them?

This cannot be solved by diffing generated explanations. That approach is noisy and unreliable.

The comparison must be built on extracted signals.

CLI

Primary:

explainthisrepo compare <target-a> <target-b>

Examples:

explainthisrepo compare facebook/react vercel/next.js
explainthisrepo compare . ../another-project
explainthisrepo compare owner/repo/path/to/file.py ./file.py

Aliases already supported:

etr compare
explain-this-repo compare

Supported inputs

Both sides accept anything currently supported:

  • GitHub repo
  • local repo
  • GitHub directory
  • local directory
  • GitHub file
  • local file

Examples:

explainthisrepo compare owner/repo owner/repo
explainthisrepo compare ./dir ./other-dir
explainthisrepo compare owner/repo/path/file.py ./file.py
explainthisrepo compare owner/repo ./file.py

Supported pair types

  • repo vs repo
  • local repo vs local repo
  • repo vs local repo
  • repo vs file
  • file vs file
  • directory vs directory
  • local dir vs GitHub dir
  • GitHub file vs local file
  • private repo vs public repo
  • monorepo vs single-package repo

Output must adapt to what is actually comparable. No forced symmetry.

Core flow

input A + input B
→ normalize
→ resolve
→ extract signals (existing pipeline)
→ compare signals (new layer)
→ render report
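
The flow above can be sketched as a thin orchestration function. This is a minimal illustration only: `run_compare` and its five stage parameters are hypothetical names, not the project's actual API. The stages are passed in as callables to show that the existing normalize, resolve, and extract steps are reused unchanged for both inputs, and only the compare and render steps are new.

```python
from typing import Any, Callable


def run_compare(
    raw_a: str,
    raw_b: str,
    *,
    normalize: Callable[[str], Any],   # existing stage (assumed interface)
    resolve: Callable[[Any], Any],     # existing stage (assumed interface)
    extract: Callable[[Any], Any],     # existing stage, must return structured signals
    compare: Callable[[Any, Any], Any],  # new comparator layer
    render: Callable[[Any], str],      # renderer extension
) -> str:
    """Run both inputs through the same pipeline, then diff the signals."""
    signals = []
    for raw in (raw_a, raw_b):
        # Identical code path for both sides: no branching per input.
        target = resolve(normalize(raw))
        signals.append(extract(target))

    # Comparison happens on structured signals, never on generated text.
    diff = compare(signals[0], signals[1])
    return render(diff)
```

Because the two inputs share one loop body, any duplication of extraction logic would show up immediately as a second code path.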

Hard rule

Do not compare generated explanations.

Comparison must follow this order:

  1. extract structured signals
  2. compare structured signals
  3. generate explanation from diff

Anything else introduces noise and inconsistency.

Canonical analysis model

Target layer

  • repo
  • local_repo
  • github_repo
  • directory
  • file

Signal layer

Repos / directories:

  • structure
  • entrypoints
  • manifests
  • configs
  • dependencies
  • tech stack
  • high-signal files
  • architecture hints

Files:

  • file type
  • size
  • imports
  • exports
  • symbols
  • purpose
  • logic shape

Interpretation layer

  • framework
  • runtime
  • package manager
  • app shape
  • frontend vs backend
  • monorepo vs single package
  • CLI vs library vs app
  • entry path

Diff layer

  • same
  • added
  • removed
  • changed
  • stronger signal
  • conflicting signal
  • not comparable
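
One way to model the diff layer is an enum plus a small classifier. This is a sketch under stated assumptions: `DiffStatus` and `classify` are hypothetical names, `None` stands for "signal not detected", and "added" is read as "present in B but not in A". The "stronger signal" and "conflicting signal" statuses need confidence information that this issue does not define, so they are declared but not assigned here.

```python
from enum import Enum


class DiffStatus(Enum):
    SAME = "same"
    ADDED = "added"
    REMOVED = "removed"
    CHANGED = "changed"
    STRONGER_SIGNAL = "stronger signal"
    CONFLICTING_SIGNAL = "conflicting signal"
    NOT_COMPARABLE = "not comparable"


def classify(value_a, value_b, comparable=True):
    """Classify one signal across two targets (A is the baseline)."""
    if not comparable or (value_a is None and value_b is None):
        # Nothing to compare: do not invent a result.
        return DiffStatus.NOT_COMPARABLE
    if value_a is None:
        return DiffStatus.ADDED      # only B has the signal
    if value_b is None:
        return DiffStatus.REMOVED    # only A has the signal
    if value_a == value_b:
        return DiffStatus.SAME
    return DiffStatus.CHANGED
```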

Internal design

TargetResolver (existing)

Used twice. No change.

SignalExtractor (existing)

Must return structured data. Not just prose.

Comparator (new)

Takes two signal objects and returns a structured diff.

Example:

{
  "stack": {
    "only_in_a": ["fastapi"],
    "only_in_b": ["express"],
    "shared": ["docker"]
  },
  "entrypoints": {
    "a": ["app/main.py"],
    "b": ["src/index.ts"]
  }
}
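
A comparator that produces the shape above could look roughly like this. It is a minimal sketch: `compare_signals` is a hypothetical name, and it assumes each signal object is a plain dict with `stack` and `entrypoints` lists, which the issue does not specify.

```python
def compare_signals(a: dict, b: dict) -> dict:
    """Pure function: two signal dicts in, one structured diff out."""
    stack_a = set(a.get("stack", []))
    stack_b = set(b.get("stack", []))
    return {
        "stack": {
            # Sorted so the output is deterministic across runs.
            "only_in_a": sorted(stack_a - stack_b),
            "only_in_b": sorted(stack_b - stack_a),
            "shared": sorted(stack_a & stack_b),
        },
        "entrypoints": {
            "a": list(a.get("entrypoints", [])),
            "b": list(b.get("entrypoints", [])),
        },
    }


diff = compare_signals(
    {"stack": ["fastapi", "docker"], "entrypoints": ["app/main.py"]},
    {"stack": ["express", "docker"], "entrypoints": ["src/index.ts"]},
)
# diff matches the example above
```

Keeping the comparator free of I/O and LLM calls makes it easy to test in isolation.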

ReportRenderer (existing + extension)

Renders comparison output.

Output

Default file:

COMPARE.md

Respect:

--output

Same behavior as existing commands.

Modes

Reuse:

  • --quick → one-line verdict
  • --simple → short comparison
  • --detailed → structured diff
  • default → full report

Do not reuse:

  • --stack
  • --map
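
The command surface described so far can be sketched with the standard library's `argparse`. Whether the real CLI uses `argparse` is an assumption, and `build_parser`, `target_a`, `target_b`, and the `mode` attribute are hypothetical names. Only the flags and the `COMPARE.md` default come from this issue.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="explainthisrepo")
    commands = parser.add_subparsers(dest="command", required=True)

    compare = commands.add_parser("compare", help="compare two targets")
    compare.add_argument("target_a")
    compare.add_argument("target_b")
    compare.add_argument("--output", default="COMPARE.md")

    # At most one mode flag; with none given, the full report is produced.
    modes = compare.add_mutually_exclusive_group()
    for name in ("quick", "simple", "detailed"):
        modes.add_argument(
            f"--{name}",
            dest="mode",
            action="store_const",
            const=name,
            default="full",
        )
    # --stack and --map are intentionally not registered for compare.
    return parser


args = build_parser().parse_args(["compare", "facebook/react", "vercel/next.js", "--quick"])
```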

Report structure

# Compare Report

## Summary

## What Target A looks like

## What Target B looks like

## Similarities

## Differences
- Stack
- Entry points
- Structure
- Architecture
- File types
- Dependency shape
- App type

## What matters most

## Confidence and limits

Behavior rules

  • Do not hallucinate missing signals
  • Mark partial comparisons (e.g. file vs repo)
  • Only compare overlapping domains
  • Keep the system deterministic
  • The LLM is optional and used only for phrasing, not discovery
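
The "only compare overlapping domains" and "mark partial comparisons" rules can be sketched as a small pre-check that runs before the comparator. This is an illustration only: `overlapping_domains` is a hypothetical name, and it assumes signals are dicts where a missing key or a `None` value means the signal was not extracted.

```python
def overlapping_domains(signals_a: dict, signals_b: dict) -> dict:
    """Decide which signal domains may be compared, without inventing any."""
    present_a = {key for key, value in signals_a.items() if value is not None}
    present_b = {key for key, value in signals_b.items() if value is not None}
    shared = present_a & present_b
    return {
        "compared": sorted(shared),
        # Domains seen on one side only are reported, never filled in.
        "skipped": sorted((present_a | present_b) - shared),
        # Partial whenever the two sides expose different domains.
        "partial": present_a != present_b,
    }


coverage = overlapping_domains(
    {"dependencies": ["fastapi"], "entrypoints": ["app/main.py"]},
    {"dependencies": ["express"], "symbols": ["main"]},
)
```

The `skipped` and `partial` fields map naturally onto the "Confidence and limits" section of the report.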

Optional scoring

{
  "stack_similarity": 0.72,
  "structure_similarity": 0.41,
  "architecture_distance": 0.83,
  "confidence": "high"
}
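
The issue does not define how these scores are computed. One deterministic option for a set-valued signal such as the stack is Jaccard similarity (size of the intersection divided by size of the union). The sketch below is an assumption about the metric, not a decided design, and `jaccard_similarity` is a hypothetical helper name.

```python
def jaccard_similarity(items_a, items_b):
    """Return |A ∩ B| / |A ∪ B|, or None when neither side has any signal."""
    set_a, set_b = set(items_a), set(items_b)
    union = set_a | set_b
    if not union:
        # No evidence on either side: report nothing rather than a fake score.
        return None
    return len(set_a & set_b) / len(union)


score = jaccard_similarity(["fastapi", "docker"], ["express", "docker"])
# one shared item out of three distinct items
```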

Implementation plan

  1. Add "compare" command routing
  2. Reuse resolver for both inputs
  3. Ensure structured signal output
  4. Build comparator (pure logic)
  5. Extend renderer
  6. Add tests
  7. Update docs

Test matrix

  • repo vs repo
  • local vs local
  • repo vs local
  • repo vs file
  • file vs file
  • dir vs dir
  • local dir vs GitHub dir
  • GitHub file vs local file
  • private vs public repo
  • monorepo vs single-package repo

Also:

  • invalid input
  • same target comparison
  • bad paths
  • output writing
  • stdout mode
  • no-LLM mode

Failure to avoid

Do not implement:

EXPLAIN(A) + EXPLAIN(B) → text diff

This is unstable and untrustworthy.

Correct approach:

signals → diff → explanation

Future

Design comparator to support multiple inputs:

explainthisrepo compare A B C

Not required now, but supporting it later must not require a rewrite.
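
One way to keep that door open is to treat the two-target comparator as the primitive and build N-way comparison from pairwise calls. This is only a sketch of one possible design: `compare_many` and the `compare_pair` parameter are hypothetical names, and a real N-way report might need a different aggregation.

```python
from itertools import combinations


def compare_many(signals_by_name: dict, compare_pair) -> dict:
    """Apply a two-target comparator to every unordered pair of targets."""
    if len(signals_by_name) < 2:
        raise ValueError("need at least two targets to compare")
    return {
        (name_a, name_b): compare_pair(signals_by_name[name_a], signals_by_name[name_b])
        for name_a, name_b in combinations(signals_by_name, 2)
    }


pairs = compare_many(
    {"A": {"fastapi"}, "B": {"express"}, "C": {"fastapi", "docker"}},
    lambda a, b: sorted(a & b),  # stand-in comparator: shared stack items
)
```

With two targets this reduces to a single pair, so the current command would be the N = 2 case of the same code path.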

Constraint

This must remain:

  • signal-first
  • deterministic
  • single pipeline
  • minimal new surface area

If new logic starts duplicating extraction or branching per input type, the design has failed.

Labels

enhancement, node, python
