Add a "compare" command that compares two supported targets using the existing ExplainThisRepo pipeline.
No new pipeline. No duplicated logic.
Reuse:
- target normalization
- local vs GitHub resolution
- file and directory support
- signal extraction
- output system
Add only:
- structured signal comparator
- comparison report renderer
Why this matters
Right now the tool explains one codebase well.
The next real step is helping users answer:
- how are these two systems different?
- which one is simpler?
- which one is backend vs frontend?
- what actually changed between them?
This cannot be solved by diffing generated explanations. That approach is noisy and unreliable.
The comparison must be built on extracted signals.
CLI
Primary:
explainthisrepo compare <target-a> <target-b>
Examples:
explainthisrepo compare facebook/react vercel/next.js
explainthisrepo compare . ../another-project
explainthisrepo compare owner/repo/path/to/file.py ./file.py
Aliases already supported:
etr compare
explain-this-repo compare
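As a rough sketch of the command shape above: the parser below assumes the CLI is built on argparse, which this spec does not state. The function name build_parser is hypothetical; only the "compare" command, its two targets, "--output", and the COMPARE.md default come from this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a `compare` subcommand (hypothetical wiring)."""
    parser = argparse.ArgumentParser(prog="explainthisrepo")
    subparsers = parser.add_subparsers(dest="command")

    compare = subparsers.add_parser("compare", help="compare two supported targets")
    # Both sides are plain strings; resolution happens later in the pipeline.
    compare.add_argument("target_a")
    compare.add_argument("target_b")
    # COMPARE.md is the default output file named in this spec.
    compare.add_argument("--output", default="COMPARE.md")
    return parser


args = build_parser().parse_args(["compare", "facebook/react", "vercel/next.js"])
print(args.command, args.target_a, args.target_b, args.output)
```

The parser treats both targets as opaque strings, so no input-type branching happens at the CLI layer.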
Supported inputs
Both sides accept anything currently supported:
- GitHub repo
- local repo
- GitHub directory
- local directory
- GitHub file
- local file
Examples:
explainthisrepo compare owner/repo owner/repo
explainthisrepo compare ./dir ./other-dir
explainthisrepo compare owner/repo/path/file.py ./file.py
explainthisrepo compare owner/repo ./file.py
Supported pair types
- repo vs repo
- local repo vs local repo
- repo vs local repo
- repo vs file
- file vs file
- directory vs directory
- local dir vs GitHub dir
- GitHub file vs local file
- private repo vs public repo
- monorepo vs single-package repo
Output must adapt to what is actually comparable. No forced symmetry.
Core flow
input A + input B
→ normalize
→ resolve
→ extract signals (existing pipeline)
→ compare signals (new layer)
→ render report
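The flow above could be wired roughly as follows. This is a minimal sketch, not the actual implementation: run_compare and every stage passed into it are hypothetical stand-ins for the existing normalizer, resolver, extractor, and renderer.

```python
from typing import Any, Callable, Dict

Signals = Dict[str, Any]


def run_compare(
    raw_a: str,
    raw_b: str,
    normalize: Callable[[str], str],
    resolve: Callable[[str], Any],
    extract: Callable[[Any], Signals],
    compare: Callable[[Signals, Signals], Dict[str, Any]],
    render: Callable[[Dict[str, Any]], str],
) -> str:
    """Run both targets through the same stages, then compare and render."""
    # The same three stages run for each side; nothing branches per input type.
    signals_a = extract(resolve(normalize(raw_a)))
    signals_b = extract(resolve(normalize(raw_b)))
    return render(compare(signals_a, signals_b))


# Stand-in stages so the sketch runs on its own (all hypothetical).
report = run_compare(
    " ./A ",
    "./b",
    normalize=lambda raw: raw.strip().lower(),
    resolve=lambda target: {"path": target},
    extract=lambda resolved: {"name": resolved["path"]},
    compare=lambda a, b: {"same": a == b},
    render=lambda diff: f"same target: {diff['same']}",
)
print(report)  # same target: False
```

Passing the stages in as callables keeps the only new logic in compare and render, which matches the "add only" list at the top.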
Hard rule
Do not compare generated explanations.
Comparison must follow these steps, in order:
- extract structured signals
- compare structured signals
- generate explanation from diff
Anything else introduces noise and inconsistency.
Canonical analysis model
Target layer
- repo
- local_repo
- github_repo
- directory
- file
Signal layer
Repos / directories:
- structure
- entrypoints
- manifests
- configs
- dependencies
- tech stack
- high-signal files
- architecture hints
Files:
- file type
- size
- imports
- exports
- symbols
- purpose
- logic shape
Interpretation layer
- framework
- runtime
- package manager
- app shape
- frontend vs backend
- monorepo vs single package
- CLI vs library vs app
- entry path
Diff layer
- same
- added
- removed
- changed
- stronger signal
- conflicting signal
- not comparable
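One way to represent the diff layer is an enum plus a small classifier. The sketch below makes assumptions this spec does not: a signal is a list of strings, None means the extractor cannot produce that signal for the target type, and an empty list means the signal was looked for and not found. "stronger signal" and "conflicting signal" appear in the enum but are not classified here, because their rules are not defined in this document.

```python
from enum import Enum
from typing import Optional, Sequence


class DiffStatus(Enum):
    SAME = "same"
    ADDED = "added"
    REMOVED = "removed"
    CHANGED = "changed"
    STRONGER_SIGNAL = "stronger signal"
    CONFLICTING_SIGNAL = "conflicting signal"
    NOT_COMPARABLE = "not comparable"


def classify(a: Optional[Sequence[str]], b: Optional[Sequence[str]]) -> DiffStatus:
    """Classify one signal domain of target A against target B."""
    if a is None or b is None:
        # One side cannot carry this signal at all (e.g. file vs repo).
        return DiffStatus.NOT_COMPARABLE
    if sorted(a) == sorted(b):
        return DiffStatus.SAME
    if not a:
        return DiffStatus.ADDED  # present in B only
    if not b:
        return DiffStatus.REMOVED  # present in A only
    return DiffStatus.CHANGED


print(classify(["fastapi"], ["express"]).value)  # changed
```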
Internal design
TargetResolver (existing)
Used twice. No change.
SignalExtractor (existing)
Must return structured data. Not just prose.
Comparator (new)
Takes two signal objects → returns structured diff.
Example:
{
  "stack": {
    "only_in_a": ["fastapi"],
    "only_in_b": ["express"],
    "shared": ["docker"]
  },
  "entrypoints": {
    "a": ["app/main.py"],
    "b": ["src/index.ts"]
  }
}
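A comparator that yields a diff of this shape can be a pure function over two signal dictionaries. The following is a minimal sketch; the input keys "stack" and "entrypoints" and the function name compare_signals are assumptions about what the extractor returns, not the confirmed schema.

```python
from typing import Any, Dict, List, Mapping


def compare_signals(a: Mapping[str, List[str]], b: Mapping[str, List[str]]) -> Dict[str, Any]:
    """Return a structured diff of two signal objects (pure, no I/O)."""
    stack_a = set(a.get("stack", []))
    stack_b = set(b.get("stack", []))
    return {
        "stack": {
            # Sorted so the same inputs always produce the same output.
            "only_in_a": sorted(stack_a - stack_b),
            "only_in_b": sorted(stack_b - stack_a),
            "shared": sorted(stack_a & stack_b),
        },
        "entrypoints": {
            "a": list(a.get("entrypoints", [])),
            "b": list(b.get("entrypoints", [])),
        },
    }


diff = compare_signals(
    {"stack": ["fastapi", "docker"], "entrypoints": ["app/main.py"]},
    {"stack": ["express", "docker"], "entrypoints": ["src/index.ts"]},
)
print(diff)
```

Sorting the set results keeps the diff deterministic, which the behavior rules require.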
ReportRenderer (existing + extension)
Renders comparison output.
Output
Default file:
COMPARE.md
Respect:
--output
Same behavior as existing commands.
Modes
Reuse:
- "--quick" → one-line verdict
- "--simple" → short comparison
- "--detailed" → structured diff
- default → full report
Do not reuse:
- "--stack"
- "--map"
Report structure
# Compare Report
## Summary
## What Target A looks like
## What Target B looks like
## Similarities
## Differences
- Stack
- Entry points
- Structure
- Architecture
- File types
- Dependency shape
- App type
## What matters most
## Confidence and limits
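The report skeleton above could be rendered along these lines. This is a sketch under assumptions: render_report, the bodies mapping, and the placeholder sentence are hypothetical; only the headings come from this spec.

```python
from typing import List, Mapping

# Top-level headings, in the order listed in this spec.
REPORT_SECTIONS: List[str] = [
    "Summary",
    "What Target A looks like",
    "What Target B looks like",
    "Similarities",
    "Differences",
    "What matters most",
    "Confidence and limits",
]


def render_report(bodies: Mapping[str, str]) -> str:
    """Render every section in a fixed order; never invent missing content."""
    lines = ["# Compare Report"]
    for title in REPORT_SECTIONS:
        lines.append(f"## {title}")
        # A section with no signals says so instead of being filled in.
        lines.append(bodies.get(title) or "No signals available.")
    return "\n".join(lines) + "\n"


report = render_report({"Summary": "A is a FastAPI service; B is an Express service."})
print(report)
```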
Behavior rules
- Do not hallucinate missing signals
- Mark partial comparisons (e.g. file vs repo)
- Only compare overlapping domains
- Keep the system deterministic
- LLM is optional and only for phrasing, not discovery
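The "only compare overlapping domains" and "mark partial comparisons" rules could be enforced before any diffing happens. The sketch below assumes signals arrive as a dictionary keyed by domain, with None for a domain the extractor could not fill; comparable_domains and the sample keys are hypothetical.

```python
from typing import Any, Dict, Mapping


def comparable_domains(a: Mapping[str, Any], b: Mapping[str, Any]) -> Dict[str, Any]:
    """Split signal domains into those both sides have and those skipped."""
    compared = sorted(
        key
        for key in a.keys() & b.keys()
        if a[key] is not None and b[key] is not None
    )
    skipped = sorted((a.keys() | b.keys()) - set(compared))
    # Any skipped domain means the report must be marked as partial.
    return {"compared": compared, "skipped": skipped, "partial": bool(skipped)}


repo_signals = {
    "stack": ["fastapi"],
    "entrypoints": ["app/main.py"],
    "manifests": ["pyproject.toml"],
}
file_signals = {"stack": ["fastapi"], "imports": ["fastapi"]}
result = comparable_domains(repo_signals, file_signals)
print(result)
```

Here a repo vs file comparison only diffs "stack"; the other domains are reported as skipped rather than guessed.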
Optional scoring
{
  "stack_similarity": 0.72,
  "structure_similarity": 0.41,
  "architecture_distance": 0.83,
  "confidence": "high"
}
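This spec does not say how the scores are computed. As one possible choice, and purely as an assumption, a set-based score such as stack_similarity could use the Jaccard index (size of the intersection divided by size of the union):

```python
from typing import Iterable


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity of two collections, in the range 0.0 to 1.0."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        # Two empty sets are treated as identical (a convention, not a rule
        # from this spec).
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


score = jaccard(["fastapi", "docker"], ["express", "docker"])
print(round(score, 2))  # 0.33
```

The score is deterministic and needs no LLM, but whether Jaccard is the right metric for each field is left open here.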
Implementation plan
- Add "compare" command routing
- Reuse resolver for both inputs
- Ensure structured signal output
- Build comparator (pure logic)
- Extend renderer
- Add tests
- Update docs
Test matrix
- repo vs repo
- local vs local
- repo vs local
- repo vs file
- file vs file
- dir vs dir
- local dir vs GitHub dir
- GitHub file vs local file
- private vs public repo
- monorepo vs single-package repo
Also:
- invalid input
- same target comparison
- bad paths
- output writing
- stdout mode
- no-LLM mode
Failure to avoid
Do not implement:
EXPLAIN(A) + EXPLAIN(B) → text diff
This is unstable and untrustworthy.
Correct approach:
signals → diff → explanation
Future
Design comparator to support multiple inputs:
explainthisrepo compare A B C
Not required now. Must not require a rewrite later.
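One way to keep that door open is to write the comparator over a mapping of labels to signals instead of two fixed arguments. The sketch below is an assumption about how that could look, limited to a single "stack" style domain; compare_stacks and the labels are hypothetical.

```python
from typing import Dict, Iterable, Mapping


def compare_stacks(targets: Mapping[str, Iterable[str]]) -> Dict[str, object]:
    """Compare any number of labelled targets; two targets is one case."""
    sets = {label: set(items) for label, items in targets.items()}
    shared = set.intersection(*sets.values()) if sets else set()
    unique = {}
    for label, items in sets.items():
        # Everything seen in any other target.
        others = set().union(*(s for other, s in sets.items() if other != label))
        unique[label] = sorted(items - others)
    return {"shared": sorted(shared), "unique": unique}


result = compare_stacks(
    {
        "A": ["fastapi", "docker"],
        "B": ["express", "docker"],
        "C": ["django", "docker", "fastapi"],
    }
)
print(result)
```

With exactly two targets, "unique" carries the same information as only_in_a and only_in_b, so the two-target report can be built on the same function.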
Constraint
This must remain:
- signal-first
- deterministic
- single pipeline
- minimal new surface area
If new logic starts duplicating extraction or branching per input type, the design has failed.