Roadmap: Bytecode Equivalence Testing & New IR Design #687

@gakonst

Summary

This issue outlines a roadmap for Solar to achieve bytecode generation parity with solc and eventually produce optimal EVM bytecode through a new IR design. This builds on and references the existing Roadmap #1.


Implementation Status

MIR Foundation (✅ Complete)

The MIR (Mid-level IR) structure has been implemented with:

  • SSA-based design with phi nodes
  • EVM-aware type system (UInt, Address, MemPtr, StoragePtr, etc.)
  • Basic block structure with terminators
  • FunctionBuilder API for construction
  • HIR → MIR lowering (basic expressions, some builtins)
  • MIR → EVM codegen skeleton (dispatcher, basic instructions)

Active Implementation Issues

| Issue | Title | Priority | Status |
|-------|-------|----------|--------|
| #694 | Liveness Analysis | High | 🔴 Not started |
| #695 | Phi Elimination Pass | High | 🔴 Not started |
| #696 | Stack Model & Basic Scheduling (≤16 values) | Critical | 🔴 Not started |
| #697 | Full Stack Scheduling with Spilling | Critical | 🔴 Blocked on #696 |
| #698 | Assembler Label & Jump Resolution | High | 🔴 Not started |
| #699 | Complete HIR→MIR for Complex Types | High | 🔴 Not started |
| #700 | Dead Code Elimination (DCE) | Medium | 🔴 Not started |
| #701 | Constant Folding & Propagation | Medium | 🔴 Not started |
| #702 | Sparse Conditional Constant Propagation (SCCP) | Medium | 🔴 Not started |
| #703 | Common Subexpression Elimination (CSE) | Medium | 🔴 Not started |
| #704 | Bytecode Equivalence Testing Infrastructure | High | 🔴 Not started |

Dependency Graph

                    ┌─────────────┐
                    │ MIR (done)  │
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
         ┌────────┐   ┌─────────┐  ┌────────┐
         │ #694   │   │ #698    │  │ #699   │
         │Liveness│   │Assembler│  │Complex │
         └───┬────┘   └─────────┘  │ Types  │
             │                     └────────┘
             ▼
         ┌────────┐
         │ #695   │
         │Phi Elim│
         └───┬────┘
             │
             ▼
         ┌────────┐
         │ #696   │◄──── CRITICAL PATH
         │Basic   │
         │Stack   │
         └───┬────┘
             │
             ▼
         ┌────────┐
         │ #697   │
         │Full    │
         │Stack + │
         │Spilling│
         └───┬────┘
             │
             ▼
    ┌────────────────────┐
    │ Working Bytecode!  │
    └────────────────────┘
             │
    ┌────────┴────────┐
    ▼                 ▼
┌────────┐       ┌────────┐
│ #700   │       │ #704   │
│ DCE    │       │Equiv   │
└────────┘       │Testing │
                 └────────┘

Phase 0: Complete Frontend (Current Focus)

Before codegen, finish the semantic analysis layer. Many of these are in progress or have PRs open.

Type Checking (#615)

HIR Improvements (#91)

LSP Support (#394)

Blocked on typeck completion:

Other Frontend


Phase 1: Closed-Loop Bytecode Equivalence Testing

Tracking issue: #704

Build a test harness that verifies Solar-generated bytecode is behaviorally equivalent to solc-generated bytecode.

Approach

  • Use Foundry property tests (fuzz testing) to compare runtime behavior
  • For each Solidity contract:
    1. Compile with solc → bytecode A
    2. Compile with Solar → bytecode B
    3. Deploy both to Anvil
    4. Run identical transactions/calls against both
    5. Assert: same return values, same state changes, same reverts, same gas (within tolerance)
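
The assertion in step 5 could be sketched as a plain comparison over the observable result of each call. This is only an illustration: `CallOutcome`, `Mismatch`, `compare`, and the percentage-based gas tolerance are hypothetical names and choices, not part of Solar, Foundry, or Anvil, and comparing revert data byte for byte is an assumption that may turn out to be too strict.

```rust
use std::collections::BTreeMap;

/// Observable result of one call against a deployed contract.
/// Hypothetical type for illustration; not a Solar or Foundry API.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CallOutcome {
    /// `true` if the call returned normally, `false` if it reverted.
    pub success: bool,
    /// Return data on success, revert data on failure.
    pub output: Vec<u8>,
    /// Storage slots written by the call (slot -> new value).
    pub storage_writes: BTreeMap<[u8; 32], [u8; 32]>,
    /// Gas consumed by the call.
    pub gas_used: u64,
}

/// First difference found between the two outcomes.
#[derive(Debug, PartialEq, Eq)]
pub enum Mismatch {
    Status,
    Output,
    Storage,
    Gas { solc: u64, solar: u64 },
}

/// Compares the solc outcome with the Solar outcome.
/// `gas_tolerance_pct` is the allowed gas difference as a percentage of the
/// solc figure (an assumed definition of "within tolerance").
pub fn compare(
    solc: &CallOutcome,
    solar: &CallOutcome,
    gas_tolerance_pct: u64,
) -> Result<(), Mismatch> {
    if solc.success != solar.success {
        return Err(Mismatch::Status);
    }
    if solc.output != solar.output {
        return Err(Mismatch::Output);
    }
    if solc.storage_writes != solar.storage_writes {
        return Err(Mismatch::Storage);
    }
    // diff / solc_gas <= pct / 100, rearranged to avoid division.
    let diff = solc.gas_used.abs_diff(solar.gas_used) as u128;
    if diff * 100 > solc.gas_used as u128 * gas_tolerance_pct as u128 {
        return Err(Mismatch::Gas { solc: solc.gas_used, solar: solar.gas_used });
    }
    Ok(())
}
```

Checks are ordered so that a behavioral difference (status, output, storage) is reported before a gas difference, since gas is the only field where some drift is expected.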

Deliverables


Phase 2: New IR Design ✅ COMPLETE

Decision: Build custom IR inspired by Venom (Vyper) and Sonatina (Fe).

What was built

The MIR is located in crates/codegen/src/mir/ with:

| Component | Description |
|-----------|-------------|
| Module | Top-level container with functions, data segments, storage layout |
| Function | SSA function with basic blocks, values, instructions |
| BasicBlock | Linear instruction sequence with terminator |
| Inst / InstKind | All EVM operations + SSA-specific (Phi, Select) |
| Value | SSA values: Inst result, Arg, Immediate, Phi, Undef |
| Type | EVM-aware: UInt(bits), Address, MemPtr, StoragePtr, etc. |
| FunctionBuilder | Convenient construction API |

Design influences adopted

From Venom (Vyper):

From Sonatina (Fe):

  • Rust-native types and builder patterns
  • Dense entity IDs for values/instructions/blocks
  • Clean separation of concerns
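
To make "dense entity IDs" and the builder pattern concrete, here is a minimal sketch. It is not the code in crates/codegen/src/mir/: the types below reuse names from the component table for readability, but their fields and methods are assumptions, instructions and values are collapsed into one enum, and immediates are `u64` rather than 256-bit words.

```rust
/// Dense index into `Function::values` (assumed representation).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ValueId(pub u32);

/// Dense index into `Function::blocks` (assumed representation).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct BlockId(pub u32);

/// Simplified SSA value; a real MIR would separate instructions from values.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Value {
    Arg(u32),
    Immediate(u64),
    Add(ValueId, ValueId),
    /// One incoming value per predecessor block.
    Phi(Vec<(BlockId, ValueId)>),
}

/// Values live in one flat vector; blocks only store IDs into it.
#[derive(Debug, Default)]
pub struct Function {
    pub values: Vec<Value>,
    pub blocks: Vec<Vec<ValueId>>,
}

/// Appends values to the current block and hands back their IDs.
pub struct FunctionBuilder<'f> {
    func: &'f mut Function,
    current: BlockId,
}

impl<'f> FunctionBuilder<'f> {
    /// Creates a fresh block and positions the builder in it.
    pub fn new(func: &'f mut Function) -> Self {
        let mut builder = Self { func, current: BlockId(0) };
        builder.current = builder.create_block();
        builder
    }

    pub fn create_block(&mut self) -> BlockId {
        let id = BlockId(self.func.blocks.len() as u32);
        self.func.blocks.push(Vec::new());
        id
    }

    pub fn switch_to(&mut self, block: BlockId) {
        self.current = block;
    }

    /// Stores `value`, records it in the current block, returns its dense ID.
    pub fn push(&mut self, value: Value) -> ValueId {
        let id = ValueId(self.func.values.len() as u32);
        self.func.values.push(value);
        self.func.blocks[self.current.0 as usize].push(id);
        id
    }
}
```

Because IDs are just indices, per-value side tables (liveness sets, stack slots) can be plain vectors indexed by `ValueId.0` instead of hash maps.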

Phase 3: HIR → IR → Bytecode Pipeline

Core Pipeline (Critical Path)

| Step | Issue | Description |
|------|-------|-------------|
| 1 | #694 | Liveness analysis - which values are live at each point |
| 2 | #695 | Phi elimination - convert SSA to move-based form |
| 3 | #696 | Basic stack scheduling - map SSA to EVM stack (≤16 values) |
| 4 | #697 | Full stack scheduling - handle >16 values with spilling |
| 5 | #698 | Assembler - resolve labels and jumps to bytecode |
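
As an illustration of step 1, liveness is commonly computed as a backward dataflow fixpoint over the CFG: live_out(b) is the union of live_in over b's successors, and live_in(b) = uses(b) ∪ (live_out(b) − defs(b)). The sketch below assumes per-block `uses`/`defs` summaries are already available and ignores the special handling phi operands need; `Block` and `liveness` are hypothetical names, not the design chosen in #694.

```rust
use std::collections::BTreeSet;

/// Per-block summary (hypothetical shape); values are identified by `u32`.
pub struct Block {
    /// Values read in this block before being defined in it.
    pub uses: BTreeSet<u32>,
    /// Values defined in this block.
    pub defs: BTreeSet<u32>,
    /// Indices of successor blocks.
    pub succs: Vec<usize>,
}

/// Returns `(live_in, live_out)`, one set per block.
pub fn liveness(blocks: &[Block]) -> (Vec<BTreeSet<u32>>, Vec<BTreeSet<u32>>) {
    let n = blocks.len();
    let mut live_in: Vec<BTreeSet<u32>> = vec![BTreeSet::new(); n];
    let mut live_out: Vec<BTreeSet<u32>> = vec![BTreeSet::new(); n];

    // Iterate to a fixpoint; visiting blocks in reverse tends to converge
    // faster for a backward analysis.
    let mut changed = true;
    while changed {
        changed = false;
        for b in (0..n).rev() {
            // live_out = union of live_in over all successors.
            let mut out = BTreeSet::new();
            for &s in &blocks[b].succs {
                out.extend(live_in[s].iter().copied());
            }
            // live_in = uses ∪ (live_out − defs).
            let mut inn: BTreeSet<u32> =
                out.difference(&blocks[b].defs).copied().collect();
            inn.extend(blocks[b].uses.iter().copied());

            if out != live_out[b] || inn != live_in[b] {
                live_out[b] = out;
                live_in[b] = inn;
                changed = true;
            }
        }
    }
    (live_in, live_out)
}
```

A stack scheduler can consume this result directly: a value that is not in live_out after its last use in a block can be popped instead of kept on the stack.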

Supporting Work

| Issue | Description |
|-------|-------------|
| #699 | Complete lowering for structs, arrays, mappings |

Pipeline Visualization

Source (.sol)
    ↓
[Parser] → AST
    ↓
[Sema] → Typed HIR     ← Current Solar
    ↓
[Lowering] → MIR        ← Done (basic), #699 (complex types)
    ↓
[Phi Elim] → MIR'       ← #695
    ↓
[Stack Sched] → Scheduled MIR  ← #696, #697
    ↓
[Codegen] → EVM Assembly  ← Partially done
    ↓
[Assembler] → Bytecode    ← #698
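
For the final assembler stage, one simple way to resolve labels is a two-pass scheme: the first pass records each label's byte offset, the second emits bytes and patches jump targets. The sketch below assumes every label reference is encoded as a fixed-width `PUSH2`, which keeps offsets stable between passes; `Item` and `assemble` are hypothetical names, and #698 may well choose variable-width pushes instead.

```rust
use std::collections::HashMap;

/// Assembly item (hypothetical shape).
pub enum Item {
    /// Already-encoded opcodes and immediates.
    Raw(Vec<u8>),
    /// Jump target; emitted as a single JUMPDEST byte.
    Label(u32),
    /// Pushes the byte offset of a label; emitted as PUSH2 <hi> <lo>.
    PushLabel(u32),
}

const JUMPDEST: u8 = 0x5b;
const PUSH2: u8 = 0x61;

pub fn assemble(items: &[Item]) -> Result<Vec<u8>, String> {
    // Pass 1: compute the byte offset of every label.
    let mut offsets: HashMap<u32, usize> = HashMap::new();
    let mut pc = 0usize;
    for item in items {
        match item {
            Item::Raw(bytes) => pc += bytes.len(),
            Item::Label(id) => {
                if offsets.insert(*id, pc).is_some() {
                    return Err(format!("duplicate label {id}"));
                }
                pc += 1; // JUMPDEST
            }
            Item::PushLabel(_) => pc += 3, // PUSH2 plus two offset bytes
        }
    }

    // Pass 2: emit bytes, patching label references with resolved offsets.
    let mut out = Vec::with_capacity(pc);
    for item in items {
        match item {
            Item::Raw(bytes) => out.extend_from_slice(bytes),
            Item::Label(_) => out.push(JUMPDEST),
            Item::PushLabel(id) => {
                let target = *offsets
                    .get(id)
                    .ok_or_else(|| format!("undefined label {id}"))?;
                let target = u16::try_from(target)
                    .map_err(|_| format!("label {id} is out of PUSH2 range"))?;
                out.push(PUSH2);
                out.extend_from_slice(&target.to_be_bytes());
            }
        }
    }
    Ok(out)
}
```

For example, `PushLabel(0), JUMP, STOP, Label(0), STOP` assembles to `61 00 05 56 00 5b 00`: the label sits at byte offset 5, after the 3-byte push, `JUMP`, and `STOP`.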

Phase 4: Optimizations

| Issue | Optimization | Description |
|-------|--------------|-------------|
| #700 | DCE | Remove unused instructions |
| #701 | Constant Folding | Evaluate constant expressions at compile time |
| #702 | SCCP | Propagate constants through CFG, eliminate dead branches |
| #703 | CSE | Eliminate redundant computations |

Optimization Pipeline Order

const_fold → sccp → dce → cse → dce
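
A toy example shows why DCE runs after the folding passes: folding rewrites an instruction into a constant, which leaves its former operands unused. The sketch below works on a straight-line instruction list where an instruction's index is its value ID and operands always precede their users. `Inst`, `const_fold`, and `dce` are hypothetical, arithmetic is wrapping `u64` rather than EVM's 256-bit words, and SCCP and CSE are omitted.

```rust
/// Toy instruction set; the index in the slice is the value ID.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Inst {
    Const(u64),
    Add(usize, usize),
    Mul(usize, usize),
    /// The only side-effecting instruction in this sketch.
    Return(usize),
    /// Placeholder left behind by DCE so indices stay stable.
    Nop,
}

fn as_const(insts: &[Inst], id: usize) -> Option<u64> {
    match insts[id] {
        Inst::Const(v) => Some(v),
        _ => None,
    }
}

/// Replaces `Add`/`Mul` of two constants with the computed constant.
/// A single forward sweep cascades because operands precede users.
pub fn const_fold(insts: &mut [Inst]) {
    for i in 0..insts.len() {
        let folded = match insts[i] {
            Inst::Add(a, b) => as_const(insts, a)
                .zip(as_const(insts, b))
                .map(|(x, y)| x.wrapping_add(y)),
            Inst::Mul(a, b) => as_const(insts, a)
                .zip(as_const(insts, b))
                .map(|(x, y)| x.wrapping_mul(y)),
            _ => None,
        };
        if let Some(v) = folded {
            insts[i] = Inst::Const(v);
        }
    }
}

/// Replaces every instruction that no `Return` transitively needs with `Nop`.
pub fn dce(insts: &mut [Inst]) {
    let mut live = vec![false; insts.len()];
    // One reverse sweep is enough: users are visited before their operands.
    for i in (0..insts.len()).rev() {
        if matches!(insts[i], Inst::Return(_)) {
            live[i] = true;
        }
        if !live[i] {
            continue;
        }
        match insts[i] {
            Inst::Add(a, b) | Inst::Mul(a, b) => {
                live[a] = true;
                live[b] = true;
            }
            Inst::Return(a) => live[a] = true,
            Inst::Const(_) | Inst::Nop => {}
        }
    }
    for (i, inst) in insts.iter_mut().enumerate() {
        if !live[i] {
            *inst = Inst::Nop;
        }
    }
}
```

Running `const_fold` then `dce` on `[Const(2), Const(3), Add(0, 1), Mul(2, 2), Const(9), Return(3)]` leaves only `Const(25)` and `Return(3)`; every other slot becomes `Nop`.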

Future Optimizations (not yet tracked)

  • Function inlining
  • Loop optimizations
  • Storage access coalescing (batch SLOAD/SSTORE)
  • Memory layout optimization
  • Jump threading
  • Peephole optimizations

Success Criteria

  • Solar bytecode is smaller than solc's (fewer bytes)
  • Solar bytecode uses less gas than solc's for common operations
  • Solar bytecode passes all equivalence tests from Phase 1

Performance Enhancements (Parallel Track)

These can proceed independently:


References

cc @gakonst @DaniPopes @onbjerg
