feat(codegen): add solar-codegen crate with MIR and EVM codegen#693
Draft
This adds the solar-codegen crate, which provides:

**MIR (Mid-level IR) Structure:**
- Core MIR index types (ValueId, InstId, BlockId, FunctionId)
- MIR types (UInt(u16), Address, MemPtr, StoragePtr)
- Value (SSA values: Inst, Arg, Immediate, Phi, Undef) and Immediate constants
- Comprehensive InstKind enum (Arithmetic, Bitwise, Comparison, Memory, Storage, Environment, Calls, Control Flow, SSA Phi/Select)
- BasicBlock and Terminator (Jump, Branch, Switch, Return, Revert, Stop, SelfDestruct)
- Function structure with SSA value/instruction storage, entry block, attributes
- Module as top-level container (functions, data segments, storage layout)
- FunctionBuilder API for constructing MIR

**Lowering (HIR → MIR):**
- Main Lowerer context (contract iteration, storage slot allocation, function setup)
- Expression lowering for literals, identifiers, binary/unary ops, calls, ternary
- Statement lowering for variable declarations, blocks, loops, if-statements

**Code Generation (MIR → EVM):**
- Opcode enum and EvmCodegen struct
- generate_dispatcher: Check calldata, dispatch to function by 4-byte selector
- generate_inst: Push operands, emit EVM opcode based on InstKind
- Stack management: push_value for PUSH opcodes, generate_terminator for JUMP/REVERT/RETURN
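To make the `generate_dispatcher` step concrete, here is a minimal sketch of a selector dispatcher emitted as raw bytes. The function name `emit_dispatcher`, the fixed PUSH2 jump targets, and the jump-to-fallback on short calldata are assumptions for illustration, not solar-codegen's API; only the opcode byte values come from the EVM specification.

```rust
// EVM opcode bytes (values from the EVM specification).
const PUSH0: u8 = 0x5f;
const PUSH1: u8 = 0x60;
const PUSH2: u8 = 0x61;
const PUSH4: u8 = 0x63;
const CALLDATALOAD: u8 = 0x35;
const CALLDATASIZE: u8 = 0x36;
const SHR: u8 = 0x1c;
const DUP1: u8 = 0x80;
const EQ: u8 = 0x14;
const LT: u8 = 0x10;
const JUMPI: u8 = 0x57;
const REVERT: u8 = 0xfd;

/// Hypothetical dispatcher emitter. `targets` pairs each 4-byte selector with
/// the (already known) byte offset of that function's JUMPDEST.
fn emit_dispatcher(targets: &[(u32, u16)], fallback: u16) -> Vec<u8> {
    let mut code = Vec::new();
    // calldatasize < 4: there is no selector, so jump to the fallback.
    code.extend([PUSH1, 4, CALLDATASIZE, LT, PUSH2]);
    code.extend(fallback.to_be_bytes());
    code.push(JUMPI);
    // selector = calldataload(0) >> 224 (keep only the top 4 bytes).
    code.extend([PUSH0, CALLDATALOAD, PUSH1, 0xe0, SHR]);
    for &(selector, dest) in targets {
        // DUP1 keeps the selector around for the next comparison.
        code.extend([DUP1, PUSH4]);
        code.extend(selector.to_be_bytes());
        code.extend([EQ, PUSH2]);
        code.extend(dest.to_be_bytes());
        code.push(JUMPI);
    }
    // No selector matched: revert with empty return data.
    code.extend([PUSH0, PUSH0, REVERT]);
    code
}
```

On a match, the selector copy is still on the stack at the destination, so the function body has to pop it or account for it in its stack model.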
Implements --standard-json input mode to enable Foundry compatibility:
- Parse standard JSON input from stdin
- Handle import remappings by configuring FileResolver
- Correct contract-to-source file mapping in output
- Write sources to temp directory with proper base_path setup

Amp-Thread-ID: https://ampcode.com/threads/T-019bbe07-a540-7123-b1de-e9cd40659193
Co-authored-by: Amp <amp@ampcode.com>
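A rough sketch of how a `prefix=target` remapping can be applied to an import path. It follows solc's longest-matching-prefix rule, sends ties to the later entry (assumed to match solc), and ignores the optional `context:` part. `Remapping`, `parse_remapping`, and `apply_remappings` are hypothetical names, not Solar's `FileResolver` API.

```rust
/// A `prefix=target` remapping (the optional `context:` part is not modelled).
struct Remapping {
    prefix: String,
    target: String,
}

fn parse_remapping(s: &str) -> Option<Remapping> {
    let (prefix, target) = s.split_once('=')?;
    Some(Remapping { prefix: prefix.to_string(), target: target.to_string() })
}

/// Rewrites `import` with the longest matching prefix. `max_by_key` returns the
/// last of several equal maxima, so ties go to the later remapping.
fn apply_remappings(remappings: &[Remapping], import: &str) -> String {
    remappings
        .iter()
        .filter(|r| import.starts_with(&r.prefix))
        .max_by_key(|r| r.prefix.len())
        .map(|r| format!("{}{}", r.target, &import[r.prefix.len()..]))
        .unwrap_or_else(|| import.to_string())
}
```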
- Add LoopContext stack to track break/continue targets in nested loops
- Implement proper function argument loading from calldata with CALLDATALOAD
- Store mutable local variables in memory (offset 0x80+) using MLOAD/MSTORE
- Function parameters still handled as SSA values for efficiency

Amp-Thread-ID: https://ampcode.com/threads/T-019bbe07-a540-7123-b1de-e9cd40659193
Co-authored-by: Amp <amp@ampcode.com>
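The calldata and memory offsets involved can be sketched as follows. The sketch assumes static ABI arguments (one 32-byte word each, after the 4-byte selector) and one 32-byte word per mutable local, starting at 0x80 above Solidity's reserved 0x00..0x80 region. All function names here are hypothetical.

```rust
/// Offset of the `index`-th static argument: 4-byte selector, then 32-byte words.
fn calldata_arg_offset(index: u64) -> u64 {
    4 + 32 * index
}

/// Offset of the `index`-th mutable local, one word each from 0x80 upward.
fn local_mem_offset(index: u64) -> u64 {
    0x80 + 32 * index
}

/// Emits the shortest PUSHn (PUSH0 for zero) for `value`.
fn emit_push(value: u64, code: &mut Vec<u8>) {
    if value == 0 {
        code.push(0x5f); // PUSH0
        return;
    }
    let bytes = value.to_be_bytes();
    let skip = bytes.iter().take_while(|&&b| b == 0).count();
    code.push(0x5f + (bytes.len() - skip) as u8); // PUSHn, n = significant bytes
    code.extend_from_slice(&bytes[skip..]);
}

fn emit_load_arg(index: u64, code: &mut Vec<u8>) {
    emit_push(calldata_arg_offset(index), code);
    code.push(0x35); // CALLDATALOAD
}

/// Expects the value on top of the stack; MSTORE pops the offset, then the value.
fn emit_store_local(index: u64, code: &mut Vec<u8>) {
    emit_push(local_mem_offset(index), code);
    code.push(0x52); // MSTORE
}

fn emit_load_local(index: u64, code: &mut Vec<u8>) {
    emit_push(local_mem_offset(index), code);
    code.push(0x51); // MLOAD
}
```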
- Add StackScheduler with LoadArg operation for calldata argument loading
- Implement memory load/store operations for mutable local variables
- Add assembler module for bytecode emission
- Improve stack model tracking and manipulation

Amp-Thread-ID: https://ampcode.com/threads/T-019bbe07-a540-7123-b1de-e9cd40659193
Co-authored-by: Amp <amp@ampcode.com>
- Add MemoryLoad, MemoryStore instructions for local variable storage
- Add CalldataLoad instruction for function argument access
- Extend MIR builder with memory allocation helpers

Amp-Thread-ID: https://ampcode.com/threads/T-019bbe07-a540-7123-b1de-e9cd40659193
Co-authored-by: Amp <amp@ampcode.com>
- Implement dataflow-based liveness analysis for register allocation
- Add phi node elimination with parallel copy support
- Fix compiler warnings with #[allow(dead_code)] annotations

Amp-Thread-ID: https://ampcode.com/threads/T-019bbe07-a540-7123-b1de-e9cd40659193
Co-authored-by: Amp <amp@ampcode.com>
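For reference, liveness is the standard backward dataflow problem: live_out(b) is the union of live_in(s) over successors s, and live_in(b) = uses(b) ∪ (live_out(b) \ defs(b)), where uses(b) are the upward-exposed uses. The sketch below iterates these equations to a fixed point over a toy CFG. The `Block` type and `liveness` function are hypothetical, not solar-codegen's types.

```rust
use std::collections::BTreeSet;

/// `uses` are upward-exposed uses (read before any def in the block).
struct Block {
    uses: BTreeSet<u32>,
    defs: BTreeSet<u32>,
    succs: Vec<usize>,
}

/// Returns (live_in, live_out) per block.
fn liveness(blocks: &[Block]) -> (Vec<BTreeSet<u32>>, Vec<BTreeSet<u32>>) {
    let n = blocks.len();
    let mut live_in = vec![BTreeSet::new(); n];
    let mut live_out = vec![BTreeSet::new(); n];
    let mut changed = true;
    while changed {
        changed = false;
        // Visiting blocks in reverse tends to converge faster for a backward problem.
        for b in (0..n).rev() {
            let mut out = BTreeSet::new();
            for &s in &blocks[b].succs {
                out.extend(live_in[s].iter().copied());
            }
            let mut inn: BTreeSet<u32> = out.difference(&blocks[b].defs).copied().collect();
            inn.extend(blocks[b].uses.iter().copied());
            if inn != live_in[b] || out != live_out[b] {
                live_in[b] = inn;
                live_out[b] = out;
                changed = true;
            }
        }
    }
    (live_in, live_out)
}
```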
- Add tempfile, alloy-json-abi dependencies for standard-json mode
- Add --standard-json CLI flag to opts
- Update version reporting for Foundry compatibility

Amp-Thread-ID: https://ampcode.com/threads/T-019bbe07-a540-7123-b1de-e9cd40659193
Co-authored-by: Amp <amp@ampcode.com>
- Add example programs for codegen testing
- Add integration tests validating EVM bytecode output
- Add function_to_dot() and module_to_dot() for graphviz output
- Format instructions and terminators with operand values
- Color-coded edges for branch conditions (green=true, red=false)
- Add --dot flag to compile example for easy CFG generation
- Convert while-with-break to if statement in stack scheduler
- Use format string variables directly
- Replace extend with append for vector ranges
- Replace manual div_ceil with method call
- Collapse nested if statements
- Update help CLI test expectations for --standard-json flag
Record shape syntax was causing parsing issues with graphviz. Simple box nodes work correctly with left-aligned labels.
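A small sketch of the DOT output these commits describe: box nodes with left-aligned (`\l`) labels, a green edge for the true branch, and a red edge for the false branch. `Term`, `Bb`, and this `function_to_dot` signature are simplified stand-ins, not the crate's actual types or function.

```rust
use std::fmt::Write;

enum Term {
    Jump(usize),
    Branch { then_bb: usize, else_bb: usize },
    Return,
}

struct Bb {
    insts: Vec<String>,
    term: Term,
}

/// Escapes characters that would end or break a quoted DOT label.
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

fn function_to_dot(name: &str, blocks: &[Bb]) -> String {
    let mut out = String::new();
    writeln!(out, "digraph \"{name}\" {{").unwrap();
    // Plain box nodes; record shapes were the source of the parsing issues.
    writeln!(out, "  node [shape=box, fontname=monospace];").unwrap();
    for (i, bb) in blocks.iter().enumerate() {
        // "\l" terminates a left-aligned line inside a Graphviz label.
        let mut label = format!("bb{i}:\\l");
        for inst in &bb.insts {
            label.push_str(&escape(inst));
            label.push_str("\\l");
        }
        writeln!(out, "  bb{i} [label=\"{label}\"];").unwrap();
        match &bb.term {
            Term::Jump(t) => writeln!(out, "  bb{i} -> bb{t};").unwrap(),
            Term::Branch { then_bb, else_bb } => {
                writeln!(out, "  bb{i} -> bb{then_bb} [color=green];").unwrap();
                writeln!(out, "  bb{i} -> bb{else_bb} [color=red];").unwrap();
            }
            Term::Return => {}
        }
    }
    out.push_str("}\n");
    out
}
```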
- Fix uninlined_format_args in display.rs
- Fix manual_div_ceil in liveness.rs
- Fix collapsible_if patterns across multiple files
- Fix drain_collect and extend_with_drain issues
- Fix field_reassign_with_default in standard_json.rs
- Add type aliases to reduce type complexity
- Fix never_loop warning in spill_excess_values
- Add allow attributes for test harness modules
- Fix unused variable in build.rs
- Add unused_crate_dependencies allows for test targets

Amp-Thread-ID: https://ampcode.com/threads/T-019bbe1b-6309-775e-9653-f407558bb00b
Co-authored-by: Amp <amp@ampcode.com>
- Use stderr emitter so compilation errors are visible
- Fix panic on emitted_errors check
- Use box shape instead of record to fix graphviz parsing
…nsfer
- Add keccak256(key, slot) computation for mapping storage access
- Handle compound assignments (+=, -=, etc.) by reading current value first
- Implement address.transfer() and address.send() with 2300 gas stipend
- Fix CALL/STATICCALL/DELEGATECALL to track result value in scheduler
- Create fresh ValueIds for CALL arguments to avoid stack reuse issues

Fixes MemoryLimitOOG when running Advanced.sol tests with mappings and external calls (transfer).
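The mapping slot follows Solidity's storage layout rule: for a value-type key, the slot of `m[key]` is keccak256 of the 32-byte key word followed by the 32-byte slot of `m`. The sketch below builds that preimage and shows one way to compute the hash on-chain using the 0x00..0x40 scratch space. Hashing itself is omitted to keep the example dependency-free, and all helper names are hypothetical.

```rust
/// Left-pads a 20-byte address to a 32-byte word, as ABI encoding does.
fn address_word(addr: [u8; 20]) -> [u8; 32] {
    let mut w = [0u8; 32];
    w[12..].copy_from_slice(&addr);
    w
}

/// Preimage hashed for `m[key]` when `m` lives at `slot`: key word ++ slot word.
fn mapping_preimage(key: [u8; 32], slot: u64) -> [u8; 64] {
    let mut buf = [0u8; 64];
    buf[..32].copy_from_slice(&key);
    buf[56..].copy_from_slice(&slot.to_be_bytes());
    buf
}

/// With `key` on top of the stack, leaves keccak256(key ++ slot).
/// Assumes the slot number fits in a single PUSH1 byte.
fn emit_mapping_slot(slot: u8) -> Vec<u8> {
    vec![
        0x5f, 0x52,       // PUSH0 MSTORE        mem[0x00..0x20] = key
        0x60, slot,       // PUSH1 slot
        0x60, 0x20, 0x52, // PUSH1 0x20 MSTORE   mem[0x20..0x40] = slot
        0x60, 0x40, 0x5f, // PUSH1 0x40 PUSH0    size = 0x40, offset = 0
        0x20,             // KECCAK256
    ]
}
```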
…ution
- Add can_emit_value() to check if a value is available for emission
- Add instruction_executed_untracked() for values that become stale in loops
- Fix builtin member resolution to preserve Builtin reference
- Handle type conversion calls (ICallee(addr), uint256(x)) that were returning 0
- Fix compute_member_selector to resolve type cast calls for external contracts
- Add nested mapping slot computation for multi-level mappings (m[a][b])
- Support dynamic array storage access and .length member
- Implement pre/post increment/decrement operators
- Add builtin module member access (msg.sender, block.timestamp)
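Nested mappings and dynamic arrays reuse the same rule. `m[a][b]` hashes `b` against the slot of `m[a]`, and element `i` of a dynamic array at slot `p` lives at keccak256(p) + i when each element fills a whole slot. In this sketch the hash function is a parameter (a real build would pass keccak256), packed element types are not handled, and all names are hypothetical.

```rust
type Word = [u8; 32];

/// Slot of `m[key]` where `m` lives at `slot`: hash(key ++ slot).
fn mapping_elem_slot(hash: &dyn Fn(&[u8]) -> Word, key: Word, slot: Word) -> Word {
    let mut buf = [0u8; 64];
    buf[..32].copy_from_slice(&key);
    buf[32..].copy_from_slice(&slot);
    hash(&buf[..])
}

/// Slot of `m[k0][k1]...`: apply the single-level rule, outermost key first.
fn nested_mapping_slot(hash: &dyn Fn(&[u8]) -> Word, keys: &[Word], slot: Word) -> Word {
    keys.iter().fold(slot, |s, k| mapping_elem_slot(hash, *k, s))
}

/// Slot of `arr[index]` for a dynamic array of one-slot elements at `slot`
/// (the length lives at `slot` itself; data starts at hash(slot)).
fn dyn_array_elem_slot(hash: &dyn Fn(&[u8]) -> Word, slot: Word, index: u64) -> Word {
    add_u64(hash(&slot[..]), index)
}

/// Big-endian 256-bit addition of a u64, wrapping modulo 2^256.
fn add_u64(mut w: Word, x: u64) -> Word {
    let mut carry = x as u128;
    for byte in w.iter_mut().rev() {
        if carry == 0 {
            break;
        }
        let sum = *byte as u128 + (carry & 0xff);
        *byte = sum as u8;
        carry = (carry >> 8) + (sum >> 8);
    }
    w
}
```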
- Add generate_synthetic_constructor() to create constructor for contracts without one
- Constructor emits SSTORE for each state variable with an initializer
- Modify generate_deployment_bytecode() to run constructor code before CODECOPY/RETURN
- Add generate_constructor_code() helper that strips trailing STOP from constructor bytecode

State variables like `uint256 public value = 42` are now properly initialized.
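A sketch of the deployment layout described here. It assumes a single trailing STOP is stripped from the constructor and that a fixed-size copy stub uses PUSH2 for the runtime length and offset. `deployment_bytecode` is a hypothetical name; the opcode values and the CODECOPY/RETURN operand order come from the EVM specification.

```rust
const PUSH0: u8 = 0x5f;
const PUSH2: u8 = 0x61;
const DUP1: u8 = 0x80;
const CODECOPY: u8 = 0x39;
const RETURN: u8 = 0xf3;
const STOP: u8 = 0x00;

/// Bytes in the copy-and-return stub below: 3 + 1 + 3 + 1 + 1 + 1 + 1.
const STUB_LEN: usize = 11;

fn deployment_bytecode(ctor: &[u8], runtime: &[u8]) -> Vec<u8> {
    // Drop a trailing STOP so execution falls through into the stub.
    let ctor = ctor.strip_suffix(&[STOP]).unwrap_or(ctor);
    let offset = u16::try_from(ctor.len() + STUB_LEN).expect("init code too large");
    let len = u16::try_from(runtime.len()).expect("runtime code too large");
    let mut code = ctor.to_vec();
    // Stack afterwards: [len, len, offset, 0] (top on the right).
    code.push(PUSH2);
    code.extend(len.to_be_bytes());
    code.push(DUP1);
    code.push(PUSH2);
    code.extend(offset.to_be_bytes());
    code.push(PUSH0);
    // CODECOPY(destOffset = 0, offset, size = len) leaves [len].
    code.push(CODECOPY);
    // RETURN(offset = 0, size = len) hands the runtime code back.
    code.push(PUSH0);
    code.push(RETURN);
    code.extend_from_slice(runtime);
    code
}
```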
- Implement lower_emit() in stmt.rs that computes event signature hash for topic0
- Handle indexed parameters as additional topics, non-indexed as ABI-encoded data
- Add compute_event_signature() and type_to_abi_string() helpers
- Add log0-log4 builder methods in builder.rs

emit statements were previously no-ops; they now emit LOG0-LOG4.
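The topic layout can be sketched as follows. topic0 is keccak256 of the canonical signature string, and each indexed parameter adds one topic, so a non-anonymous event with k indexed parameters needs LOG(k+1). This sketch only builds the signature string (hashing omitted) and picks the opcode. Anonymous events are not modelled, and the names are hypothetical.

```rust
struct EventParam {
    abi_type: &'static str,
    indexed: bool,
}

/// Canonical signature, e.g. "Transfer(address,address,uint256)".
fn event_signature(name: &str, params: &[EventParam]) -> String {
    let types: Vec<&str> = params.iter().map(|p| p.abi_type).collect();
    format!("{name}({})", types.join(","))
}

/// LOGn opcode byte: topic0 plus one topic per indexed parameter.
/// Returns None if the event would need more than four topics.
fn log_opcode(params: &[EventParam]) -> Option<u8> {
    let topics = 1 + params.iter().filter(|p| p.indexed).count();
    (topics <= 4).then(|| 0xa0 + topics as u8)
}
```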
- Fix typo: UEUse -> upward-exposed uses and defs
- Escape doc comments with brackets to prevent broken intra-doc links
- Mark README code block as text to avoid arrow parsing errors
- Add #[ignore] to foundry tests that require anvil/solc
…suite
- Add testdata/foundry-tests with 6 test contracts and 21 test cases
- Test contracts: Counter, Events, ExternalCall, StorageInit, Showcase, StackDeep
- Rewrite test harness to use FOUNDRY_SOLC=solar and forge test
- Use per-test output directories (--out, --cache-path) for parallel execution
- Parses forge output and validates all tests pass
- Add payable check in function dispatcher: non-payable/view/pure
functions revert if called with ETH (CALLVALUE != 0)
- Support {value: X} call options for external calls
- Add extract_call_value helper to parse call options
- Add Payable.sol test contract and Payable.t.sol tests
The payable check is emitted in emit_payable_check() after the function
JUMPDEST, before generating the function body. For payable functions,
no check is emitted.
External calls now correctly pass the value from {value: X} options
to the CALL opcode instead of hardcoding 0.
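A byte-level sketch of such a guard, assuming it is appended right after the function's JUMPDEST and reverts with empty data. The sequence Solar actually emits may differ. `emit_nonpayable_guard` is a hypothetical helper, not the signature of emit_payable_check().

```rust
const CALLVALUE: u8 = 0x34;
const ISZERO: u8 = 0x15;
const PUSH2: u8 = 0x61;
const JUMPI: u8 = 0x57;
const PUSH0: u8 = 0x5f;
const DUP1: u8 = 0x80;
const REVERT: u8 = 0xfd;
const JUMPDEST: u8 = 0x5b;

/// Appends: CALLVALUE ISZERO PUSH2 <ok> JUMPI PUSH0 DUP1 REVERT <ok>: JUMPDEST
fn emit_nonpayable_guard(code: &mut Vec<u8>) {
    // 9 bytes precede the JUMPDEST: 1 + 1 + 3 (PUSH2 + 2 bytes) + 1 + 1 + 1 + 1.
    let ok = u16::try_from(code.len() + 9).expect("offset does not fit in PUSH2");
    code.extend([CALLVALUE, ISZERO, PUSH2]);
    code.extend(ok.to_be_bytes());
    // callvalue == 0 jumps over the revert; otherwise revert(0, 0).
    code.extend([JUMPI, PUSH0, DUP1, REVERT, JUMPDEST]);
}
```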
Optimization passes for the Solar codegen:

1. Constant Folding (HIR-level):
   - Evaluates constant expressions at compile time
   - Handles binary ops, unary ops, ternary expressions
   - 15 unit tests covering arithmetic, bitwise, comparison ops
2. Dead Code Elimination (MIR-level):
   - Removes instructions whose results are never used
   - Preserves side-effect instructions (SSTORE, CALL, LOG, etc.)
   - Uses value use analysis to identify dead code

Also adds InstKind::has_side_effects() helper for DCE correctness.

Amp-Thread-ID: https://ampcode.com/threads/T-019bbfc4-4637-71ce-a483-63911a6290f5
Co-authored-by: Amp <amp@ampcode.com>
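The DCE idea can be sketched as a worklist "mark" phase. Every side-effecting instruction and every value read by a terminator is a root, and liveness flows from each live instruction to its operands. This is a simplified stand-in with hypothetical types (`Kind`, `Inst`, `dce`), not the crate's `InstKind` or pass.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Const,
    Add,
    Mul,
    Sload,
    Sstore,
    Call,
    Log,
}

impl Kind {
    /// Instructions whose effect is observable even if their result is unused.
    fn has_side_effects(self) -> bool {
        matches!(self, Kind::Sstore | Kind::Call | Kind::Log)
    }
}

/// `operands` are indices of earlier instructions whose results this one reads.
struct Inst {
    kind: Kind,
    operands: Vec<usize>,
}

/// Returns a keep flag per instruction. `terminator_uses` are values read by
/// block terminators (e.g. a returned value) and are treated as roots.
fn dce(insts: &[Inst], terminator_uses: &[usize]) -> Vec<bool> {
    let mut live = vec![false; insts.len()];
    let mut work: Vec<usize> = (0..insts.len())
        .filter(|&i| insts[i].kind.has_side_effects())
        .chain(terminator_uses.iter().copied())
        .collect();
    while let Some(i) = work.pop() {
        if live[i] {
            continue;
        }
        live[i] = true;
        work.extend(insts[i].operands.iter().copied());
    }
    live
}
```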
1. Peephole Optimizer (not yet integrated):
   - 15+ optimization patterns (PUSH0 ADD→nop, SWAP1 SWAP1→nop, etc.)
   - 17 unit tests passing
   - Not integrated into pipeline yet (breaks jump targets)
   - TODO: integrate at assembler level before label resolution
2. Dynamic Arrays - fix pop() bug:
   - Fixed StackUnderflow in pop() caused by reusing slot_val
   - Reorder operations to avoid consuming values twice
   - Added DynamicArray.sol and DynamicArray.t.sol (6 tests pass)

Amp-Thread-ID: https://ampcode.com/threads/T-019bbfec-3633-7296-8139-120e925d8fb3
Co-authored-by: Amp <amp@ampcode.com>
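A sketch of how such patterns can be applied in one pass by treating the output as a stack. This also catches cascades: removing `PUSH0 ADD` can expose a `SWAP1 SWAP1` pair. It assumes jump destinations appear as ordinary ops, so a pattern never spans a JUMPDEST. The `Op` enum and `peephole` function are hypothetical and cover only three patterns.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    Push0,
    Add,
    Swap1,
    Dup1,
    Pop,
    Other(u8),
}

fn peephole(ops: &[Op]) -> Vec<Op> {
    let mut out: Vec<Op> = Vec::with_capacity(ops.len());
    for &op in ops {
        match (out.last().copied(), op) {
            // x + 0 == x
            (Some(Op::Push0), Op::Add)
            // two identical swaps cancel out
            | (Some(Op::Swap1), Op::Swap1)
            // a copy that is immediately discarded
            | (Some(Op::Dup1), Op::Pop) => {
                out.pop();
            }
            _ => out.push(op),
        }
    }
    out
}
```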
- Split monolithic test into 6 parallel projects by category:
arithmetic, control-flow, storage, events, calls, stack-deep
- Each project runs both Solar and solc for comparison
- Reports compilation time, bytecode sizes, and per-test gas usage
- Separate out-{compiler}/ directories for artifacts
Test results (95 tests):
- Bytecode: Solar 61-72% smaller than solc
- Gas: Solar 2-48% cheaper on most operations
- Stack-deep: Solar compiles what solc cannot
Added edge case tests exposing 5 bugs (tests skipped, tasks created):
- Signed arithmetic (SDIV, SLT, SGT)
- Ternary operator
- Continue statement
- Storage pre/post increment
- Bitwise NOT
- Test contracts (.t.sol) always compiled with solc for reliable test logic
- Contract-under-test bytecode injected via SOLAR_<NAME>_BYTECODE env vars
- Tests deploy Solar bytecode when env var present, fallback to solc
- Updated .gitignore to exclude out/ and cache/ directories
- stack-deep tests marked #[ignore] (solc can't compile, Solar has bugs)
- Enable AVX2+FMA for x86_64 builds (.cargo/config.toml)
- Optimize whitespace lexer with bulk position search (cursor/mod.rs)
- Fix ICE #216: allow this/super builtins to be shadowed (resolve.rs)
- Fix ICE #219: error/event params don't declare in scope (resolve.rs, hir/mod.rs)
- Add UI test for this/super shadowing

All 7,900+ tests passing.
- Collapse nested if statements using let-chains (clippy::collapsible_if)
- Fix formatting in resolve.rs
This reverts commit b1cd328.
…vements" This reverts commit 1b6f8a6.
Bump Python dependencies in benches/analyze:
- matplotlib: 3.10.1 → 3.10.8
- numpy: 2.2.4 → 2.4.1
- contourpy: 1.3.2 → 1.3.3
- fonttools: 4.57.0 → 4.61.1
- kiwisolver: 1.4.8 → 1.4.9
- packaging: 24.2 → 25.0
- pillow: 11.2.1 → 12.1.0
- pyparsing: 3.2.3 → 3.3.1
Automation to keep dependencies in `Cargo.lock` current.
<details><summary><strong>cargo update log</strong></summary>
<p>
```log
Locking 34 packages to latest compatible versions
Updating base64ct v1.8.2 -> v1.8.3
Updating cc v1.2.51 -> v1.2.53
Updating clap_lex v0.7.6 -> v0.7.7
Updating codspeed v4.2.0 -> v4.2.1
Updating codspeed-criterion-compat v4.2.0 -> v4.2.1
Updating codspeed-criterion-compat-walltime v4.2.0 -> v4.2.1
Updating colored v3.0.0 -> v3.1.1
Updating find-msvc-tools v0.1.6 -> v0.1.8
Unchanged generic-array v0.14.7 (available: v0.14.9)
Updating getrandom v0.2.16 -> v0.2.17
Updating indexmap v2.12.1 -> v2.13.0
Updating js-sys v0.3.83 -> v0.3.85
Updating libc v0.2.179 -> v0.2.180
Updating proc-macro2 v1.0.104 -> v1.0.105
Updating quote v1.0.42 -> v1.0.43
Updating rand_core v0.9.3 -> v0.9.5
Updating rapidhash v4.2.0 -> v4.2.1
Updating rustc-demangle v0.1.26 -> v0.1.27
Updating serde_json v1.0.148 -> v1.0.149
Unchanged slang_solidity v0.18.3 (available: v1.3.2)
Updating snapbox v0.6.23 -> v0.6.24
Unchanged solang-parser v0.3.4 (available: v0.3.5)
Updating syn v2.0.113 -> v2.0.114
Updating time v0.3.44 -> v0.3.45
Updating time-core v0.1.6 -> v0.1.7
Updating time-macros v0.2.24 -> v0.2.25
Updating tracy-client v0.18.3 -> v0.18.4
Updating tracy-client-sys v0.27.0 -> v0.28.0
Unchanged tree-sitter v0.25.8 (available: v0.25.10)
Unchanged tree-sitter-solidity v1.2.12 (available: v1.2.13)
Unchanged vergen v8.3.2 (available: v9.1.0)
Updating wasip2 v1.0.1+wasi-0.2.4 -> v1.0.2+wasi-0.2.9
Updating wasm-bindgen v0.2.106 -> v0.2.108
Updating wasm-bindgen-macro v0.2.106 -> v0.2.108
Updating wasm-bindgen-macro-support v0.2.106 -> v0.2.108
Updating wasm-bindgen-shared v0.2.106 -> v0.2.108
Updating wit-bindgen v0.46.0 -> v0.51.0
Updating zerocopy v0.8.31 -> v0.8.33
Updating zerocopy-derive v0.8.31 -> v0.8.33
Updating zmij v1.0.9 -> v1.0.15
note: to see how you depend on a package, run `cargo tree --invert <dep>@<ver>`
```
</p>
</details>
Co-authored-by: DaniPopes <57450786+DaniPopes@users.noreply.github.com>
- Add InheritedMapping test for storage slot inheritance
- Add LocalSecondOperand test for arithmetic with local variables
- Remove out-solar/ from git tracking (add to gitignore)
…ator

The multi-value return loop was missing push_unknown() after emit_push() and incorrectly calling instruction_executed(1, None) instead of (2, None). MSTORE consumes 2 values (offset and value), not 1. This caused stack scheduler state to diverge from reality, leading to StackUnderflow at runtime for struct-returning functions.

Fixes all 8 failing struct codegen tests.
Previously, member calls via `using` directives (e.g., `x.min(b)` where `using MathLib for uint256`) fell through to the external CALL fallback, generating incorrect bytecode that called address 0x0a.

This fix:
- Adds `UsingDirective` struct to HIR to store library bindings
- Resolves using directives during AST lowering phase
- Adds `current_contract_id` tracking in codegen Lowerer
- Detects using directive calls in `lower_member_call_with_opts`
- Routes them to `lower_library_call` with the receiver as bound first arg

Test: LibraryUsing.t.sol tests (testMin, testSqrt, testComplex) now pass

Amp-Thread-ID: https://ampcode.com/threads/T-019becea-3354-75f9-bb87-d4d6a346d14f
Co-authored-by: Amp <amp@ampcode.com>
The scheduler's StackModel was drifting from the actual EVM stack during complex codegen sequences, causing incorrect DUP operations.

Changes:
- Add StackEffect struct to describe pops/pushes for each opcode
- Add StackPush enum to specify tracked/untracked stack entries
- Add emit_stack_op() helper for DUP/SWAP/POP with automatic model updates
- Add emit_op_with_effect() helper for opcodes with known stack effects
- Fix InstKind::Select to use per-operation stack tracking
- Update CALL/STATICCALL/DELEGATECALL to use emit_op_with_effect

The root cause was Select emitting 6 stack-mutating opcodes but only updating the scheduler once at the end, causing cumulative drift.
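A minimal sketch of a per-opcode stack effect table driving a depth-only model. It shows why every opcode must update the model individually, and why MSTORE has to be recorded as consuming two values. DUPn is modelled as needing n items and SWAPn as needing n+1. The names are hypothetical, and a real scheduler also tracks which value sits in each slot.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct StackEffect {
    pops: usize,
    pushes: usize,
}

/// Stack effects for a subset of opcodes; None for opcodes this sketch omits.
fn stack_effect(op: u8) -> Option<StackEffect> {
    let (pops, pushes) = match op {
        // ADD MUL SUB DIV, LT GT SLT SGT EQ, AND OR XOR, SHL SHR SAR
        0x01..=0x04 | 0x10..=0x14 | 0x16..=0x18 | 0x1b..=0x1d => (2, 1),
        0x15 | 0x19 => (1, 1),        // ISZERO NOT
        0x35 | 0x51 | 0x54 => (1, 1), // CALLDATALOAD MLOAD SLOAD
        0x50 => (1, 0),               // POP
        0x52 | 0x55 => (2, 0),        // MSTORE SSTORE
        0x5f..=0x7f => (0, 1),        // PUSH0..PUSH32
        0x80..=0x8f => {
            let n = usize::from(op - 0x7f); // DUPn reads n items, adds one
            (n, n + 1)
        }
        0x90..=0x9f => {
            let n = usize::from(op - 0x8f); // SWAPn touches n + 1 items
            (n + 1, n + 1)
        }
        0xf1 => (7, 1),        // CALL
        0xf4 | 0xfa => (6, 1), // DELEGATECALL STATICCALL
        _ => return None,
    };
    Some(StackEffect { pops, pushes })
}

/// Depth-only stack model updated once per emitted opcode.
struct StackModel {
    depth: usize,
}

impl StackModel {
    fn apply(&mut self, op: u8) -> Result<(), String> {
        let e = stack_effect(op).ok_or_else(|| format!("no stack effect for 0x{op:02x}"))?;
        if self.depth < e.pops {
            return Err(format!("stack underflow at 0x{op:02x}: depth {} < {}", self.depth, e.pops));
        }
        self.depth = self.depth - e.pops + e.pushes;
        Ok(())
    }
}
```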
…onment opcodes
- Fix Select instruction (ternary) stack manipulation: use DUP3,DUP3 instead of DUP2,DUP4 and SWAP1 instead of SWAP2 to correctly compute f + cond*(t-f)
- Add handling for environment opcodes in emit_value_fresh: CallValue, Caller, Origin, CalldataSize, Timestamp, BlockNumber - these can be safely re-emitted
- Move spill slot base from 0x100 to 0x1000 to avoid conflicts with dynamic memory allocations (structs, arrays)
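The select formula can be checked with a tiny stack-machine simulation. Starting from [f, t, cond] with cond on top, the sequence DUP3 DUP3 SUB MUL SWAP1 POP ADD leaves f + cond*(t-f). That equals f when cond = 0 and t when cond = 1. This is one sequence consistent with the DUP3/DUP3/SWAP1 change described here, not necessarily Solar's exact emission. Arithmetic wraps modulo 2^64 instead of 2^256, and all names are hypothetical.

```rust
#[derive(Clone, Copy, Debug)]
enum Op {
    Dup(usize),
    Swap(usize),
    Sub,
    Mul,
    Add,
    Pop,
}

/// Runs `ops` on `stack` (index 0 = bottom, last = top) with EVM operand
/// order: SUB computes top - second.
fn run(stack: &mut Vec<u64>, ops: &[Op]) {
    for &op in ops {
        match op {
            Op::Dup(n) => {
                let v = stack[stack.len() - n];
                stack.push(v);
            }
            Op::Swap(n) => {
                let top = stack.len() - 1;
                stack.swap(top, top - n);
            }
            Op::Pop => {
                stack.pop();
            }
            Op::Sub | Op::Mul | Op::Add => {
                let a = stack.pop().expect("stack underflow");
                let b = stack.pop().expect("stack underflow");
                stack.push(match op {
                    Op::Sub => a.wrapping_sub(b),
                    Op::Mul => a.wrapping_mul(b),
                    _ => a.wrapping_add(b),
                });
            }
        }
    }
}

/// [f, t, cond] -> [f + cond * (t - f)]
const SELECT: [Op; 7] = [Op::Dup(3), Op::Dup(3), Op::Sub, Op::Mul, Op::Swap(1), Op::Pop, Op::Add];
```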
Tests that calling between overloaded library functions works correctly (e.g., find(key) calling find(key, true)).
Tests for functions that take multiple struct parameters, which currently have issues with memory layout.
Implement solc-style backward layout analysis for better stack scheduling:
- Add StackShuffler: converts source to target layout with minimal ops
  - Phase 1: Ensure multiplicities (DUP values needing copies)
  - Phase 2: Arrange positions (SWAP to correct slots)
  - Phase 3: Pop excess values
- Add LayoutAnalysis: backward analysis through instructions
  - analyze_backward() computes ideal entry layout from exit
  - compute_entry_for_instruction() for single instruction
- Add helper functions for ideal operand layouts
  - ideal_binary_op_entry/ideal_unary_op_entry
  - is_freely_generable() for literals/labels
- Integration with StackScheduler
  - shuffle_to_layout(), prepare_binary_op(), prepare_unary_op()

All 8 shuffler unit tests pass.
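Phase 2 can be sketched with a greedy swap-based algorithm. For each slot from the bottom up, it brings the wanted value to the top (unless it is already there) and swaps it down into place. The sketch assumes source and target already hold the same multiset (phases 1 and 3 are done) and makes no claim to minimality. `arrange_with_swaps` is a hypothetical name, and the returned numbers are SWAPn depths.

```rust
/// Swaps the top of the stack with `stack[idx]` and records the SWAPn depth.
fn swap_with_top(stack: &mut [u32], idx: usize, swaps: &mut Vec<usize>) {
    let top = stack.len() - 1;
    let depth = top - idx;
    assert!((1..=16).contains(&depth), "SWAP{depth} is out of range");
    stack.swap(top, idx);
    swaps.push(depth);
}

/// Rearranges `stack` (index 0 = bottom) into `target` using only SWAPn.
/// Both must contain the same multiset of values.
fn arrange_with_swaps(stack: &mut [u32], target: &[u32]) -> Vec<usize> {
    assert_eq!(stack.len(), target.len());
    let mut swaps = Vec::new();
    if stack.is_empty() {
        return swaps;
    }
    let top = stack.len() - 1;
    for i in 0..top {
        if stack[i] == target[i] {
            continue;
        }
        // Slots below i are already fixed, so the wanted value is above i.
        let j = (i + 1..=top)
            .find(|&j| stack[j] == target[i])
            .expect("stack and target differ as multisets");
        if j != top {
            swap_with_top(stack, j, &mut swaps);
        }
        swap_with_top(stack, i, &mut swaps);
    }
    // The top slot now holds the one remaining value, which must match.
    swaps
}
```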
Add CFG simplification passes:

1. Block Merging (CfgSimplifier):
   - Merge A→B when A has single successor B and B has single predecessor A
   - Saves 8 gas per eliminated JUMP
2. Empty Block Elimination:
   - Remove blocks with only unconditional jump
   - Redirect predecessors to final target
3. Dead Function Elimination (DeadFunctionEliminator):
   - Build reachability from entry points (public/external/constructor)
   - Remove unreachable functions
4. Call Graph Analysis (CallGraphAnalyzer):
   - Build call graph for module
   - Detect recursive functions via DFS

Includes CfgSimplifyStats for tracking optimization metrics. All 8 unit tests pass.
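Dead function elimination and recursion detection both reduce to reachability on the call graph. The sketch below uses a breadth-first search rather than the DFS described here, and its names are hypothetical.

```rust
use std::collections::{BTreeSet, VecDeque};

/// `calls[f]` lists the functions that `f` calls. Returns every function
/// reachable from `entries`; everything else is dead.
fn reachable_functions(calls: &[Vec<usize>], entries: &[usize]) -> BTreeSet<usize> {
    let mut seen = BTreeSet::new();
    let mut queue: VecDeque<usize> = entries.iter().copied().collect();
    while let Some(f) = queue.pop_front() {
        // insert returns false if f was already visited.
        if seen.insert(f) {
            queue.extend(calls[f].iter().copied());
        }
    }
    seen
}

/// A function is (directly or mutually) recursive if it can reach itself
/// through at least one call.
fn is_recursive(calls: &[Vec<usize>], f: usize) -> bool {
    reachable_functions(calls, &calls[f]).contains(&f)
}
```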
Add infrastructure for stack layout merging at control flow merge points:
- BlockStackLayout: represents stack layouts with SmallVec slots
- combine_stack_layouts(): computes common layout from predecessors
- estimate_shuffle_cost(): estimates DUP/SWAP/POP operations needed

StackScheduler block layout methods:
- set/get_block_entry_layout: manage target layouts per block
- record/get_block_exit_layout: track actual layouts
- compute_merge_layout: compute entry from predecessor exits
- shuffle_to_block_entry: shuffle current stack to target
- init_from_block_entry_layout: initialize stack from expected

12 new unit tests for layout merging functionality. All 117 tests pass.
Partial implementation of nested struct handling:
- calculate_memory_words_for_type: compute flattened memory layout
- get_struct_field_memory_offset: get byte offset for nested fields
- compute_nested_memory_struct_info: handle chained member access
- copy_struct_storage_to_memory/memory_to_storage: recursive copy helpers
- Fix variable declaration to allocate memory for uninitialized structs

testNestedMemoryValue passes - nested memory struct access works. Cross-location copying needs further work.
Implement storage-to-memory and memory-to-storage copying for nested structs:
- copy_storage_to_memory(): recursive copy from storage to memory
- copy_memory_to_storage(): recursive copy from memory to storage
- compute_nested_storage_slot_with_type(): recursive storage slot calculation
- compute_nested_memory_struct_info_with_type(): recursive memory offset calc

Supports arbitrarily deep nesting (3+ levels tested).

Test files:
- DeepNested.sol/t.sol: 3-level nested struct storage-memory round trip
- DeepNestedSimple.sol/t.sol: 1/2/3-level storage access

All 117 unit tests and 33 struct foundry tests pass.
khanbilal732
approved these changes
Jan 24, 2026
These instruction kinds can be re-emitted when their results are needed as CALL operands but aren't on the stack. This fixes ICE when compiling contracts that use keccak256 results in external calls (e.g., unifap-v2). 6/8 unifap-v2 tests now pass. Remaining 2 failures are runtime errors in setUp() - needs further investigation.
- Fix ExprKind::Ternary lowering to use proper branching instead of select(), which only works for single values. Tuple ternaries now write to scratch memory, and the merge block reads the values back.
- Fix abi.encodePacked to properly pack values based on their types instead of 32-byte padding. Returns proper bytes memory format (length prefix + tightly packed data).
- Add get_packed_size_from_expr/get_packed_size_from_hir_type for type-based size inference (address=20, bool=1, bytesN=N, etc.)
- Update lower_return to handle ternary expressions returning tuples by reading all values from scratch memory.

All 34 unifap-v2 tests now pass.

Amp-Thread-ID: https://ampcode.com/threads/T-019bf223-95e1-7439-87e4-4ea6fff69ab2
Co-authored-by: Amp <amp@ampcode.com>
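A sketch of the packed encoding rule. Each value contributes exactly its type's byte width with no padding (address = 20, bool = 1, bytesN = N, uintN = N/8, big-endian), and the `bytes memory` result is a 32-byte length word followed by the data. The `Packed` enum is a hypothetical simplification, and `uintN` values are limited to 128 bits in this sketch.

```rust
/// A few Solidity value types (simplified; uint values capped at 128 bits).
enum Packed {
    Address([u8; 20]),
    Bool(bool),
    Uint { bits: u16, value: u128 },
    FixedBytes(Vec<u8>),
}

/// Packed bytes for one value: its exact width, no padding.
fn packed_bytes(v: &Packed) -> Vec<u8> {
    match v {
        Packed::Address(a) => a.to_vec(),
        Packed::Bool(b) => vec![u8::from(*b)],
        Packed::Uint { bits, value } => {
            let width = usize::from(*bits / 8);
            let be = value.to_be_bytes(); // 16 bytes, big-endian
            if width <= 16 {
                be[16 - width..].to_vec()
            } else {
                // Wider types (e.g. uint256): left-pad the 128-bit value.
                let mut out = vec![0u8; width - 16];
                out.extend_from_slice(&be);
                out
            }
        }
        Packed::FixedBytes(b) => b.clone(),
    }
}

/// `bytes memory` image: 32-byte big-endian length word, then packed data.
fn encode_packed_memory(values: &[Packed]) -> Vec<u8> {
    let data: Vec<u8> = values.iter().flat_map(packed_bytes).collect();
    let mut out = vec![0u8; 24];
    out.extend_from_slice(&(data.len() as u64).to_be_bytes());
    out.extend_from_slice(&data);
    out
}
```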
khanbilal732
approved these changes
Jan 29, 2026
Summary
This adds the `solar-codegen` crate, which provides MIR (Mid-level Intermediate Representation) and EVM bytecode generation for Solar.

Test Results
95 tests pass across 6 parallel test suites comparing Solar vs solc:
Per-Project Results
Sample Output
Architecture
MIR Structure
- `ValueId`, `InstId`, `BlockId`, `FunctionId` index types
- MIR types: `UInt(u16)`, `Int(u16)`, `Address`, `MemPtr`, `StoragePtr`, `CalldataPtr`, `Function`, `Bool`, `FixedBytes`
- Values (`Inst`, `Arg`, `Immediate`, `Phi`, `Undef`) with `Immediate` constants
- Terminators: `Jump`, `Branch`, `Switch`, `Return`, `Revert`, `Stop`, `SelfDestruct`, `Invalid`

Lowering (HIR → MIR)
- `Lowerer` context with storage slot allocation and function lowering

Code Generation (MIR → EVM)
- `EvmCodegen` struct for bytecode generation

Completed Features
- … (`{value: X}`)

Optimization Passes (implemented, not yet integrated)
- Constant folding (`transform/constant_fold.rs`)
- Dead code elimination (`transform/dce.rs`)
- Peephole optimizer (`codegen/peephole.rs`) - 15+ bytecode patterns

Known Issues (tests skipped, tracked in task list)
Remaining Work