-
Notifications
You must be signed in to change notification settings - Fork 24
Description
Feature Request
Is your feature request related to a problem? Please describe:
No.
Describe the feature you'd like:
Implement comprehensive EVM bytecode support in DTVM to enable JIT execution of EVM smart contracts. This includes:
- Multi-bytecode Architecture: Extend DTVM to support both WASM and EVM bytecodes on the same underlying architecture
- EVM Frontend Integration:
- Add EVM bytecode parser and validator
- Implement EVM instruction set support (0x00-0xff opcodes)
- Create EVM-specific IR builder (EVMMirBuilder)
- Memory Model Support: Implement EVM's three-level memory model (stack, memory, storage)
- Gas Calculation System: Add instruction-level gas metering compatible with EVM gas model
- Control Flow Handling: Support EVM's jump-based control flow (JUMP/JUMPI) vs WASM's structured control flow
- Data Type Support: Handle u256 operations (EVM's native type) vs WASM's i32/i64/f32/f64
Describe alternatives you've considered:
- External EVM Implementation: Using a separate EVM implementation alongside DTVM, but this would lose the benefits of DTVM's optimized JIT compilation
- WASM Transpilation: Converting EVM bytecode to WASM, but this approach has significant overhead and complexity
- Interpretation Only: Implementing EVM support only in interpreter mode, but this would miss the performance benefits of JIT compilation
Teachability, Documentation, Adoption, Migration Strategy:
Differences Between EVM and WASM Bytecode
Execution Model
- EVM: Stack-based virtual machine
Instruction Set Differences
-
WASM instruction set: Register/local variable-based instruction set, such as
local.get,local.set, https://webassembly.github.io/spec/core/appendix/index-instructions.html -
EVM instruction set: Stack-based instruction set, such as
PUSH1,DUP1,SWAP1and other stack element operations. -
Control Flow Structure
-
WASM: Uses structured control flow (
block,loop,if) -
EVM: Uses
JUMPandJUMPIfor jumps ==> Arbitrary jumps in DMIR basic blocks
-
-
Memory instructions, such as mstore, need to consider EVM memory auto-growth during design, and gas consumption, with growth size limited by gas amount.
Function Differences
EVM has no functions, so the entire program is a whole, with non-sequential access implemented through jump instructions internally.
Data Type Differences
-
WASM: i32, i64, f32, f64
-
EVM: The basic type operated by opcodes is u256
Gas Differences
- WASM itself has no gas concept, typically charged by basic block for contracts; EVM deducts gas at instruction granularity.
Basic Modifications
Command Line Interface Extension
Extend compilation parameter options in src/cli/dtvm.cpp, support command line parameter --format to configure whether to parse file format as wasm bytecode or evm bytecode, and choose different file loading and parsing methods based on specific bytecode type.
// src/cli/dtvm.cpp
enum class InputFormat {
WASM,
EVM
};
int main(int argc, char *argv[]) {
/*...*/
CLIParser->add_option("INPUT_FILE", Filename, "input filename")
->required();
CLIParser->add_option("--format", ...);
/*...*/
}EVM-Compatible Compilation Interface
// src/compiler/compiler.h
class JITCompilerBase {
public:
// Add EVM compilation support
void compileEVMModule(runtime::Module& module);
protected:
// Internal implementation
void compileEVMFunction(EVMFrontendContext& context, uint32_t func_idx);
};Context Parsing
Current architecture: WASM bytecode → WASMByteCodeVisitor → FunctionMirBuilder → dMIR
Extended support architecture: EVM bytecode → EVMByteCodeVisitor → (EVM has no function concept, only construct one main function) → dMIR
Loading EVM Bytecode
Extend runtime-related interfaces to be compatible with EVM bytecode.
// src/runtime/codeholder.h
// Extend class CodeHolder, add related methods
class CodeHolder : public RuntimeObject<CodeHolder> {
public:
enum class HolderKind {
kFile,
kRawData,
kEVMBytecode // EVM bytecode holder type
};
// Add EVM-specific constructor
static CodeHolderUniquePtr newEVMBytecodeHolder(Runtime &RT, const void *Data, size_t Size);
// Check if it's EVM bytecode
bool isEVMBytecode() const { return Kind == HolderKind::kEVMBytecode; }
private:
void releaseEVMBytecodeHolder();
};// src/runtime/module.h
struct EVMMetadata {
std::vector<uint8_t> bytecode;
std::unordered_set<size_t> jump_targets;
std::vector<action::EVMInstruction> instructions;
bool is_evm_module = false;
};
class Module {
public:
// Add EVM support
void setEVMMetadata(const EVMMetadata& metadata) {
evm_metadata = metadata;
evm_metadata.is_evm_module = true;
}
const EVMMetadata& getEVMMetadata() const {
return evm_metadata;
}
bool isEVMModule() const {
return evm_metadata.is_evm_module;
}
private:
EVMMetadata evm_metadata;
};// src/runtime/runtime.h
class Runtime {
public:
MayBe<Module *> loadEVMModule(const std::string &Filename, common::RunMode mode = common::RunMode::MultipassJIT) noexcept;
MayBe<Module *> loadEVMModule(const std::string &ModName, const void *Data, size_t Size, common::RunMode mode = common::RunMode::MultipassJIT) noexcept;
};Parsing EVM Bytecode
The implementation in src/action/module_loader.h parses wasm binary bytecode. Based on EVM binary bytecode characteristics, we need to add EVMModuleLoader type to parse EVM bytecode.
// src/action/evm_module_loader.h
// EVM instruction structure
struct EVMInstruction {
uint8_t opcode;
std::vector<uint8_t> data;
size_t pc;
size_t size;
};
class EVMModuleLoader {
public:
using Byte = common::Byte;
explicit EVMModuleLoader(runtime::Module &mod, const Byte *data, size_t size)
: mod(mod), data(data), size(size), ptr(data), end(data + size) {}
void load();
private:
uint8_t readByte() {
if (ptr >= end) {
throw common::getError(common::ErrorCode::UnexpectedEnd);
}
return *ptr++;
}
std::vector<uint8_t> readBytes(size_t n) {
if (ptr + n > end) {
throw common::getError(common::ErrorCode::UnexpectedEnd);
}
std::vector<uint8_t> result(ptr, ptr + n);
ptr += n;
return result;
}
// EVM bytecode parsing methods
EVMInstruction parseInstruction();
void parseAllInstructions();
void analyzeJumpTargets();
void createMainFunction();
runtime::Module &mod;
const Byte *data;
size_t size;
const Byte *ptr;
const Byte *end;
std::vector<EVMInstruction> instructions;
std::unordered_set<size_t> jump_targets;
};Visiting EVM Bytecode
Reference template <typename IRBuilder> class WASMByteCodeVisitor in src/action/bytecode_visitor.h
// src/action/evm_bytecode_visitor.h
// EVM stack management
template <typename Operand>
class EVMEvalStack {
public:
void push(Operand op) { stack_impl.push(op); }
Operand pop() {
ZEN_ASSERT(!stack_impl.empty());
Operand top = stack_impl.top();
stack_impl.pop();
return top;
}
Operand peek(size_t depth = 0) const {
ZEN_ASSERT(depth < stack_impl.size());
auto it = stack_impl._Get_container().rbegin() + depth;
return *it;
}
size_t size() const { return stack_impl.size(); }
bool empty() const { return stack_impl.empty(); }
private:
std::stack<Operand> stack_impl;
};
// EVM bytecode visitor
template <typename IRBuilder>
class EVMByteCodeVisitor {
typedef typename IRBuilder::CompilerContext CompilerContext;
typedef typename IRBuilder::Operand Operand;
typedef EVMEvalStack<Operand> EvalStack;
public:
EVMByteCodeVisitor(IRBuilder& builder, CompilerContext* ctx)
: builder(builder), ctx(ctx) {}
bool compile() {
builder.initFunction(ctx);
bool ret = processInstructions();
builder.finalizeFunctionBase();
return ret;
}
private:
void push(Operand opnd) { stack.push(opnd); }
Operand pop() {
Operand opnd = stack.pop();
builder.releaseOperand(opnd);
return opnd;
}
bool processInstructions() {
const auto& instructions = ctx->getEVMInstructions();
const auto& jump_targets = ctx->getEVMJumpTargets();
// Create entry basic block
builder.createBasicBlock(0);
// Create basic block for each jump target
for (size_t pc : jump_targets) {
builder.createBasicBlock(pc);
}
// Process all instructions
for (const auto& inst : instructions) {
// Check if it's a jump target
if (jump_targets.count(inst.pc) > 0) {
builder.setInsertBlock(inst.pc);
}
// Process instruction
processInstruction(inst);
}
return true;
}
void processInstruction(const zen::action::EVMInstruction& inst) {
switch (inst.opcode) {
case 0x00: handleStop(); break; // STOP
case 0x01: handleAdd(); break; // ADD
// ... other arithmetic instructions ...
// ... other comparison instructions ...
// ... other memory instructions ...
// PUSH instructions (0x60-0x7f)
// DUP instructions (0x80-0x8f)
// SWAP instructions (0x90-0x9f)
// ... other instructions ...
default:
handleUnsupportedOpcode(inst.opcode);
}
}
// Instruction handling functions
void handleStop() {
builder.handleStop();
}
// ... other instruction handling functions ...
private:
IRBuilder& builder;
CompilerContext* ctx;
EvalStack stack;
};Parsing Validation and Exception Handling
- Bytecode size not exceeding 24KB: If bytecode is too large, fallback to interpreter mode to avoid overly large dMIR functions that are difficult to test boundaries.
- Opcode validity (0x00-0xff): Invalid opcodes need prompts but don't affect subsequent bytecode parsing.
- Operand completeness (PUSH1-PUSH32 must have sufficient subsequent bytes): Record incomplete operands but don't exit abnormally. Parse needs to consider cases where there aren't enough subsequent parameters.
- JUMP/JUMPI instruction jump targets must be JUMPDEST opcodes, otherwise report errors during execution phase.
- Jump targets or after JUMPDEST must be at instruction beginning, not in the middle of instructions: Invalid jump targets fallback to interpreter mode.
- Overly simple instructions or bytecode with frequent Host calls can fallback to interpreter mode to avoid JIT overhead of saving and restoring state when calling Host functions.
// src/action/evm_bytecode_validator.h
// EVM validation result
enum class EVMValidationResult {
Valid, // Bytecode valid
TooLarge, // Bytecode too large
InvalidOpcode, // Invalid opcode
IncompleteOperand, // Incomplete operand
InvalidJumpTarget, // Invalid jump target
CompilationError // Compilation error
};
// EVM instruction structure
struct EVMInstruction {
uint8_t opcode;
std::vector<uint8_t> data;
size_t pc;
size_t size;
};
class EVMBytecodeValidator {
public:
using Byte = common::Byte;
using RunMode = common::RunMode;
EVMBytecodeValidator(const Byte* data, size_t size)
: data(data), size(size), ptr(data), end(data + size),
run_mode(RunMode::InterpMode) {}
// Execute all validations
EVMValidationResult validate();
// Get validated jump targets
const std::unordered_set<size_t>& getJumpTargets() const { return jump_targets; }
// Get validated instruction list
const std::vector<EVMInstruction>& getInstructions() const { return instructions; }
private:
// Various validation methods
EVMValidationResult validateSize();
EVMValidationResult validateOpcodes();
EVMValidationResult validateOperands();
EVMValidationResult validateJumpTargets();
// Parse instructions
bool parseInstructions();
// Member variables
const Byte* data;
size_t size;
const Byte* ptr;
const Byte* end;
std::vector<EVMInstruction> instructions;
std::unordered_set<size_t> jump_targets;
std::unordered_set<size_t> jump_destinations; // JUMPDEST positions
RunMode run_mode; // Run mode: interpreter/singlepass/multipass
};Adding Context Content
// src/compiler/evm_frontend/evm_mir_compiler.h
class EVMFrontendContext final : public CompileContext {
public:
EVMFrontendContext(const runtime::Module& mod)
: CompileContext(), module(mod) {}
const std::vector<zen::action::EVMInstruction>& getEVMInstructions() const;
const std::unordered_set<size_t>& getEVMJumpTargets() const;
const std::vector<uint8_t>& getEVMBytecode() const;
private:
const runtime::Module& module;
};Implementing EVM IR Builder
Compare class FunctionMirBuilder in src/compiler/wasm_frontend/wasm_mir_compiler.h, implement EVMMirBuilder. Consider differences between wasm bytecode and EVM bytecode in data types, control flow instructions, stack operations, etc., and create new EVMMirBuilder.
Additionally, considering the gas calculation mechanism, to reuse gas instructions in current dMIR as much as possible, we need to insert gas checking and gas deduction instructions when generating dMIR.
// src/compiler/evm_frontend/evm_mir_builder.h
class EVMMirBuilder {
public:
typedef EVMFrontendContext CompilerContext;
// Operand definition
class Operand {
public:
Operand() = default;
Operand(MInstruction* instr, MType* type)
: instr(instr), type(type) {}
Operand(MConstant* constant, MType* type)
: constant(constant), type(type) {}
MInstruction* getInstr() const { return instr; }
MConstant* getConstant() const { return constant; }
MType* getType() const { return type; }
bool isEmpty() const { return !instr && !constant && !type; }
bool isConstant() const { return constant != nullptr; }
private:
MInstruction* instr = nullptr;
MConstant* constant = nullptr;
MType* type = nullptr;
};
EVMMirBuilder(CompilerContext& context, MFunction& mfunc);
void initFunction(CompilerContext* context);
void finalizeFunctionBase();
// Basic block management
void createBasicBlock(size_t pc);
void setInsertBlock(size_t pc);
// Instruction handling
void handleStop();
Operand handleAdd(Operand lhs, Operand rhs);
Operand handleMul(Operand lhs, Operand rhs);
// ... other instruction handling ...
Operand handlePush(const uint256_t& value);
Operand handleDup(Operand opnd);
void handleSwap(Operand top, Operand target);
// ... other instruction handling ...
// Helper functions
void releaseOperand(Operand opnd) {} // Empty implementation, compatible interface
private:
// Helper methods
MInstruction* extractOperand(const Operand& opnd);
Operand createTempOperand(MType* type);
CompilerContext& context;
MFunction& mfunc;
MBasicBlock* current_block;
std::unordered_map<size_t, MBasicBlock*> pc_to_block; // Map for handling EVM jump targets
};Instruction Support
MIR's handling of wasm instruction set is located in src/compiler/mir/instruction.h and src/compiler/mir/instructions.h. Extend support for evm-related instructions, therefore add src/compiler/mir/evm_opcodes.def to add EVM opcode support.
// src/compiler/mir/evm_opcodes.def
// EVM opcode definitions
// Format: DEFINE_EVM_OPCODE(operation_name)
// 0x00: Stop and arithmetic operations
DEFINE_EVM_OPCODE(stop) // Stop execution
DEFINE_EVM_OPCODE(add) // Addition operation
// Other arithmetic instructions
// 0x10: Comparison and bitwise operations
// 0x20: SHA3
// 0x30: Environment information address, balance, origin, caller, callvalue
// 0x50-0x5f: Stack, memory, storage and flow control: pop, mload, mstore, mstore8, sload, sstore, jump, jumpi, pc, msize, gas, jumpdest
// 0x60-0x7f: PUSH operations (PUSH1-PUSH32)
// 0x80-0x8f: DUP operations (DUP1-DUP16)
// 0x90-0x9f: SWAP operations (SWAP1-SWAP16)
// 0xa0-0xa4: Log operations (LOG0-LOG4)
// 0xf0-0xff: System operations: create, call, callcode, return, deleagatecall, create2, staticcall, revert, invalid, selfdestruct
Memory and Storage Model
- EVM's three-level model: stack, memory, storage
- Memory and storage mapping: Implement efficient memory structures suitable for u256 key-value pairs
- ABI modification: Manage registers, stack frames, etc.
EVMModule Interface Design
EVMModule is responsible for representing the static structure of EVM contracts and bytecode instructions.
class EVMModule {
public:
// Construction and initialization
explicit EVMModule(runtime::Runtime& runtime);
~EVMModule();
// Load bytecode information
bool loadBytecode(const Byte* data, size_t size);
private:
// Bytecode storage
std::vector<Byte> bytecode;
};EVMInstance Interface Design
EVMInstance is responsible for executing EVM bytecode and managing execution state
class EVMInstance {
public:
// Construction and initialization
explicit EVMInstance(EVMModule& module, uint64_t gasLimit);
~EVMInstance();
// Execution related
ExecutionResult execute(const std::string& functionName,
const std::vector<TypedValue>& args);
ExecutionResult executeRaw(const Bytes& calldata, uint256 value = 0);
// State queries
uint64_t getGasRemaining() const;
uint64_t getGasUsed() const;
bool hasError() const;
Error getLastError() const;
// Result retrieval
std::vector<TypedValue> getReturnData() const;
// Memory management interface - EVM memory model related
// Linear memory management: write, read
void memoryStore(uint32_t offset, const Bytes& data);
Bytes memoryLoad(uint32_t offset, uint32_t size);
// Storage management: write, read, check
void storageStore(const uint256& key, const uint256& value);
uint256 storageLoad(const uint256& key);
bool hasStorageKey(const uint256& key) const;
// Stack operations
void stackPush(const uint256& value);
uint256 stackPop();
uint256 stackPeek(uint32_t depth = 0) const;
uint32_t getStackSize() const;
// Call related
ExecutionResult call(Address target, uint256 value, const Bytes& calldata, uint64_t gas);
ExecutionResult delegateCall(Address target, const Bytes& calldata, uint64_t gas);
ExecutionResult staticCall(Address target, const Bytes& calldata, uint64_t gas);
void returnData(const Bytes& data);
void revert(const Bytes& data);
void stop();
// Transaction context information
Address getAddress() const;
Address getCaller() const;
uint256 getCallValue() const;
uint64_t getBlockNumber() const;
uint64_t getTimestamp() const;
private:
// Execution state
enum class ExecutionState {
Running,
Returned,
Reverted,
Error
};
ExecutionState state;
// EVM memory model components
std::vector<uint8_t> memory;
std::unordered_map<uint256, uint256> storage;
std::vector<uint256> stack;
// Execution environment
struct CallContext {
Address caller;
Address address;
uint256 value;
uint32_t pc;
std::vector<uint256> stack;
std::vector<uint8_t> memory;
std::vector<uint8_t> returnData;
uint64_t gasLimit;
uint64_t gasUsed;
};
std::vector<CallContext> callStack;
CallContext& currentContext();
// Gas management
uint64_t gasLimit;
uint64_t gasUsed;
bool consumeGas(uint64_t amount);
// Instruction execution
void executeNextInstruction();
void executeInstruction(uint8_t opcode);
};Gas Calculation Mechanism
- Gas metering system:
- Calculate gas consumption at instruction granularity, including fixed gas and dynamic gas consumption calculation
- When EVM opcodes generate dMIR, insert gas checking and gas deduction instructions, reusing gas instructions in current dMIR
// Example dMIR pseudo-code generated by EVM addition
// EVM stack operation: pop two operands, add them, push result back to stack
// Assume current dMIR builder instance is 'builder', stack simulator is 'stack'
// 1. Pop two values from stack top. Should actually pop multiple 64-bit values, but simplified as one 256-bit value in pseudo-code
MOperand op2 = builder.createPopOp(); // Second operand (stack top)
MOperand op1 = builder.createPopOp(); // First operand (second from top)
// 2. Create multiple dMIR instructions to implement 256-bit addition
MInstruction *addInstr = builder.createInstruction<BinaryInstruction>(
...
);
// 3. Push result back to stack
builder.createPushOp(addInstr);
// 4. Consume gas
// Create a constant representing gas consumption (G_verylow = 3)
MInstruction *gasCost = builder.createIntConstInstruction(
&builder.I64Type, // Gas represented as 64-bit integer
3 // G_verylow = 3
);
// Create gas consumption instruction
builder.createInstruction<GasInstruction>(
/*isTerminator=*/false,
/*opcode=*/OP_evm_gasuse,
/*gasAmount=*/gasCost
);
// Update stack simulation state (for subsequent instruction analysis)
stack.pop(); // Pop two values
stack.pop();
stack.push(addInstr); // Push result
Other Parts
JIT Implementation in trace RPC Interface
The trace RPC interface needs some data during execution, but if it only needs storage/event/sub-call changes, this interface might also be usable with JIT.