diff --git a/backends/generators/binutils/README.md b/backends/generators/binutils/README.md new file mode 100644 index 0000000000..7f6b64ba0c --- /dev/null +++ b/backends/generators/binutils/README.md @@ -0,0 +1,154 @@ +# Binutils RISC-V Generator + +This generator creates binutils-compatible opcode table entries from RISC-V UDB instruction definitions, following the format used in `binutils-gdb/opcodes/riscv-opc.c`. + +## Generated Files + +The generator produces two files for every run: + +### 1. Opcode Table (`.c` file) +- **Format**: `{output_name}.c` +- **Purpose**: Contains the `riscv_opcodes[]` array with instruction definitions +- **Structure**: Each entry follows binutils format: `{name, xlen, insn_class, operands, MATCH, MASK, match_func, pinfo}` +- **Example**: `{"add", 0, INSN_CLASS_I, "d,s,t", MATCH_ADD, MASK_ADD, match_opcode, 0}` + +### 2. Header File (`.h` file) +- **Format**: `{output_name}.h` +- **Purpose**: Contains `#define` constants and custom instruction class definitions +- **Contents**: + - `MATCH_*` constants for instruction matching + - `MASK_*` constants for instruction masking + - Custom `INSN_CLASS_*` definitions (commented out for manual addition to binutils) + +## Architecture + +### Single Source of Truth +All instruction class mappings are centralized in `insn_class_config.py`: + +- **`BUILTIN_CLASSES`**: Maps extensions to existing binutils instruction classes +- **`BUILTIN_COMBINATIONS`**: Maps complex extension combinations to binutils classes +- **`is_builtin_class()`**: Determines if a class already exists in binutils + +### Extension Mapping +The `ExtensionMapper` class handles UDB `definedBy` specifications: + +- **Simple extensions**: Direct 1:1 mapping (e.g., `Zba` → `INSN_CLASS_ZBA`) +- **Complex combinations**: + - `anyOf` → `*_OR_*` classes (e.g., `[Zbb, Zbkb]` → `INSN_CLASS_ZBB_OR_ZBKB`) + - `allOf` → `*_AND_*` classes (e.g., `[Zcb, Zba]` → `INSN_CLASS_ZCB_AND_ZBA`) +- **Custom extensions**: Auto-generates class names (e.g., `Zfoo` → `INSN_CLASS_ZFOO`) + +### Operand Mapping +The `OperandMapper` class converts UDB assembly format to binutils operand strings: + +- Maps register operands (e.g., `xd` → `d`, `xs1` → `s`) +- Handles immediate operands (e.g., `imm` → `j`) +- Marks unknown patterns as `NON_DEFINED_*` + +## Usage + +### Basic Usage +```bash +python3 binutils_generator.py --extensions=I,M,Zba,Zbb --output=my_opcodes.c +``` + +### Command Line Options +- `--inst-dir`: Directory containing instruction YAML files (default: `../../../spec/std/isa/inst/`) +- `--output`: Output C file name (corresponding .h file generated automatically) +- `--extensions`: Comma-separated list of enabled extensions +- `--arch`: Target architecture (`RV32`, `RV64`, `BOTH`) +- `--include-all` / `-a`: Include all instructions, ignoring extension filtering +- `--verbose` / `-v`: Enable verbose logging + +### Examples +```bash +# Generate for specific extensions +python3 binutils_generator.py --extensions=I,M,A,F,D --output=rv64_core.c + +# Generate all instructions +python3 binutils_generator.py --include-all --output=complete_riscv.c + +# Custom extension +python3 binutils_generator.py --extensions=I,MyCustomExt --output=custom.c +``` + +## Integration with Binutils + +### Adding Custom Instruction Classes + +1. **Review generated header file**: Check the "Custom instruction class definitions" section +2. **Add to binutils enum**: Edit `binutils-gdb/include/opcode/riscv.h` + ```c + enum riscv_insn_class + { + // ... existing classes ... 
+ INSN_CLASS_ZFOO, // Add your custom classes here + INSN_CLASS_I_OR_ZILSD, + // ... + }; + ``` + +3. **Add subset support**: Edit `binutils-gdb/bfd/elfxx-riscv.c` to handle extension requirements + ```c + static bool + riscv_multi_subset_supports (riscv_parse_subset_t *rps, + enum riscv_insn_class insn_class) + { + switch (insn_class) + { + // ... existing cases ... + case INSN_CLASS_ZFOO: + return riscv_subset_supports (rps, "zfoo"); + // ... + } + } + ``` + +### Adding Generated Opcodes + +1. **Include header**: Add `#include "my_opcodes.h"` to your opcode file +2. **Merge opcode arrays**: + - Option A: Replace existing `riscv_opcodes[]` in `opcodes/riscv-opc.c` + - Option B: Create separate opcode table and modify binutils to use it + - Option C: Append entries to existing table + +### File Locations in Binutils +- **Instruction classes**: `include/opcode/riscv.h` (enum `riscv_insn_class`) +- **Opcode tables**: `opcodes/riscv-opc.c` (`riscv_opcodes[]` array) +- **Extension support**: `bfd/elfxx-riscv.c` (`riscv_multi_subset_supports()`) +- **Operand parsing**: `opcodes/riscv-dis.c` and `gas/config/tc-riscv.c` + +## Extending the Generator + +### Adding New Extensions +```python +from extension_mapper import ExtensionMapper + +mapper = ExtensionMapper() +mapper.add_simple_mapping('Zfoo', 'INSN_CLASS_ZFOO') +mapper.add_complex_mapping('anyOf', ['Zfoo', 'Zbar'], 'INSN_CLASS_ZFOO_OR_ZBAR') +``` + +### Custom Operand Mappings +Edit `operand_mapper.py` to add support for new operand patterns. + +### Configuration +Edit `insn_class_config.py` to modify built-in extension mappings. + +## Output Statistics + +The generator provides detailed statistics: +- **Total instructions**: Number of instructions processed +- **Successfully processed**: Instructions with complete mappings +- **Non-defined operands**: Instructions with unknown operand patterns +- **Non-defined extensions**: Instructions with unknown extensions (should be 0 with current design) +- **Custom classes**: Number of custom instruction classes generated + +## Validation + +Use `validate_output.py` to compare generated output against reference binutils opcodes: +```bash +python3 validate_output.py reference.c generated.c +``` + +The validator provides detailed comparison including instruction class, operands, and MATCH/MASK values. \ No newline at end of file diff --git a/backends/generators/binutils/binutils_generator.py b/backends/generators/binutils/binutils_generator.py new file mode 100644 index 0000000000..f6ce796830 --- /dev/null +++ b/backends/generators/binutils/binutils_generator.py @@ -0,0 +1,530 @@ +#!/usr/bin/env python3 +""" +Binutils RISC-V Generator + +Generates binutils-compatible opcode table entries from RISC-V UDB instruction definitions. 
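+
+A generated table entry looks like the README example:
+
+    {"add", 0, INSN_CLASS_I, "d,s,t", MATCH_ADD, MASK_ADD, match_opcode, 0}
+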
+Follows the format used in binutils-gdb/opcodes/riscv-opc.c +""" + +import os +import sys +import argparse +import logging +import yaml +import glob + +# Add parent directory to path to find generator.py +sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) +from generator import parse_match, parse_extension_requirements +from naming_config import USER_DEFINED_INSN_NAMES, USER_DEFINED_OPERAND_PREFERENCES, is_user_defined_class +import re + +logging.basicConfig(level=logging.INFO, format="%(levelname)s:: %(message)s") + + +# Inline minimal mappers to keep only three files in this toolchain + +class ExtensionMapper: + def __init__(self) -> None: + self._defaults_used = [] # list[(ext, class_name)] + self._records = [] # list[(ext, class_name)] + + def map_extension(self, defined_by, instruction_name: str = "") -> str: + if isinstance(defined_by, str): + ext = defined_by + if ext in USER_DEFINED_INSN_NAMES: + class_name = USER_DEFINED_INSN_NAMES[ext] + else: + class_name = f"INSN_CLASS_{ext.upper()}" + self._defaults_used.append((ext, class_name)) + self._records.append((ext, class_name)) + return class_name + if isinstance(defined_by, dict): + base = instruction_name or "custom" + class_name = f"INSN_CLASS_{base.upper()}" + self._defaults_used.append((base, class_name)) + self._records.append((base, class_name)) + return class_name + class_name = "INSN_CLASS_I" + self._records.append(("I", class_name)) + return class_name + + def get_defaults_warning(self) -> str: + if not self._records: + return "" + lines = [] + lines.append("\n============================================================") + lines.append("WARNING: Default instruction class names were generated") + lines.append("============================================================") + lines.append("The following extensions used auto-generated names:\n") + for ext, cls in self._records: + lines.append(f" • Extension '{ext}' -> {cls}") + lines.append("\nTo define custom names, add them to USER_DEFINED_INSN_NAMES in naming_config.py") + lines.append("============================================================") + return "\n".join(lines) + + +class OperandMatch: + def __init__(self, binutils_char, info, score, reasons): + self.binutils_char = binutils_char + self.binutils_info = info + self.score = score + self.match_reasons = reasons + + +class OperandMatcher: + def __init__(self, binutils_parser): + self.parser = binutils_parser + self.match_suggestions = {} + + def _parse_bit_range(self, location_str): + m = re.match(r'^(\d+)-(\d+)$', location_str) + if m: + high = int(m.group(1)); low = int(m.group(2)) + return (low, high) + ranges = [] + for part in location_str.split(','): + if '-' in part: + m = re.match(r'^(\d+)-(\d+)$', part) + if m: + endb = int(m.group(1)); startb = int(m.group(2)) + ranges.append((startb, endb)) + else: + b = int(part) + ranges.append((b, b)) + if ranges: + ranges.sort(key=lambda x: x[1]-x[0], reverse=True) + return ranges[0] + return (0, 31) + + def find_matches(self, name, location_str): + if not self.parser.parsed: + return [] + low, high = self._parse_bit_range(location_str) + matches = [] + for char, info in self.parser.get_all_operands().items(): + if info.bit_start == low and info.bit_end == high: + matches.append(OperandMatch(char, info, 1.0, [f"Exact bit match ({low}-{high})"])) + return matches + + def suggest_operand_mapping(self, operand_name, location_str): + key = (operand_name, location_str) + if key in USER_DEFINED_OPERAND_PREFERENCES: + return 
USER_DEFINED_OPERAND_PREFERENCES[key]
+        cache_key = f"{operand_name}({location_str})"
+        if cache_key in self.match_suggestions:
+            # Return the cached decision so repeated queries for the same
+            # operand/location pair stay consistent with the first answer.
+            return self.match_suggestions[cache_key]
+        matches = self.find_matches(operand_name, location_str)
+        if not matches:
+            suggestion = f"no_match_{operand_name}_{location_str.replace('-', '_').replace(',', '_')}"
+            logging.warning(f"No exact bit match found for UDB '{operand_name}({location_str})' → using '{suggestion}'")
+            self.match_suggestions[cache_key] = suggestion
+            return suggestion
+        if len(matches) == 1:
+            m = matches[0]
+            logging.info(f"Auto-mapped UDB '{operand_name}({location_str})' → binutils '{m.binutils_char}' (exact bit match)")
+            self.match_suggestions[cache_key] = m.binutils_char
+            return m.binutils_char
+        # Multiple matches: prefer the user-configured choice, otherwise the first
+        preferred = USER_DEFINED_OPERAND_PREFERENCES.get(key)
+        choice = preferred or matches[0].binutils_char
+        self.match_suggestions[cache_key] = choice
+        return choice
+
+
+class OperandMapper:
+    def __init__(self, binutils_path: str | None = None):
+        self.variable_map = {
+            ('xd', '11-7'): 'd', ('xs1', '19-15'): 's', ('xs2', '24-20'): 't',
+            ('fd', '11-7'): 'D', ('fs1', '19-15'): 'S', ('fs2', '24-20'): 'T',
+            ('imm', '31-20'): 'j', ('imm', '31-25,11-7'): 'o',
+            ('imm', '31,7,30-25,11-8'): 'p', ('imm', '31,19-12,20,30-21'): 'a',
+            ('shamt', '25-20'): '>', ('shamt', '24-20'): '<',
+        }
+        self.binutils_parser = None
+        self.operand_matcher = None
+        if binutils_path:
+            from binutils_parser import BinutilsParser
+            self.binutils_parser = BinutilsParser(binutils_path)
+            if self.binutils_parser.parse_operand_definitions():
+                self.operand_matcher = OperandMatcher(self.binutils_parser)
+                logging.info("Dynamic operand matching enabled with binutils source")
+            else:
+                logging.warning("Could not parse binutils source, using static mappings only")
+        else:
+            logging.info("No binutils path provided, using static mappings only")
+
+    def map_assembly(self, assembly_str, instr_info):
+        if not assembly_str or not assembly_str.strip():
+            return ""
+        variables = self._extract_variables(instr_info)
+        parts = self._parse_assembly(assembly_str)
+        out = []
+        for comp in parts:
+            out.append(self._map_single_operand(comp, variables, instr_info))
+        return ",".join(out)
+
+    def _extract_variables(self, instr_info):
+        variables = {}
+        encoding = instr_info.get("encoding", {})
+        if isinstance(encoding, dict):
+            var_list = encoding.get("variables", [])
+            if not var_list and "RV64" in encoding:
+                rv64 = encoding.get("RV64", {})
+                if isinstance(rv64, dict):
+                    var_list = rv64.get("variables", [])
+            elif not var_list and "RV32" in encoding:
+                rv32 = encoding.get("RV32", {})
+                if isinstance(rv32, dict):
+                    var_list = rv32.get("variables", [])
+            for var in var_list:
+                if isinstance(var, dict):
+                    name = var.get("name")
+                    location = var.get("location")
+                    not_c = var.get("not")
+                    if name and location:
+                        variables[name] = {'location': str(location), 'not': not_c}
+        return variables
+
+    def _parse_assembly(self, assembly_str):
+        comps = [c.strip() for c in assembly_str.split(',')]
+        parsed = []
+        for comp in comps:
+            if '(' in comp and ')' in comp:
+                m = re.match(r'([^(]+)\(([^)]+)\)', comp)
+                if m:
+                    offset, base = m.groups()
+                    parsed.extend([offset.strip(), base.strip()])
+                else:
+                    parsed.append(comp)
+            else:
+                parsed.append(comp)
+        return parsed
+
+    def _map_single_operand(self, operand, variables, instr_info):
+        operand = operand.strip()
+        if operand in ("rm", "csr"):
+            return ""
+        if operand in variables:
+            var = variables[operand]
+            location = var['location']
+            not_c = var.get('not')
+            is_compressed = instr_info.get("name", "").startswith("c.")
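+            # Try the most specific static mappings first: the exact
+            # (name, location) pair, then the compressed-form and
+            # x8-x15-restricted variants when they apply.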
+            mapping_keys = [
+                (operand, location),
+                (operand, location, 'compressed') if is_compressed else None,
+                (operand, location, 'x8-x15') if not_c == 0 else None,
+            ]
+            for key in mapping_keys:
+                if key and key in self.variable_map:
+                    res = self.variable_map[key]
+                    if res:
+                        return res
+            if self.operand_matcher:
+                suggestion = self.operand_matcher.suggest_operand_mapping(operand, location)
+                if suggestion:
+                    return suggestion
+            # No static or dynamic match: emit a marker so unknown patterns
+            # are easy to spot in the generated table.
+            return f"NON_DEFINED_{operand}_{location}"
+        return f"NON_DEFINED_{operand}"
+
+
+def load_full_instructions(inst_dir, enabled_extensions, include_all, target_arch):
+    instructions = {}
+    if enabled_extensions is None:
+        enabled_extensions = []
+    yaml_files = glob.glob(os.path.join(inst_dir, "**/*.yaml"), recursive=True)
+    logging.info(f"Found {len(yaml_files)} instruction files in {inst_dir}")
+    for yaml_file in yaml_files:
+        try:
+            with open(yaml_file, 'r', encoding='utf-8') as f:
+                data = yaml.safe_load(f)
+            if not isinstance(data, dict) or data.get('kind') != 'instruction':
+                continue
+            name = data.get('name')
+            if not name:
+                continue
+            defined_by = data.get('definedBy')
+            if not include_all and defined_by:
+                try:
+                    meets_req = parse_extension_requirements(defined_by)
+                    if not meets_req(enabled_extensions):
+                        logging.debug(f"Skipping {name} - extension requirements not met")
+                        continue
+                except Exception as e:
+                    logging.debug(f"Error parsing extension requirements for {name}: {e}")
+                    continue
+            encoding = data.get('encoding', {})
+            if target_arch in ['RV32', 'RV64'] and target_arch in encoding:
+                arch_encoding = encoding[target_arch]
+                data['encoding'] = arch_encoding
+            instructions[name] = data
+        except Exception as e:
+            logging.error(f"Error loading {yaml_file}: {e}")
+            continue
+    logging.info(f"Loaded {len(instructions)} instructions after filtering")
+    return instructions
+
+
+def generate_binutils_opcodes(instr_dict, output_file="riscv-opc.c", extension_mapper=None, binutils_path=None):
+    operand_mapper = OperandMapper(binutils_path)
+    if extension_mapper is None:
+        extension_mapper = ExtensionMapper()
+    args = " ".join(sys.argv)
+
+    opcode_entries = []
+    stats = {'total': 0, 'success': 0, 'non_defined_operands': 0, 'non_defined_extensions': 0, 'errors': 0}
+
+    for name, info in sorted(instr_dict.items(), key=lambda x: x[0].upper()):
+        stats['total'] += 1
+        try:
+            encoding = info.get("encoding", {})
+            match_str = encoding.get("match", "")
+            if not match_str:
+                logging.warning(f"No match string found for {name}")
+                continue
+            defined_by = info.get("definedBy", "I")
+            assembly = info.get("assembly", "")
+            enc_match = parse_match(match_str)
+            enc_mask = int(''.join('1' if c != '-' else '0' for c in match_str), 2)
+            insn_class = extension_mapper.map_extension(defined_by, instruction_name=name)
+            if "NON_DEFINED" in insn_class:
+                stats['non_defined_extensions'] += 1
+                logging.warning(f"Non-defined extension for {name}: {defined_by} -> {insn_class}")
+            if insn_class.startswith('INSN_CLASS_') and not is_user_defined_class(insn_class):
+                if not hasattr(generate_binutils_opcodes, 'custom_classes'):
+                    generate_binutils_opcodes.custom_classes = set()
+                generate_binutils_opcodes.custom_classes.add(insn_class)
+            operand_format = operand_mapper.map_assembly(assembly, info)
+            if "NON_DEFINED" in operand_format:
+                stats['non_defined_operands'] += 1
+                logging.warning(f"Non-defined operands for {name}: {assembly} -> {operand_format}")
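+            # Build the binutils MATCH_/MASK_ constant names from the mnemonic:
+            # upper-case it and turn '.' into '_' (e.g. "c.add" -> MATCH_C_ADD).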
match_const = f"MATCH_{name.upper().replace('.', '_')}" + mask_const = f"MASK_{name.upper().replace('.', '_')}" + entry = f' {{"{name}", 0, {insn_class}, "{operand_format}", {match_const}, {mask_const}, match_opcode, 0}}' + opcode_entries.append(entry) + if not hasattr(generate_binutils_opcodes, 'constants'): + generate_binutils_opcodes.constants = [] + generate_binutils_opcodes.constants.append({ + 'name': name, + 'match_const': match_const, + 'mask_const': mask_const, + 'match_value': f"0x{enc_match:x}", + 'mask_value': f"0x{enc_mask:x}" + }) + if insn_class.startswith('INSN_CLASS_') and not is_user_defined_class(insn_class): + if not hasattr(generate_binutils_opcodes, 'custom_class_extensions'): + generate_binutils_opcodes.custom_class_extensions = {} + generate_binutils_opcodes.custom_class_extensions[insn_class] = defined_by + stats['success'] += 1 + except Exception as e: + stats['errors'] += 1 + logging.error(f"Error processing {name}: {e}") + continue + + prelude = f"/* Code generated by {args}; DO NOT EDIT. */\n" + prelude += "/* This file should be placed at: binutils-gdb/opcodes/riscv-opc.c */\n\n" + prelude += """#include "opcode/riscv.h" + +const struct riscv_opcode riscv_opcodes[] = { +""" + opcodes_str = ",\n".join(opcode_entries) + postlude = "\n};\n" + full_output = prelude + opcodes_str + postlude + with open(output_file, "w", encoding="utf-8") as f: + f.write(full_output) + + generate_header_file(output_file, args) + generate_subset_support(output_file, args) + + header_file = output_file.replace('.c', '.h') + support_file = 'elfxx-riscv.c' + logging.info(f"Generated files:") + logging.info(f" Opcode table: {output_file}") + logging.info(f" Header file: {header_file}") + if hasattr(generate_binutils_opcodes, 'custom_class_extensions') and generate_binutils_opcodes.custom_class_extensions: + logging.info(f" Subset support: {support_file}") + logging.info(f"Statistics:") + logging.info(f" Total instructions: {stats['total']}") + logging.info(f" Successfully processed: {stats['success']}") + logging.info(f" Non-defined operands: {stats['non_defined_operands']}") + logging.info(f" Non-defined extensions: {stats['non_defined_extensions']}") + logging.info(f" Errors: {stats['errors']}") + + +def generate_header_file(output_file, args): + if not hasattr(generate_binutils_opcodes, 'constants'): + logging.warning("No constants collected for header generation") + return + header_file = output_file.replace('.c', '.h') + args_str = " ".join(sys.argv) + header_content = f"""/* Code generated by {args_str}; DO NOT EDIT. 
*/ +/* This file should be placed at: binutils-gdb/include/opcode/riscv.h (append to existing file) */ + +#ifndef RISCV_OPC_H +#define RISCV_OPC_H + +/* RISC-V opcode constants for {len(generate_binutils_opcodes.constants)} instructions */ + +""" + custom_classes = set() + if hasattr(generate_binutils_opcodes, 'custom_classes'): + custom_classes = generate_binutils_opcodes.custom_classes + if custom_classes: + header_content += "/* Custom instruction class definitions */\n" + header_content += "/* Add these to your binutils enum riscv_insn_class in include/opcode/riscv.h */\n" + for class_name in sorted(custom_classes): + header_content += f"/* {class_name}, */\n" + header_content += "\n" + header_content += "/* MATCH constants */\n" + for const in sorted(generate_binutils_opcodes.constants, key=lambda x: x['name']): + header_content += f"#define {const['match_const']} {const['match_value']}\n" + header_content += "\n/* MASK constants */\n" + for const in sorted(generate_binutils_opcodes.constants, key=lambda x: x['name']): + header_content += f"#define {const['mask_const']} {const['mask_value']}\n" + header_content += "\n#endif /* RISCV_OPC_H */\n" + with open(header_file, "w", encoding="utf-8") as f: + f.write(header_content) + stats_msg = f" MATCH/MASK constants: {len(generate_binutils_opcodes.constants) * 2}" + if custom_classes: + stats_msg += f", Custom classes: {len(custom_classes)}" + logging.info(f"Generated header file: {header_file}") + logging.info(stats_msg) + + +def generate_subset_support(output_file, args): + if not hasattr(generate_binutils_opcodes, 'custom_class_extensions'): + return + custom_class_extensions = generate_binutils_opcodes.custom_class_extensions + if not custom_class_extensions: + return + support_file = 'elfxx-riscv.c' + args_str = " ".join(sys.argv) + content = f"""/* Code generated by {args_str}; DO NOT EDIT. */ +/* This file contains code snippets that should be added to: binutils-gdb/bfd/elfxx-riscv.c */ + +/* Add these cases to riscv_multi_subset_supports() in bfd/elfxx-riscv.c */ + +""" + content += "/* Cases for riscv_multi_subset_supports() switch statement */\n" + for class_name in sorted(custom_class_extensions.keys()): + extension_def = custom_class_extensions[class_name] + case_code = generate_subset_support_case_from_udb(class_name, extension_def) + content += case_code + content += "\n/* Cases for riscv_multi_subset_supports_ext() switch statement */\n" + for class_name in sorted(custom_class_extensions.keys()): + extension_def = custom_class_extensions[class_name] + case_code = generate_subset_support_ext_case_from_udb(class_name, extension_def) + content += case_code + content += f""" + +/* Instructions for integration: + * + * 1. Add the instruction classes to include/opcode/riscv.h: + * (Already listed in the generated .h file) + * + * 2. Add these cases to the switch statement in bfd/elfxx-riscv.c: + * - Find function riscv_multi_subset_supports() + * - Add the cases above to the switch statement + * - Find function riscv_multi_subset_supports_ext() + * - Add the ext cases above to the switch statement + * + * 3. 
Extension names are converted to lowercase in subset support functions + */ +""" + with open(support_file, "w", encoding="utf-8") as f: + f.write(content) + logging.info(f"Generated subset support file: {support_file}") + logging.info(f" Custom classes: {len(custom_class_extensions)}") + + +def generate_subset_support_case_from_udb(class_name, extension_def): + logic = generate_extension_logic(extension_def) + return f""" case {class_name}: + {logic} +""" + + +def generate_subset_support_ext_case_from_udb(class_name, extension_def): + ext_message = generate_extension_error_message(extension_def) + if isinstance(ext_message, str) and ext_message.startswith('_('): + return f""" case {class_name}: + return {ext_message}; +""" + else: + return f""" case {class_name}: + return \"{ext_message}\"; +""" + + +def generate_extension_logic(extension_def): + if isinstance(extension_def, str): + return f'return riscv_subset_supports (rps, "{extension_def.lower()}");' + elif isinstance(extension_def, dict): + if "anyOf" in extension_def: + extensions = extension_def["anyOf"] + checks = [f'riscv_subset_supports (rps, "{ext.lower()}")' for ext in extensions] + return f"return ({' || '.join(checks)});" + elif "allOf" in extension_def: + extensions = extension_def["allOf"] + checks = [f'riscv_subset_supports (rps, "{ext.lower()}")' for ext in extensions] + return f"return ({' && '.join(checks)});" + else: + return 'return false; /* TODO: Complex extension logic */' + else: + return 'return false; /* TODO: Unknown extension type */' + + +def generate_extension_error_message(extension_def): + if isinstance(extension_def, str): + return extension_def.lower() + elif isinstance(extension_def, dict): + if "anyOf" in extension_def: + extensions = [ext.lower() for ext in extension_def["anyOf"]] + ext_list = "' or `".join(extensions) + return f'_("{ext_list}")' + elif "allOf" in extension_def: + extensions = [ext.lower() for ext in extension_def["allOf"]] + ext_list = "' and `".join(extensions) + return f'_("{ext_list}")' + else: + return f'TODO: {extension_def}' + else: + return f'TODO: {extension_def}' + + +def parse_args(): + parser = argparse.ArgumentParser( + description="Generate binutils RISC-V opcode table from UDB instruction definitions" + ) + parser.add_argument("--inst-dir", default="../../../spec/std/isa/inst/", help="Directory containing instruction YAML files") + parser.add_argument("--output", default="riscv-opc.c", help="Output C file name (corresponding .h file will be generated automatically)") + parser.add_argument("--extensions", default="I,M,A,F,D,C,Zba,Zbb,Zbs,Zca", help="Comma-separated list of enabled extensions") + parser.add_argument("--arch", default="RV64", choices=["RV32", "RV64", "BOTH"], help="Target architecture") + parser.add_argument("--verbose", "-v", action="store_true", help="Enable verbose logging") + parser.add_argument("--include-all", "-a", action="store_true", help="Include all instructions, ignoring extension filtering") + parser.add_argument("--binutils-path", default="../binutils-gdb/", help="Path to binutils-gdb source directory for operand reference") + return parser.parse_args() + + +def main(): + args = parse_args() + if args.verbose: + logging.getLogger().setLevel(logging.DEBUG) + include_all = args.include_all or not args.extensions + if include_all: + enabled_extensions = [] + logging.info("Including all instructions (extension filtering disabled)") + else: + enabled_extensions = [ext.strip() for ext in args.extensions.split(",") if ext.strip()] + 
logging.info(f"Enabled extensions: {', '.join(enabled_extensions)}") + logging.info(f"Target architecture: {args.arch}") + extension_mapper = ExtensionMapper() + logging.info("Using user-defined names from insn_class_config.py or auto-generated defaults") + if not os.path.isdir(args.inst_dir): + logging.error(f"Instruction directory not found: {args.inst_dir}") + sys.exit(1) + instr_dict = load_full_instructions(args.inst_dir, enabled_extensions, include_all, args.arch) + if not instr_dict: + logging.error("No instructions found or all were filtered out.") + sys.exit(1) + logging.info(f"Loaded {len(instr_dict)} instructions") + generate_binutils_opcodes(instr_dict, args.output, extension_mapper, args.binutils_path) + warning = extension_mapper.get_defaults_warning() + if warning: + print(warning) + + +if __name__ == "__main__": + main() diff --git a/backends/generators/binutils/binutils_parser.py b/backends/generators/binutils/binutils_parser.py new file mode 100644 index 0000000000..f2187e9ffb --- /dev/null +++ b/backends/generators/binutils/binutils_parser.py @@ -0,0 +1,436 @@ +""" +Binutils Source Parser for RISC-V Operand Definitions + +Houses both: +- A small API (BinutilsParser) used by the generator to discover operand tokens + and their bit positions; and +- The underlying extractor helpers (parse_op_fields, parse_encode_macros, + extract_operand_mapping, derive_bits_for_token) so logic lives in one place. + +This keeps behavior identical while reducing duplication, and allows other +tools (like the standalone extractor script) to import the same helpers. +""" + +import os +import logging +import re +from pathlib import Path +from collections import OrderedDict +from typing import Dict, List, Tuple, Optional, NamedTuple + + +# ---------------------------- +# Extractor helper functions +# ---------------------------- + +def parse_op_fields(riscv_h: str): + """Parse OP_SH_* and OP_MASK_* into a field->bits map. 
+ + Returns dict FIELD -> {shift, mask, width, bits:[int...]} + """ + sh_re = re.compile(r"#define\s+OP_SH_([A-Z0-9_]+)\s+(\d+)") + mask_re = re.compile(r"#define\s+OP_MASK_([A-Z0-9_]+)\s+((?:0x[0-9A-Fa-f]+|\d+)[Uu]?)") + + shifts = {} + masks = {} + for m in sh_re.finditer(riscv_h): + shifts[m.group(1)] = int(m.group(2)) + for m in mask_re.finditer(riscv_h): + raw = m.group(2) + if raw.endswith(('U', 'u')): + raw = raw[:-1] + masks[m.group(1)] = int(raw, 0) + + fields = {} + for name, sh in shifts.items(): + if name not in masks: + continue + mask = masks[name] + width = mask.bit_count() + bits = [] + local = mask + bit_index = 0 + while local: + if local & 1: + bits.append(sh + bit_index) + local >>= 1 + bit_index += 1 + if width and len(bits) != width: + bits = sorted(bits) + elif width: + bits = list(range(sh, sh + width)) + fields[name] = { + "shift": sh, + "mask": mask, + "width": width, + "bits": bits, + } + return fields + + +def parse_encode_macros(riscv_h: str): + """Parse ENCODE_* macros to compute instruction bit destinations for immediates.""" + define_re = re.compile(r"^#define\s+ENCODE_([A-Z0-9_]+)\(x\)\s+(.*)$", re.M) + lines = riscv_h.splitlines() + macros = {} + for m in define_re.finditer(riscv_h): + name = m.group(1) + start_pos = m.start() + start_line = riscv_h.count('\n', 0, start_pos) + body_lines = [] + i = start_line + while i < len(lines): + body_lines.append(lines[i]) + if not lines[i].rstrip().endswith('\\'): + break + i += 1 + body = " ".join([bl.rstrip(' \\') for bl in body_lines]) + segs = [] + for sm in re.finditer(r"RV_X\(x,\s*(\d+),\s*(\d+)\)\s*<<\s*(\d+)", body): + src_start = int(sm.group(1)) + width = int(sm.group(2)) + dst_start = int(sm.group(3)) + segs.append({"src_start": src_start, "width": width, "dst_start": dst_start}) + if not segs: + continue + bits = [] + for seg in segs: + bits.extend(range(seg["dst_start"], seg["dst_start"] + seg["width"])) + macros[name] = {"segments": segs, "bits": sorted(set(bits))} + return macros + + +case_re = re.compile(r"^\s*case\s*'(.?)':\s*(?:/\*\s*(.*?)\s*\*/)?") +switch_start_re = re.compile(r"^\s*switch\s*\(\*[+]*oparg\)\s*") +INSERT_RE = re.compile(r"INSERT_OPERAND\s*\(\s*([A-Z0-9_]+)") +EXTRACT_ANY_RE = re.compile(r"\bEXTRACT_([A-Z0-9_]+)\s*\(") +ENCODE_ANY_RE = re.compile(r"\bENCODE_([A-Z0-9_]+)\s*\(") + + +def parse_operand_switch(lines, start_idx=0): + """Parse the switch(*oparg) table capturing top-level cases and nested C/V/X/W.""" + entries = [] + i = start_idx + n = len(lines) + while i < n and not switch_start_re.search(lines[i]): + i += 1 + if i >= n: + return entries + i += 1 + while i < n: + line = lines[i] + m = case_re.match(line) + if m: + ch, cmt = m.group(1), (m.group(2) or '').strip() + if ch in ('C', 'V', 'X', 'W'): + j = i + 1 + while j < n and not switch_start_re.search(lines[j]): + if case_re.match(lines[j]): + break + j += 1 + if j >= n or not switch_start_re.search(lines[j]): + i += 1 + continue + depth = 0 + seen_brace = False + k = j + 1 + while k < n: + l2 = lines[k] + if '{' in l2 or '}' in l2: + depth += l2.count('{') - l2.count('}') + if l2.count('{'): + seen_brace = True + if seen_brace and depth < 0: + break + if seen_brace and depth == 0: + k += 1 + break + mm = case_re.match(l2) + if mm and (not seen_brace or depth > 0): + subch, subcmt = mm.group(1), (mm.group(2) or '').strip() + key = f"{ch}.{subch}" + inserts, extracts, encodes = set(), set(), set() + la = 0 + kk = k + 1 + local_depth = 0 + while kk < n and la < 30: + if case_re.match(lines[kk]) and local_depth == 0: + 
break + local_depth += lines[kk].count('{') - lines[kk].count('}') + inserts.update(INSERT_RE.findall(lines[kk])) + extracts.update(EXTRACT_ANY_RE.findall(lines[kk])) + encodes.update(ENCODE_ANY_RE.findall(lines[kk])) + kk += 1 + la += 1 + entries.append((key, subcmt, sorted(inserts), sorted(extracts), sorted(encodes))) + k += 1 + i = k + continue + else: + key = ch + inserts, extracts, encodes = set(), set(), set() + la = 0 + j = i + 1 + local_depth = 0 + while j < n and la < 30: + if case_re.match(lines[j]) and local_depth == 0: + break + local_depth += lines[j].count('{') - lines[j].count('}') + inserts.update(INSERT_RE.findall(lines[j])) + extracts.update(EXTRACT_ANY_RE.findall(lines[j])) + encodes.update(ENCODE_ANY_RE.findall(lines[j])) + j += 1 + la += 1 + entries.append((key, cmt, sorted(inserts), sorted(extracts), sorted(encodes))) + i += 1 + return entries + + +def extract_operand_mapping(tc_riscv_c: str, riscv_dis_c: str): + """Return an ordered mapping of operand token -> macro usage from asm+dis.""" + asm_lines = tc_riscv_c.splitlines() + dis_lines = riscv_dis_c.splitlines() + def start_idx(lines): + for idx, ln in enumerate(lines): + if 'The operand string defined in the riscv_opcodes' in ln: + return idx + for idx, ln in enumerate(lines): + if 'switch (*oparg)' in ln: + return idx + return 0 + asm_entries = parse_operand_switch(asm_lines, start_idx(asm_lines)) + dis_entries = parse_operand_switch(dis_lines, start_idx(dis_lines)) + + merged = OrderedDict() + for key, cmt, ins, ex, en in asm_entries: + merged[key] = { + 'asm': {'comment': cmt, 'inserts': ins, 'encodes': en}, + 'dis': {'comment': '', 'extracts': []}, + } + for key, cmt, ins, ex, en in dis_entries: + d = merged.setdefault(key, {'asm': {'comment': '', 'inserts': [], 'encodes': []}, + 'dis': {'comment': '', 'extracts': []}}) + d['dis']['comment'] = cmt + d['dis']['extracts'] = ex + return merged + + +def derive_bits_for_token(token, macro_use, fields_map, enc_map): + """Given a token's asm/dis macro usage, compute bit positions and notes.""" + bits = set() + notes = [] + inserts = macro_use.get('asm', {}).get('inserts', []) + encodes = macro_use.get('asm', {}).get('encodes', []) + extracts = macro_use.get('dis', {}).get('extracts', []) + + for fld in inserts: + if fld in fields_map: + bits.update(fields_map[fld]['bits']) + + for enc in encodes: + if enc in enc_map: + bits.update(enc_map[enc]['bits']) + + if not bits and extracts: + for ex in extracts: + if ex in fields_map: + bits.update(fields_map[ex]['bits']) + continue + alias = None + if ex.endswith('_IMM'): + alias = ex.replace('EXTRACT_', 'ENCODE_') + elif ex.startswith('RVV_V') or ex.startswith('ZCB') or ex.startswith('ZCM') or ex.startswith('CV_') or ex.startswith('MIPS_'): + alias = ex.replace('EXTRACT_', 'ENCODE_') + if alias and alias in enc_map: + bits.update(enc_map[alias]['bits']) + + if not bits: + fallback = { + 'd': 'RD', 's': 'RS1', 't': 'RS2', 'r': 'RS3', + 'm': 'RM', 'E': 'CSR', 'P': 'PRED', 'Q': 'SUCC', + '>': 'SHAMT', '<': 'SHAMTW', 'Z': 'RS1', + 'C.s': 'CRS1S', 'C.t': 'CRS2S', 'C.V': 'CRS2', + 'V.d': 'VD', 'V.s': 'VS1', 'V.t': 'VS2', 'V.m': 'VMASK', 'V.i': 'VIMM', 'V.j': 'VIMM', + } + fld = fallback.get(token) + if fld and fld in fields_map: + bits.update(fields_map[fld]['bits']) + + if token == '0': + notes.append('constant-zero; bits reported when context provides an immediate encoder') + + return sorted(bits), notes + +# Reuse the proven extractor implementation in this directory. 
+# The helper functions above are that implementation; the standalone
+# extract_riscv_operand_bits.py script imports them from this module.
+# (Importing them back from that script would create a circular import.)
+
+
+class OperandInfo(NamedTuple):
+    """Information about a binutils operand character."""
+    char: str
+    bit_start: int
+    bit_end: int
+    operand_type: str  # 'register', 'immediate', 'address', 'special'
+    semantic_role: str  # 'destination', 'source1', 'source2', 'immediate', etc.
+    description: str
+    constraints: str  # Any special constraints or notes
+
+
+class BinutilsParser:
+    """Parses binutils source files to extract RISC-V operand definitions using binutils' own logic."""
+
+    def __init__(self, binutils_path: str):
+        self.binutils_path = binutils_path
+        self.operand_info: Dict[str, OperandInfo] = {}
+        self.parsed = False
+        # Keep only operand_info; the generator and matcher don't need raw bit lists here
+
+    def validate_binutils_path(self) -> bool:
+        """Check that the binutils path exists and contains the required files."""
+        if not os.path.isdir(self.binutils_path):
+            return False
+
+        required_files = [
+            "gas/config/tc-riscv.c",
+            "opcodes/riscv-dis.c",
+            "include/opcode/riscv.h"
+        ]
+
+        for file_path in required_files:
+            full_path = os.path.join(self.binutils_path, file_path)
+            if not os.path.isfile(full_path):
+                logging.warning(f"Required binutils file not found: {full_path}")
+                return False
+
+        return True
+
+    def read_file(self, path: str) -> str:
+        """Read a file with tolerant encoding handling."""
+        full_path = os.path.join(self.binutils_path, path)
+        return Path(full_path).read_text(encoding="utf-8", errors="ignore")
+
+    # All parsing helpers are defined at module level above
+
+    def parse_operand_definitions(self) -> bool:
+        """Parse binutils source files to extract operand definitions using binutils' own logic."""
+        if not self.validate_binutils_path():
+            logging.error(f"Invalid binutils path: {self.binutils_path}")
+            return False
+
+        try:
+            # Read source files
+            riscv_h = self.read_file("include/opcode/riscv.h")
+            tc_riscv_c = self.read_file("gas/config/tc-riscv.c")
+            riscv_dis_c = self.read_file("opcodes/riscv-dis.c")
+
+            # Parse using the shared extractor helpers
+            fields_map = parse_op_fields(riscv_h)
+            enc_map = parse_encode_macros(riscv_h)
+            op_token_map = extract_operand_mapping(tc_riscv_c, riscv_dis_c)
+
+            # Convert to our operand info format
+            for token, macro_use in op_token_map.items():
+                bits, _notes = derive_bits_for_token(token, macro_use, fields_map, enc_map)
+                if bits:
+                    bit_start, bit_end = min(bits), max(bits)
+                else:
+                    bit_start, bit_end = -1, -1
+
+                # Infer operand type and semantic role
+                operand_type = self._infer_operand_type_from_token(token, macro_use)
+                semantic_role = self._infer_semantic_role_from_token(token, macro_use)
+
+                self.operand_info[token] = OperandInfo(
+                    char=token,
+                    bit_start=bit_start,
+                    bit_end=bit_end,
+                    operand_type=operand_type,
+                    semantic_role=semantic_role,
+                    description=f"Operand character '{token}'",
+                    constraints=""
+                )
+
+            self.parsed = True
+            logging.info(f"Parsed {len(self.operand_info)} operand definitions from the binutils sources")
+
+            # Debug: show what operands we found
+            if logging.getLogger().isEnabledFor(logging.DEBUG):
+                logging.debug("Found operand definitions:")
+                for char, info in self.operand_info.items():
+                    logging.debug(f"  '{char}': bits {info.bit_start}-{info.bit_end}, type={info.operand_type}, role={info.semantic_role}")
+
+            return True
+        except Exception as e:
+            logging.error(f"Error parsing
binutils source: {e}")
+            return False
+
+    def _infer_operand_type_from_token(self, token: str, macro_use: dict) -> str:
+        """Infer the operand type from the token name and macro usage."""
+        if token.startswith('V.') or 'VD' in str(macro_use) or 'VS' in str(macro_use):
+            return 'vector'
+        elif token.startswith('C.'):
+            return 'compressed'
+        elif token in ['d', 's', 't', 'r', 'D', 'S', 'T', 'R']:
+            return 'register'
+        elif token in ['j', 'i', 'o', 'u', 'a', 'p', 'q'] or 'IMM' in str(macro_use):
+            return 'immediate'
+        elif token in ['>', '<']:
+            return 'shift'
+        elif token in ['P', 'Q', 'p', 'q'] and ('PRED' in str(macro_use) or 'SUCC' in str(macro_use)):
+            return 'fence'
+        elif token in ['E', 'm']:
+            return 'special'
+        else:
+            return 'unknown'
+
+    def _infer_semantic_role_from_token(self, token: str, macro_use: dict) -> str:
+        """Infer the semantic role from the token name and macro usage."""
+        if token in ['d', 'D', 'V.d']:
+            return 'destination'
+        elif token in ['s', 'S', 'V.s']:
+            return 'source1'
+        elif token in ['t', 'T', 'V.t']:
+            return 'source2'
+        elif token in ['r', 'R']:
+            return 'source3'
+        elif token in ['j', 'i', 'o', 'u', 'a', 'p', 'q', '>', '<']:
+            return 'immediate'
+        elif token in ['P', 'Q']:
+            return 'fence_pred_succ'
+        elif token == 'E':
+            return 'csr'
+        elif token == 'm':
+            return 'rounding_mode'
+        else:
+            return 'unknown'
+
+    # Interface methods for compatibility
+    def get_operand_info(self, char: str) -> Optional[OperandInfo]:
+        """Get information about a specific operand character."""
+        return self.operand_info.get(char)
+
+    def get_all_operands(self) -> Dict[str, OperandInfo]:
+        """Get all parsed operand information."""
+        return self.operand_info.copy()
+
+    def find_matching_operands(self, bit_start: int, bit_end: int,
+                               operand_type: Optional[str] = None) -> List[OperandInfo]:
+        """Find operand characters that match the given bit positions and type."""
+        matches = []
+
+        for info in self.operand_info.values():
+            # Check bit position overlap
+            if info.bit_start <= bit_end and info.bit_end >= bit_start:
+                # Check type compatibility if specified
+                if operand_type is None or info.operand_type == operand_type:
+                    matches.append(info)
+
+        # Sort by how closely the bit positions match
+        matches.sort(key=lambda x: abs((x.bit_start + x.bit_end) - (bit_start + bit_end)))
+        return matches
diff --git a/backends/generators/binutils/extract_riscv_operand_bits.py b/backends/generators/binutils/extract_riscv_operand_bits.py
new file mode 100644
index 0000000000..e8965196ab
--- /dev/null
+++ b/backends/generators/binutils/extract_riscv_operand_bits.py
@@ -0,0 +1,149 @@
+#!/usr/bin/env python3
+"""
+Extract RISC-V operand tokens and bit positions (JSON/Markdown).
+
+Thin wrapper that reuses helpers in binutils_parser.py (single source of truth).
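+
+Example (markdown output written to a file):
+
+    python3 extract_riscv_operand_bits.py --format md --out operand_bits.md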
+""" + +import json +import argparse +import sys +from pathlib import Path +from collections import OrderedDict + +from binutils_parser import ( + parse_op_fields, + parse_encode_macros, + extract_operand_mapping, + derive_bits_for_token, +) + +ROOT = (Path(__file__).resolve().parents[1] / "binutils-gdb").resolve() +RISCV_H = ROOT / "include" / "opcode" / "riscv.h" +ASM = ROOT / "gas" / "config" / "tc-riscv.c" +DIS = ROOT / "opcodes" / "riscv-dis.c" + + +def read(path: Path) -> str: + return path.read_text(encoding="utf-8", errors="ignore") + + +def _bit_ranges(bits): + if not bits: + return [] + bits = sorted(bits) + ranges = [] + start = prev = bits[0] + for b in bits[1:]: + if b == prev + 1: + prev = b + continue + ranges.append((start, prev)) + start = prev = b + ranges.append((start, prev)) + return [f"{a}" if a == b else f"{a}..{b}" for a, b in ranges] + + +def _emit_markdown(out_obj, fp): + tokens = out_obj.get('tokens', {}) + fp.write("RISC-V Operand Bit Positions (from binutils)\n") + fp.write("\n") + fp.write("- Bit indices are instruction bit positions with LSB = 0.\n") + fp.write("- Fields come from OP_MASK_*/OP_SH_*; immediates from ENCODE_* macros.\n") + fp.write("\n") + + def grp(tok): + if tok.startswith('C.'): + return (1, tok) + if tok.startswith('V.'): + return (2, tok) + if tok.startswith('X.') or tok.startswith('W.'): + return (3, tok) + return (0, tok) + + for tok in sorted(tokens.keys(), key=grp): + data = tokens[tok] + bits = data.get('bits', []) + ranges = _bit_ranges(bits) + fields = data.get('asm_inserts', []) + encs = data.get('asm_encodes', []) + exts = data.get('dis_extracts', []) + fp.write(f"- {tok}\n") + fp.write(f" - bits: {', '.join(ranges) if ranges else '(none)'}\n") + if fields: + fp.write(f" - fields: {', '.join(fields)}\n") + if encs: + fp.write(f" - encodes: {', '.join(encs)}\n") + if exts: + fp.write(f" - extracts: {', '.join(exts)}\n") + notes = data.get('notes') or [] + if notes: + fp.write(f" - notes: {'; '.join(notes)}\n") + fp.write("\n") + + +def main(): + ap = argparse.ArgumentParser(description="Extract RISC-V operand bit positions from binutils sources") + ap.add_argument('--format', '-f', choices=['json', 'markdown', 'md', 'text'], default='json', + help='Output format (default: json)') + ap.add_argument('--out', '-o', default='-', help='Output file path or - for stdout') + args = ap.parse_args() + if not (RISCV_H.exists() and ASM.exists() and DIS.exists()): + print("error: missing binutils sources next to this script", file=sys.stderr) + sys.exit(1) + + riscv_h = read(RISCV_H) + tc_riscv_c = read(ASM) + riscv_dis_c = read(DIS) + + fields_map = parse_op_fields(riscv_h) + enc_map = parse_encode_macros(riscv_h) + op_token_map = extract_operand_mapping(tc_riscv_c, riscv_dis_c) + + results = OrderedDict() + for token, macro_use in op_token_map.items(): + bits, notes = derive_bits_for_token(token, macro_use, fields_map, enc_map) + results[token] = { + 'bits': bits, + 'asm_inserts': macro_use.get('asm', {}).get('inserts', []), + 'asm_encodes': macro_use.get('asm', {}).get('encodes', []), + 'dis_extracts': macro_use.get('dis', {}).get('extracts', []), + 'notes': notes, + } + + # Enrich with a simple dictionary of OP fields and ENCODE immediates for reference + ref = { + 'op_fields': {k: {'bits': v['bits'], 'shift': v['shift'], 'mask': v['mask'], 'width': v['width']} + for k, v in sorted(fields_map.items())}, + 'encode_immediates': {k: {'bits': v['bits'], 'segments': v['segments']} + for k, v in sorted(enc_map.items())}, + } + + out = { + 
'tokens': results, + 'reference': ref, + } + + # Emit in the requested format + out_path = args.out + fmt = args.format + if out_path == '-': + fp = sys.stdout + close_fp = False + else: + fp = open(out_path, 'w', encoding='utf-8') + close_fp = True + + try: + if fmt in ('markdown', 'md', 'text'): + _emit_markdown(out, fp) + else: + json.dump(out, fp, indent=2, sort_keys=False) + fp.write('\n') + finally: + if close_fp: + fp.close() + + +if __name__ == '__main__': + main() diff --git a/backends/generators/binutils/gas_test_generator.py b/backends/generators/binutils/gas_test_generator.py new file mode 100644 index 0000000000..5c21cdf35b --- /dev/null +++ b/backends/generators/binutils/gas_test_generator.py @@ -0,0 +1,1634 @@ +""" +GNU Assembler Test Generator for RISC-V + +Generates GNU Assembler test files (.s, .d, .l) from RISC-V unified database. + +Generated Test Files: +- Assembly source files (.s) containing assembly instructions +- Dump files (.d) containing expected disassembly patterns +- Error files (.l) for negative tests +- Fail test sets (-fail.s, -fail.d, -fail.l) +- Architecture-specific tests (currently only rv64) +The generator automatically discovers extension patterns from the unified database +and generates tests that should integrate seamlessly with the existing gas test suite. +""" + +import os +import sys +import argparse +import logging +import yaml +import glob +import re +from pathlib import Path +from typing import Dict, List, Tuple, Set + +sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) +from generator import parse_extension_requirements, load_csrs + +# Named constants for fallback 12-bit signed immediate range. +# The 12-bit signed immediate range is from -2048 to 2047 (inclusive). +DEFAULT_12BIT_SIGNED_IMM_MIN = -2048 +DEFAULT_12BIT_SIGNED_IMM_MAX = 2047 + +# Named constants for fallback 12-bit unsigned immediate range. +# The 12-bit unsigned immediate range is from 0 to 4095 (inclusive). 
+DEFAULT_12BIT_UNSIGNED_IMM_MIN = 0 +DEFAULT_12BIT_UNSIGNED_IMM_MAX = 4095 + +# Maximum number of CSR example names to keep for replacements +MAX_CSR_EXAMPLES = 10 + +# Maximum length for sanitized extension filenames +MAX_EXTENSION_NAME_LENGTH = 20 + + +def calculate_location_width(location) -> int: + """Calculate the total bit width from a location string or integer.""" + if not location: + return 0 + + # Handle case where location is an integer (single bit) + if isinstance(location, int): + return 1 + + location_str = str(location) + total_width = 0 + parts = location_str.split("|") + + for part in parts: + part = part.strip() + if "-" in part: + try: + a, b = map(int, part.split("-")) + total_width += abs(a - b) + 1 + except ValueError: + logging.debug( + f"Could not parse bit range '{part}' in location '{location_str}'" + ) + continue + else: + try: + int(part) + total_width += 1 + except ValueError: + logging.debug( + f"Could not parse bit '{part}' in location '{location_str}'" + ) + continue + + return total_width + + +def extract_instruction_constraints(name: str, data: dict) -> dict: + """Extract constraints from instruction YAML data.""" + constraints = {} + + encoding = data.get("encoding", {}) + variables = encoding.get("variables", []) + + register_constraints = {} + immediate_constraints = {} + + for var in variables: + var_name = var.get("name", "") + location = var.get("location", "") + not_value = var.get("not") + left_shift = var.get("left_shift", 0) + sign_extend = var.get("sign_extend", False) + + if not var_name or not location: + continue + + width = calculate_location_width(location) + + # Determine if this is a register or immediate field + if var_name in ["xd", "xs1", "xs2", "xs3", "rd", "rs1", "rs2", "rs3"]: + register_constraints[var_name] = { + "width": width, + "not_value": not_value, + "location": location, + } + elif ( + var_name in ["imm", "simm"] + or var_name.startswith("zimm") + or var_name.startswith("simm") + ): + # Calculate the logical immediate range + # Determine if signed or unsigned immediate + is_signed = ( + sign_extend + or var_name.startswith("simm") + or (width == 12 and var_name == "imm") + ) # I-type pattern + + if is_signed: + if width > 0: + max_val = (1 << (width - 1)) - 1 + min_val = -(1 << (width - 1)) + else: + max_val, min_val = ( + DEFAULT_12BIT_SIGNED_IMM_MAX, + DEFAULT_12BIT_SIGNED_IMM_MIN, + ) + else: + # Unsigned immediate - use full width + if width > 0: + max_val = (1 << width) - 1 + min_val = 0 + else: + # Fallback to 12-bit unsigned immediate range when width is unknown. + # Use named constants to make intent explicit. 
+ max_val, min_val = ( + DEFAULT_12BIT_UNSIGNED_IMM_MAX, + DEFAULT_12BIT_UNSIGNED_IMM_MIN, + ) + + immediate_constraints[var_name] = { + "range": (min_val, max_val), + "not_value": not_value, + "left_shift": left_shift, + "sign_extend": sign_extend, + "width": width, + } + + if register_constraints or immediate_constraints: + constraints["registers"] = register_constraints + constraints["immediates"] = immediate_constraints + + base = data.get("base") + if base: + constraints["architecture"] = base + + if name.startswith("c."): + constraints["compressed"] = True + if "rs1'" in str(data) or "rd'" in str(data): + constraints["limited_registers"] = True + + return constraints + + +def sanitize_extension_name(name: str) -> str: + """Sanitize extension name to be a valid filename.""" + sanitized = name.lower() + sanitized = re.sub(r'[{}\[\]\'",\s:]+', "-", sanitized) + sanitized = sanitized.strip("-")[:MAX_EXTENSION_NAME_LENGTH] + return sanitized if sanitized else "unknown" + + +def _normalize_extension_token(token: str) -> List[str]: + token = (token or "").strip().lower() + if not token: + return [] + + if token.startswith("rv32") or token.startswith("rv64"): + token = token[4:] + elif token.startswith("rv"): + token = token[2:] + + token = token.strip() + if not token: + return [] + + if token.startswith("z") or token.startswith("s") or token.startswith("x"): + return [token] + + if len(token) > 1 and token.isalpha(): + return list(token) + + return [token] + + +def extract_extension_names(defined_by) -> Set[str]: + extensions: Set[str] = set() + + if isinstance(defined_by, str): + extensions.update(_normalize_extension_token(defined_by)) + elif isinstance(defined_by, dict) and defined_by: + name = defined_by.get("name") + if name: + extensions.update(_normalize_extension_token(name)) + + for key in ("anyOf", "allOf", "oneOf"): + value = defined_by.get(key) + if isinstance(value, list): + for item in value: + extensions.update(extract_extension_names(item)) + elif value is not None: + extensions.update(extract_extension_names(value)) + + extensions.discard("i") + extensions.discard("") + extensions.discard("unknown") + return extensions + + +# RISC-V ABI register definitions +# TODO: Move to UDB specs +RISCV_ABI_REGISTERS = { + "gpr": { + "arg_ret": ["a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7"], + "temp": ["t0", "t1", "t2", "t3", "t4", "t5", "t6"], + "saved": [ + "s0", + "s1", + "s2", + "s3", + "s4", + "s5", + "s6", + "s7", + "s8", + "s9", + "s10", + "s11", + ], + "special": ["zero", "ra", "sp", "gp", "tp"], + }, + "fpr": { + "arg_ret": ["fa0", "fa1", "fa2", "fa3", "fa4", "fa5", "fa6", "fa7"], + "temp": [ + "ft0", + "ft1", + "ft2", + "ft3", + "ft4", + "ft5", + "ft6", + "ft7", + "ft8", + "ft9", + "ft10", + "ft11", + ], + "saved": [ + "fs0", + "fs1", + "fs2", + "fs3", + "fs4", + "fs5", + "fs6", + "fs7", + "fs8", + "fs9", + "fs10", + "fs11", + ], + }, + "vpr": { + "general": [ + "v0", + "v1", + "v2", + "v3", + "v4", + "v8", + "v12", + "v16", + "v20", + "v24", + "v28", + ] + }, +} + +RISCV_FP_ROUNDING_MODES = ["rne", "rtz", "rdn", "rup", "rmm"] +RISCV_FENCE_ORDERING = ["rw", "r", "w", "iorw", "ior", "iow"] + +logging.basicConfig(level=logging.INFO, format="%(levelname)s:: %(message)s") + + +class TestInstructionGroup: + """Represents a group of related instructions for test generation.""" + + def __init__(self, extension: str): + self.extension = extension + self.instructions = [] + self.error_cases: Dict[str, dict] = {} + self.arch_specific = {"rv32": [], "rv64": []} + 
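+        # Lowercase extension names this group's tests depend on; populated
+        # from each instruction's definedBy clause as instructions are added.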
self.required_extensions: Set[str] = set() + + def add_instruction(self, name: str, info: dict): + """Add an instruction to this group.""" + self.instructions.append((name, info)) + + base = info.get("base") + if base == 32: + self.arch_specific["rv32"].append((name, info)) + elif base == 64: + self.arch_specific["rv64"].append((name, info)) + + defined_by = info.get("definedBy") + if defined_by is not None: + self.required_extensions.update(extract_extension_names(defined_by)) + if self.extension: + ext_lower = self.extension.lower() + if ext_lower != "unknown": + # Split hyphenated extension group names into individual extensions + for part in ext_lower.split("-"): + normalized = _normalize_extension_token(part) + self.required_extensions.update(normalized) + + def add_error_case( + self, + instruction: str, + invalid_assembly: str, + error_msg: str, + *, + reason: str | None = None, + assembly: str | None = None, + display_instruction: str | None = None, + ) -> None: + + entry = self.error_cases.setdefault( + instruction, + { + "assembly": assembly, + "display_instruction": display_instruction or instruction, + "cases": [], + }, + ) + + if assembly and not entry.get("assembly"): + entry["assembly"] = assembly + if display_instruction: + entry["display_instruction"] = display_instruction + + entry["cases"].append( + { + "line": invalid_assembly, + "error_msg": error_msg, + "reason": reason, + } + ) + + +class AssemblyExampleGenerator: + """Generates assembly examples""" + + def __init__( + self, + csr_dir: str = "../../../spec/std/isa/csr/", + inst_dir: str = "../../../spec/std/isa/inst/", + ): + self.csr_dir = csr_dir + self.inst_dir = inst_dir + + self._load_operand_definitions() + self._load_csr_examples() + self.all_instruction_data, self.instruction_constraints = ( + self._load_all_instruction_data() + ) + self.extension_classification = self._classify_extensions() + + def _load_operand_definitions(self): + """Load operand type definitions from RISC-V ABI and architecture specs.""" + + abi_regs = RISCV_ABI_REGISTERS + + self.gpr_examples = ( + abi_regs["gpr"]["arg_ret"][:4] + abi_regs["gpr"]["saved"][:4] + ) + + # Compressed instruction register set per RISC-V spec + # 3-bit register fields (rs1', rs2', rd') encode registers x8-x15 + # x8=s0, x9=s1, x10=a0, x11=a1, x12=a2, x13=a3, x14=a4, x15=a5 + self.compressed_gpr_examples = ( + abi_regs["gpr"]["saved"][:2] # s0, s1 (x8, x9) + + abi_regs["gpr"]["arg_ret"][:6] # a0-a5 (x10-x15) + ) + + self.fpr_examples = ( + abi_regs["fpr"]["arg_ret"][:4] + + abi_regs["fpr"]["temp"][:3] + + abi_regs["fpr"]["saved"][:2] + ) + + self.vpr_examples = abi_regs["vpr"]["general"] + + self.vector_mask_examples = ["", "v0.t"] + + self.rounding_mode_examples = RISCV_FP_ROUNDING_MODES + self.fence_examples = RISCV_FENCE_ORDERING + + def _load_csr_examples(self): + """Load CSR examples from the unified database.""" + try: + csr_dict = load_csrs( + self.csr_dir, + enabled_extensions=[], + include_all=True, + target_arch="BOTH", + ) + self.csr_examples = list( + {name.lower().replace(".rv32", "") for name in csr_dict.values()} + )[:MAX_CSR_EXAMPLES] + except Exception as e: + logging.warning( + f"Failed to load CSRs from {self.csr_dir}: {e}. Using fallback CSR list." 
+ ) + self.csr_examples = ["mstatus", "mtvec", "mscratch", "cycle", "time"] + + def _load_all_instruction_data(self) -> Tuple[Dict[str, dict], Dict[str, dict]]: + instruction_data = {} + instruction_constraints = {} + + yaml_files = glob.glob(os.path.join(self.inst_dir, "**/*.yaml"), recursive=True) + + for yaml_file in yaml_files: + try: + with open(yaml_file, encoding="utf-8") as f: + data = yaml.safe_load(f) + + if not isinstance(data, dict) or data.get("kind") != "instruction": + continue + + name = data.get("name") + if not name: + continue + + instruction_data[name] = data + + constraints = extract_instruction_constraints(name, data) + if constraints: + instruction_constraints[name] = constraints + + except Exception as e: + logging.debug(f"Error loading {yaml_file}: {e}") + continue + + logging.debug( + f"Single-pass loaded {len(instruction_data)} instructions, {len(instruction_constraints)} constraints" + ) + return instruction_data, instruction_constraints + + def _classify_extensions(self) -> dict: + """Classify extensions based on actual data from the unified database, not hardcoded patterns.""" + classification = { + "standard": set(), + "multi_standard": set(), + "z_extensions": set(), + "s_extensions": set(), + "x_extensions": set(), + "other": set(), + } + + all_extensions = set() + for name, data in self.all_instruction_data.items(): + defined_by = data.get("definedBy") + if defined_by: + if isinstance(defined_by, str): + all_extensions.add(defined_by.lower()) + elif isinstance(defined_by, dict): + self._extract_extensions_from_complex(defined_by, all_extensions) + + for ext in all_extensions: + ext_clean = ext.lower().strip() + if not ext_clean: + continue + if len(ext_clean) == 1 and ext_clean.isalpha(): + classification["standard"].add(ext_clean) + elif ext_clean.startswith("z"): + classification["z_extensions"].add(ext_clean) + elif ext_clean.startswith("s"): + classification["s_extensions"].add(ext_clean) + elif ext_clean.startswith("x"): + classification["x_extensions"].add(ext_clean) + elif ext_clean.startswith("rv32") or ext_clean.startswith("rv64"): + base_ext = ext_clean[4:] if len(ext_clean) > 4 else "i" + if len(base_ext) == 1: + classification["standard"].add(base_ext) + else: + classification["multi_standard"].add(base_ext) + elif len(ext_clean) > 1: + classification["multi_standard"].add(ext_clean) + else: + classification["other"].add(ext_clean) + + return classification + + def _extract_extensions_from_complex(self, defined_by: dict, all_extensions: set): + if "anyOf" in defined_by: + for item in defined_by["anyOf"]: + if isinstance(item, str): + all_extensions.add(item.lower()) + elif isinstance(item, dict): + self._extract_extensions_from_complex(item, all_extensions) + + if "allOf" in defined_by: + for item in defined_by["allOf"]: + if isinstance(item, str): + all_extensions.add(item.lower()) + elif isinstance(item, dict): + self._extract_extensions_from_complex(item, all_extensions) + + if "oneOf" in defined_by: + for item in defined_by["oneOf"]: + if isinstance(item, str): + all_extensions.add(item.lower()) + elif isinstance(item, dict): + self._extract_extensions_from_complex(item, all_extensions) + + def _get_operand_replacements( + self, inst_name: str, assembly: str, variant_index: int + ) -> Dict[str, str]: + """Generate operand replacements based on instruction requirements""" + i = variant_index + + constraints = self._get_instruction_constraints(inst_name) + if constraints.get("uses_compressed_regs") or constraints.get( + "limited_registers" + 
): + reg_examples = self.compressed_gpr_examples + else: + reg_examples = self.gpr_examples + + replacements = { + # GPR register patterns + "xd": reg_examples[i % len(reg_examples)], + "xs1": reg_examples[(i + 1) % len(reg_examples)], + "xs2": reg_examples[(i + 2) % len(reg_examples)], + "xs3": reg_examples[(i + 3) % len(reg_examples)], + "rd": reg_examples[i % len(reg_examples)], + "rs1": reg_examples[(i + 1) % len(reg_examples)], + "rs2": reg_examples[(i + 2) % len(reg_examples)], + "rs3": reg_examples[(i + 3) % len(reg_examples)], + # FPR register patterns + "fd": self.fpr_examples[i % len(self.fpr_examples)], + "fs1": self.fpr_examples[(i + 1) % len(self.fpr_examples)], + "fs2": self.fpr_examples[(i + 2) % len(self.fpr_examples)], + "fs3": self.fpr_examples[(i + 3) % len(self.fpr_examples)], + # Vector register patterns + "vd": self.vpr_examples[i % len(self.vpr_examples)], + "vs1": self.vpr_examples[(i + 1) % len(self.vpr_examples)], + "vs2": self.vpr_examples[(i + 2) % len(self.vpr_examples)], + "vs3": self.vpr_examples[(i + 3) % len(self.vpr_examples)], + # Vector mask + "vm": self.vector_mask_examples[i % len(self.vector_mask_examples)], + # CSR patterns + "csr": self.csr_examples[i % len(self.csr_examples)], + # Immediate patterns + "imm": str( + self._get_safe_immediate( + inst_name, self._get_instruction_constraints(inst_name) + ) + ), + "simm": str( + self._get_safe_immediate( + inst_name, self._get_instruction_constraints(inst_name) + ) + ), + "zimm": str( + abs( + self._get_safe_immediate( + inst_name, self._get_instruction_constraints(inst_name) + ) + ) + ), + "shamt": str(1 + i), + "offset": str((i + 1) * 4), + } + + constraints = self._get_instruction_constraints(inst_name) + + # Use constraint-based immediate generation for all instructions + if "imm_range" in constraints: + min_val, max_val = constraints["imm_range"] + imm_multiple = constraints.get("imm_multiple", 1) + imm_not_zero = constraints.get("imm_not_zero", False) + + safe_imm = self._get_safe_immediate_from_constraints( + min_val, max_val, imm_multiple, imm_not_zero, i + ) + + for key in ["imm", "simm", "zimm"]: + if key in replacements: + replacements[key] = str(safe_imm) + + return replacements + + def _get_safe_immediate_from_constraints( + self, min_val: int, max_val: int, multiple: int, not_zero: bool, variant: int + ) -> int: + """Generate a safe immediate value that satisfies the given constraints.""" + candidates = [1, 2, 4, 8, 16, 32, -1, -2, -4] + + candidates = [c + variant for c in candidates] + candidates + + for candidate in candidates: + if ( + min_val <= candidate <= max_val + and candidate % multiple == 0 + and (not not_zero or candidate != 0) + ): + return candidate + + if multiple > 1: + start = ((min_val + multiple - 1) // multiple) * multiple + if not_zero and start == 0: + start = multiple + if start <= max_val: + return start + + return ( + min_val + if not not_zero or min_val != 0 + else (min_val + 1 if min_val + 1 <= max_val else max_val) + ) + + def generate_examples(self, name: str, assembly: str) -> List[str]: + """Generate assembly examples using YAML assembly field as the authoritative source.""" + instruction_data = self.all_instruction_data.get(name, {}) + actual_assembly = instruction_data.get("assembly", assembly) + + if actual_assembly: + assembly = actual_assembly + + if not assembly or not assembly.strip(): + return [] + + examples = [] + + if "," in assembly or any( + reg in assembly for reg in ["rd", "rs1", "rs2", "imm"] + ): + 
examples.extend(self._generate_variants(name, assembly)) + else: + variants = self._generate_variants(name, assembly) + if variants: + examples.extend(variants) + else: + examples.append(f"{name}") + + return examples + + def _generate_variants(self, name: str, assembly: str) -> List[str]: + """Generate multiple assembly variants using the YAML assembly field.""" + variants = [] + + instruction_data = self.all_instruction_data.get(name, {}) + actual_assembly = instruction_data.get("assembly", assembly) + + if actual_assembly and actual_assembly != assembly: + assembly = actual_assembly + + if not assembly or not assembly.strip(): + return [] + + reg_set = ( + self.compressed_gpr_examples if name.startswith("c.") else self.gpr_examples + ) + + for i in range(min(3, len(reg_set) - 1)): + example = f"{name}\t{assembly}" + + replacements = self._get_operand_replacements(name, assembly, i) + + operands = self._parse_assembly_operands(assembly) + for operand in operands: + operand_type = operand.get("type") + operand_raw = operand.get("raw") + + if operand_type == "rounding_mode" or operand_raw == "rm": + replacements["rm"] = self.rounding_mode_examples[ + i % len(self.rounding_mode_examples) + ] + elif operand_type == "fence_ordering" or operand_raw in [ + "pred", + "succ", + ]: + if operand_raw == "pred": + replacements["pred"] = self.fence_examples[ + i % len(self.fence_examples) + ] + elif operand_raw == "succ": + replacements["succ"] = self.fence_examples[ + (i + 1) % len(self.fence_examples) + ] + + operands = self._parse_assembly_operands(assembly) + + for placeholder, value in replacements.items(): + operand_found = any( + op.get("raw") == placeholder + or op.get("type") in ["csr", "vector_mask"] + and placeholder in ["csr", "vm"] + for op in operands + ) + + if not operand_found and placeholder not in assembly: + continue + + if placeholder == "csr": + example = re.sub(r"\bcsr\b", value, example) + elif placeholder == "vm": + # Vector mask is special because it's either empty (unmasked) or v0.t (masked) + if value: + example = re.sub(r"\bvm\b", value, example) + else: + example = re.sub(r",\s*\bvm\b", "", example) + example = re.sub(r"\bvm\b,?\s*", "", example) + else: + example = example.replace(placeholder, value) + + for operand in operands: + if operand.get("type") == "memory": + base_placeholder = operand.get("raw") + if "(base)" in base_placeholder: + base_reg = reg_set[i % len(reg_set)] + example = example.replace("(base)", f"({base_reg})") + elif operand.get("type") == "memory_sp": + if "(sp)" in example: + continue + + variants.append(example) + + return variants + + def _parse_assembly_operands(self, assembly: str) -> List[Dict]: + """Parse assembly string to identify operand types.""" + operands = [] + + if not assembly or not assembly.strip(): + return [] + parts = [p.strip() for p in assembly.split(",") if p.strip()] + + for part in parts: + operand_info = {"raw": part} + if "(" in part and ")" in part: + match = re.match(r"([^(]*)\(([^)]+)\)", part) + if match: + offset, base = match.groups() + base_reg = base.strip() + + if base_reg == "sp": + operand_info.update({"type": "memory_sp", "offset": "4"}) + else: + operand_info.update( + { + "type": "memory", + "offset": "0", + "base": self.gpr_examples[1], + } + ) + else: + operand_info["type"] = "unknown" + elif part in ["imm", "zimm", "simm"]: + operand_info["type"] = "immediate" + elif part in ["rd", "rs1", "rs2", "rs3", "xd", "xs1", "xs2", "xs3"]: + operand_info["type"] = "gpr" + elif part in ["fd", "fs1", "fs2", "fs3"]: 
+ operand_info["type"] = "fpr" + elif part in ["vd", "vs1", "vs2", "vs3"]: + operand_info["type"] = "vpr" + elif part == "vm": + operand_info["type"] = "vector_mask" + elif part == "csr": + operand_info["type"] = "csr" + elif part in ["pred", "succ", "aq", "rl"]: + operand_info["type"] = "fence_ordering" + elif part in ["rm"]: + operand_info["type"] = "rounding_mode" + elif part in ["shamt", "shamtw"] or part.startswith("shamt"): + operand_info["type"] = "shift_amount" + elif part in [ + "zimm5", + "zimm6", + "zimm10", + "zimm11", + "zimm12", + ] or part.startswith(("zimm", "simm")): + operand_info["type"] = "immediate" + else: + if part.startswith(("x", "a", "t", "s")): + operand_info["type"] = "gpr" + elif part.startswith(("f", "fa", "ft", "fs")): + operand_info["type"] = "fpr" + elif part.startswith("v"): + operand_info["type"] = "vpr" + else: + operand_info["type"] = "unknown" + + operands.append(operand_info) + + return operands + + def _get_instruction_constraints(self, name: str) -> dict: + """Get instruction-specific constraints from loaded database.""" + raw_constraints = self.instruction_constraints.get(name, {}) + + processed_constraints = {} + + immediates = raw_constraints.get("immediates", {}) + for imm_name, imm_data in immediates.items(): + if imm_name == "imm" or imm_name.startswith(("simm", "zimm")): + min_val, max_val = imm_data["range"] + processed_constraints["imm_range"] = (min_val, max_val) + + if imm_data.get("not_value") == 0: + processed_constraints["imm_not_zero"] = True + + left_shift = imm_data.get("left_shift", 0) + if left_shift > 0: + processed_constraints["imm_multiple"] = 1 << left_shift + + break + + registers = raw_constraints.get("registers", {}) + if registers: + processed_constraints["registers"] = registers + + for reg_name, reg_data in registers.items(): + if reg_name.endswith("'") or reg_data.get("width") == 3: + processed_constraints["uses_compressed_regs"] = True + break + + if raw_constraints.get("compressed"): + processed_constraints["compressed"] = True + + if raw_constraints.get("limited_registers"): + processed_constraints["limited_registers"] = True + + return processed_constraints + + def _get_safe_immediate(self, name: str, constraints: dict) -> int: + """Get a safe immediate value that satisfies instruction constraints.""" + imm_range = constraints.get( + "imm_range", (DEFAULT_12BIT_SIGNED_IMM_MIN, DEFAULT_12BIT_SIGNED_IMM_MAX) + ) + imm_multiple = constraints.get("imm_multiple", 1) + imm_not_zero = constraints.get("imm_not_zero", False) + + min_val, max_val = imm_range + + candidates = [] + + if imm_multiple > 1: + start = ((min_val + imm_multiple - 1) // imm_multiple) * imm_multiple + if imm_not_zero and start == 0: + start = imm_multiple + + for i in range(6): + candidate = start + (i * imm_multiple) + if candidate <= max_val: + candidates.append(candidate) + neg_candidate = start - ((i + 1) * imm_multiple) + if neg_candidate >= min_val and neg_candidate != 0: + candidates.append(neg_candidate) + else: + if min_val >= 0: + # Unsigned range + candidates = [ + min_val, + min_val + 1, + min_val + 2, + min_val + 4, + min_val + 8, + ] + candidates.extend([max_val, max_val - 1, max_val - 2]) + else: + # Signed range + candidates = [1, 2, 4, 8, 16, -1, -2, -4, -8] + candidates.extend([max_val, max_val - 1, min_val, min_val + 1]) + + candidates = list(set(candidates)) + + for candidate in candidates: + if ( + min_val <= candidate <= max_val + and candidate % imm_multiple == 0 + and (not imm_not_zero or candidate != 0) + ): + return candidate 
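+
+        # No heuristic candidate satisfied the range/alignment/non-zero
+        # constraints; the fallback below returns the smallest aligned value
+        # inside the allowed range instead.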
+ + if imm_multiple > 1: + start = ((min_val + imm_multiple - 1) // imm_multiple) * imm_multiple + if imm_not_zero and start == 0: + start = imm_multiple + if start <= max_val: + return start + + return min_val if not imm_not_zero or min_val != 0 else min_val + 1 + + +class GasTestGenerator: + """Main class for generating GNU Assembler test files.""" + + def __init__( + self, + output_dir: str = "gas_tests", + csr_dir: str = "../../../spec/std/isa/csr/", + inst_dir: str = "../../../spec/std/isa/inst/", + ): + self.output_dir = Path(output_dir) + self.example_generator = AssemblyExampleGenerator(csr_dir, inst_dir) + self.output_dir.mkdir(exist_ok=True) + + def load_instructions( + self, + inst_dir: str, + enabled_extensions: List[str] = None, + include_all: bool = False, + ) -> Dict[str, dict]: + """Load instructions from the unified database using precomputed data""" + if enabled_extensions is None: + enabled_extensions = [] + + all_instructions = self.example_generator.all_instruction_data + + if include_all: + logging.info(f"Using all {len(all_instructions)} precomputed instructions") + return all_instructions + + filtered_instructions = {} + + for name, data in all_instructions.items(): + defined_by = data.get("definedBy") + if defined_by: + try: + meets_req = parse_extension_requirements(defined_by) + if meets_req(enabled_extensions): + filtered_instructions[name] = data + except Exception: + continue + else: + filtered_instructions[name] = data + + logging.info( + f"Filtered to {len(filtered_instructions)} instructions from precomputed data" + ) + return filtered_instructions + + def group_instructions_by_extension( + self, instructions: Dict[str, dict] + ) -> Dict[str, TestInstructionGroup]: + """Group instructions by their defining extension.""" + groups = {} + + for name, info in instructions.items(): + defined_by = info.get("definedBy", "I") + ext_name = self._extract_extension_name(defined_by) + if ext_name not in groups: + groups[ext_name] = TestInstructionGroup(ext_name) + + groups[ext_name].add_instruction(name, info) + + return groups + + def _extract_extension_name(self, defined_by) -> str: + """Extract a clean extension name from definedBy field using consistent logic.""" + if isinstance(defined_by, str): + if defined_by.startswith("RV"): + if defined_by.startswith("RV32") or defined_by.startswith("RV64"): + return defined_by[4:].lower() if len(defined_by) > 4 else "i" + else: + return defined_by[2:].lower() if len(defined_by) > 2 else "i" + return defined_by.lower() + elif isinstance(defined_by, dict): + return self._extract_from_complex_definition(defined_by) + else: + return sanitize_extension_name(str(defined_by)) + + def _extract_from_complex_definition(self, defined_by: dict) -> str: + """Extract extension name from complex definedBy structures.""" + if "anyOf" in defined_by: + any_of_list = defined_by["anyOf"] + if any_of_list and len(any_of_list) > 0: + first_item = any_of_list[0] + if isinstance(first_item, str): + return first_item.lower() + elif isinstance(first_item, dict) and "allOf" in first_item: + all_of_list = first_item["allOf"] + if all_of_list and len(all_of_list) > 0: + + extensions = [ + ext.lower() for ext in all_of_list if isinstance(ext, str) + ] + return "-".join(extensions) if extensions else "unknown" + return sanitize_extension_name(str(first_item)) + + elif "allOf" in defined_by: + all_of_list = defined_by["allOf"] + if all_of_list and len(all_of_list) > 0: + extensions = [ + ext.lower() for ext in all_of_list if isinstance(ext, str) + ] + 
return "-".join(extensions) if extensions else "unknown" + + elif "oneOf" in defined_by: + one_of_list = defined_by["oneOf"] + if one_of_list and len(one_of_list) > 0: + first_ext = one_of_list[0] + if isinstance(first_ext, str): + return first_ext.lower() + return sanitize_extension_name(str(first_ext)) + + elif "name" in defined_by: + return defined_by["name"].lower() + + return sanitize_extension_name(str(defined_by)) + + def generate_tests_for_group(self, group: TestInstructionGroup) -> None: + """Generate test files for a group of related instructions.""" + if not group.instructions: + return + + self._generate_main_tests(group) + + if group.arch_specific["rv64"]: + self._generate_arch_specific_tests(group, "rv64") + + self._generate_error_tests(group) + + if len(group.instructions) > 5: + self._generate_no_alias_tests(group) + + def _generate_main_tests(self, group: TestInstructionGroup) -> None: + """Generate the main .s and .d test files for a group.""" + ext_name = self._get_binutils_filename(group.extension) + + main_instructions = [] + for name, info in group.instructions: + base = info.get("base") + if base is None or base == 32: + main_instructions.append((name, info)) + + # Generate assembly source file + source_file = self.output_dir / f"{ext_name}.s" + dump_file = self.output_dir / f"{ext_name}.d" + + instruction_examples: List[Tuple[str, str, str]] = [] + for name, info in main_instructions: + assembly = info.get("assembly", "") + examples = self.example_generator.generate_examples(name, assembly) + primary_example = self._select_primary_example(examples) + if not primary_example: + continue + instruction_examples.append((name, assembly, primary_example)) + + with open(source_file, "w") as f: + f.write("target:\n") + + for name, assembly, example in instruction_examples: + mnemonic, _ = self._split_example_line(example) + signature = assembly.strip() if assembly else "n/a" + f.write( + f"\t# Auto-generated pass test for `{mnemonic}` (assembly: {signature})\n" + ) + f.write("\t# This source should assemble successfully.\n") + f.write(f"\t{example}\n\n") + + base_arch = "rv32i" + march = self._build_march_string( + base_arch, group.extension, group.required_extensions + ) + + with open(dump_file, "w") as f: + f.write(f"#as: -march={march}\n") + f.write(f"#source: {source_file.name}\n") + f.write("#objdump: -d -M no-aliases\n") + f.write("\n") + f.write(".*:[ \t]+file format .*\n") + f.write("\n") + f.write("\n") + f.write("Disassembly of section .text:\n") + f.write("\n") + f.write("0+000 :\n") + + addr = 0 + for name, _, example in instruction_examples: + pattern = self._create_disasm_pattern(addr, name, example) + f.write(f"{pattern}\n") + addr += self._get_instruction_size(name) + + def _generate_arch_specific_tests( + self, group: TestInstructionGroup, arch: str + ) -> None: + """Generate architecture-specific test files.""" + ext_name = self._get_binutils_filename(group.extension) + + source_file = self.output_dir / f"{ext_name}-{arch[2:]}.s" + dump_file = self.output_dir / f"{ext_name}-{arch[2:]}.d" + + arch_instructions = group.arch_specific[arch] + if not arch_instructions: + return + + instruction_examples: List[Tuple[str, str, str]] = [] + for name, info in arch_instructions: + assembly = info.get("assembly", "") + examples = self.example_generator.generate_examples(name, assembly) + primary_example = self._select_primary_example(examples) + if not primary_example: + continue + instruction_examples.append((name, assembly, primary_example)) + + with 
+
+    def _generate_arch_specific_tests(
+        self, group: TestInstructionGroup, arch: str
+    ) -> None:
+        """Generate architecture-specific test files."""
+        ext_name = self._get_binutils_filename(group.extension)
+
+        source_file = self.output_dir / f"{ext_name}-{arch[2:]}.s"
+        dump_file = self.output_dir / f"{ext_name}-{arch[2:]}.d"
+
+        arch_instructions = group.arch_specific[arch]
+        if not arch_instructions:
+            return
+
+        instruction_examples: List[Tuple[str, str, str]] = []
+        for name, info in arch_instructions:
+            assembly = info.get("assembly", "")
+            examples = self.example_generator.generate_examples(name, assembly)
+            primary_example = self._select_primary_example(examples)
+            if not primary_example:
+                continue
+            instruction_examples.append((name, assembly, primary_example))
+
+        with open(source_file, "w") as f:
+            f.write("target:\n")
+
+            for name, assembly, example in instruction_examples:
+                mnemonic, _ = self._split_example_line(example)
+                signature = assembly.strip() if assembly else "n/a"
+                f.write(
+                    f"\t# Auto-generated pass test for `{mnemonic}` (assembly: {signature})\n"
+                )
+                f.write(
+                    f"\t# This source should assemble successfully on {arch.upper()}.\n"
+                )
+                f.write(f"\t{example}\n\n")
+
+        base_arch = f"{arch}i"
+        march = self._build_march_string(
+            base_arch, group.extension, group.required_extensions
+        )
+
+        with open(dump_file, "w") as f:
+            f.write(f"#as: -march={march}\n")
+            f.write(f"#source: {source_file.name}\n")
+            f.write("#objdump: -d -M no-aliases\n")
+            f.write("\n")
+            f.write(".*:[ \t]+file format .*\n")
+            f.write("\n")
+            f.write("\n")
+            f.write("Disassembly of section .text:\n")
+            f.write("\n")
+            f.write("0+000 <target>:\n")
+
+            addr = 0
+            for name, _, example in instruction_examples:
+                pattern = self._create_disasm_pattern(addr, name, example)
+                f.write(f"{pattern}\n")
+                addr += self._get_instruction_size(name)
+
+    def _generate_error_tests(self, group: TestInstructionGroup) -> None:
+        """Generate negative test cases for error conditions."""
+        ext_name = self._get_binutils_filename(group.extension)
+
+        source_file = self.output_dir / f"{ext_name}-fail.s"
+        dump_file = self.output_dir / f"{ext_name}-fail.d"
+        error_file = self.output_dir / f"{ext_name}-fail.l"
+
+        self._generate_common_error_cases(group)
+
+        if not group.error_cases:
+            logging.debug(f"No error cases generated for extension {group.extension}")
+            return
+
+        with open(source_file, "w") as f:
+            f.write("target:\n")
+
+            for name, _ in group.instructions:
+                entry = group.error_cases.get(name)
+                if not entry or not entry.get("cases"):
+                    continue
+
+                mnemonic = entry.get("display_instruction", name)
+                signature = entry.get("assembly") or "n/a"
+                f.write(
+                    f"\t# Auto-generated FAIL tests for `{mnemonic}` (assembly: {signature})\n"
+                )
+                f.write(
+                    "\t# Each line below is intended to fail assembly for a distinct reason.\n"
+                )
+
+                for case in entry["cases"]:
+                    reason = case.get("reason") or "generated error case"
+                    f.write(f"\t# FAIL: {reason}\n")
+                    f.write(f"\t{case['line']}\n")
+
+                f.write("\n")
+
+        with open(dump_file, "w") as f:
+            march = self._build_march_string(
+                "rv32i", group.extension, group.required_extensions
+            )
+            f.write(f"#as: -march={march}\n")
+            f.write(f"#source: {source_file.name}\n")
+            f.write(f"#error_output: {error_file.name}\n")
+
+        with open(error_file, "w") as f:
+            f.write(".*: Assembler messages:\n")
+            for name, _ in group.instructions:
+                entry = group.error_cases.get(name)
+                if not entry:
+                    continue
+                for case in entry["cases"]:
+                    f.write(f".*: Error: {case['error_msg']}\n")
+
+    def _generate_no_alias_tests(self, group: TestInstructionGroup) -> None:
+        """Generate tests with no-aliases option for detailed disassembly."""
+        ext_name = self._get_binutils_filename(group.extension)
+
+        dump_file = self.output_dir / f"{ext_name}-na.d"
+        source_file = f"{ext_name}.s"
+
+        main_instructions = []
+        for name, info in group.instructions:
+            base = info.get("base")
+            # Mirror the selection in _generate_main_tests: this dump reuses
+            # the main .s source, so the instruction list must match it.
+            if base is None or base == 32:
+                main_instructions.append((name, info))
+
+        instruction_examples: List[Tuple[str, str]] = []
+        for name, info in main_instructions:
+            assembly = info.get("assembly", "")
+            examples = self.example_generator.generate_examples(name, assembly)
+            primary_example = self._select_primary_example(examples)
+            if not primary_example:
+                continue
+            instruction_examples.append((name, primary_example))
+
+        with open(dump_file, "w") as f:
+            march = self._build_march_string(
+                "rv32i", group.extension, group.required_extensions
+            )
+            f.write(f"#as: -march={march}\n")
+            f.write(f"#source: {source_file}\n")
+            f.write("#objdump: -d -M no-aliases\n")
+            f.write("\n")
+            f.write(".*:[ \t]+file format .*\n")
+            f.write("\n")
+            f.write("Disassembly of section .text:\n")
+            f.write("\n")
+            f.write("0+000 <target>:\n")
+
+            addr = 0
+            for name, example in instruction_examples:
+                pattern = self._create_disasm_pattern(
+                    addr, name, example, no_aliases=True
+                )
+                f.write(f"{pattern}\n")
+                addr += self._get_instruction_size(name)
+
+    def _format_error_operands(self, operands: str) -> str:
+        sanitized = re.sub(r"\s+", " ", operands.strip())
+        sanitized = re.sub(r"\s*,\s*", ",", sanitized)
+        return sanitized
+
+    def _generate_common_error_cases(self, group: TestInstructionGroup) -> None:
+        """Create generic negative (FAIL) cases for each instruction in the group."""
+        group.error_cases = {}
+
+        for name, info in group.instructions:
+            assembly = info.get("assembly", "")
+            examples = self.example_generator.generate_examples(name, assembly)
+            primary_example = self._select_primary_example(examples)
+            if not primary_example:
+                continue
+
+            mnemonic, operands = self._split_example_line(primary_example)
+            assembly_tokens = (
+                [token.strip() for token in assembly.split(",")] if assembly else []
+            )
+
+            cases = self._create_standard_fail_cases(
+                mnemonic, operands, assembly_tokens
+            )
+            if not cases:
+                continue
+
+            for case in cases:
+                group.add_error_case(
+                    name,
+                    case["line"],
+                    case["error_msg"],
+                    reason=case["reason"],
+                    assembly=assembly,
+                    display_instruction=mnemonic,
+                )
+
+        if not group.error_cases and group.instructions:
+            fallback_name, info = group.instructions[0]
+            assembly = info.get("assembly", "")
+            mnemonic = fallback_name
+            operands = ["x32", "x0"]
+            line = self._format_instruction_line(mnemonic, operands)
+            formatted_operands = self._format_error_operands(", ".join(operands))
+            error_msg = f"illegal operands `{mnemonic} {formatted_operands}'"
+            group.add_error_case(
+                fallback_name,
+                line,
+                error_msg,
+                reason="generic invalid operand",
+                assembly=assembly,
+                display_instruction=mnemonic,
+            )
+
+    def _select_primary_example(self, examples: List[str]) -> str | None:
+        for example in examples:
+            if example and example.strip():
+                return example.strip()
+        return None
+
+    def _split_example_line(self, example: str) -> Tuple[str, List[str]]:
+        stripped = example.strip()
+        if "\t" in stripped:
+            mnemonic, operand_str = stripped.split("\t", 1)
+        elif " " in stripped:
+            mnemonic, operand_str = stripped.split(" ", 1)
+        else:
+            return stripped, []
+
+        operands = [op.strip() for op in operand_str.split(",") if op.strip()]
+        return mnemonic.strip(), operands
+
+    def _format_instruction_line(self, mnemonic: str, operands: List[str]) -> str:
+        if operands:
+            return f"{mnemonic} {', '.join(operands)}"
+        return mnemonic
+
+    def _operand_is_register(self, operand: str) -> bool:
+        op = operand.strip()
+        if not op:
+            return False
+        if op.startswith(("-", "0x", "0b", "0d")):
+            return False
+        if op[0].isdigit():
+            return False
+        if "(" in op or ")" in op:
+            return False
+        if op.startswith("%"):
+            return False
+        return bool(re.match(r"[A-Za-z_][A-Za-z0-9_'.]*$", op))
+
+    def _choose_extra_operand(self, exemplar: str, role: str) -> str:
+        if role and "csr" in role.lower():
+            return "nonexistent"
+        return "x0" if self._operand_is_register(exemplar) else "1"
+
+    def _make_wrong_operand(self, operand: str, role: str) -> str | None:
+        if role and "csr" in role.lower():
+            return "nonexistent"
+        if 
self._operand_is_register(operand): + return "1" + return "x0" + + def _create_standard_fail_cases( + self, + mnemonic: str, + operands: List[str], + assembly_tokens: List[str], + ) -> List[Dict[str, str]]: + cases: List[Dict[str, str]] = [] + seen_lines: Set[str] = set() + + def add_case( + reason: str, new_operands: List[str], *, custom_error: str | None = None + ) -> None: + line = self._format_instruction_line(mnemonic, new_operands) + if line in seen_lines: + return + seen_lines.add(line) + + if custom_error is not None: + error_msg = custom_error + else: + operands_str = ", ".join(new_operands) + if operands_str: + formatted = self._format_error_operands(operands_str) + error_msg = f"illegal operands `{mnemonic} {formatted}'" + else: + error_msg = f"illegal operands `{mnemonic}'" + + cases.append( + { + "reason": reason, + "line": line, + "error_msg": error_msg, + } + ) + + if operands: + few_operands = operands[:1] if len(operands) > 1 else [] + add_case("wrong number of operands (too few)", few_operands) + + last_role = assembly_tokens[-1] if assembly_tokens else "" + extra_operand = self._choose_extra_operand(operands[-1], last_role) + add_case("wrong number of operands (too many)", operands + [extra_operand]) + + for idx, operand in enumerate(operands): + role = assembly_tokens[idx] if idx < len(assembly_tokens) else "" + wrong_operand = self._make_wrong_operand(operand, role) + if not wrong_operand or wrong_operand == operand: + continue + + replaced = operands.copy() + replaced[idx] = wrong_operand + + if role and "csr" in role.lower(): + add_case( + "unknown CSR operand", + replaced, + custom_error="unknown CSR `nonexistent'", + ) + else: + add_case( + f"wrong operand type at position {idx + 1}", + replaced, + ) + else: + add_case("wrong number of operands (too many)", ["x0"]) + + return cases + + def _get_instruction_size(self, name: str) -> int: + """ + Determine instruction size in bytes. + Compressed instructions (C extension) are 2 bytes, standard instructions are 4 bytes. + """ + # Compressed instructions start with 'c.' + if name.startswith("c."): + return 2 + return 4 + + def _get_binutils_filename(self, extension: str) -> str: + """Get binutils-style filename for extension.""" + ext = extension.lower() + if "-" in ext: + ext_parts = ext.split("-") + classification = self.example_generator.extension_classification + standard_exts = classification["standard"] + + if all(part in standard_exts for part in ext_parts): + return "-".join(sorted(ext_parts)) + else: + return ext + + return ext + + def _build_march_string( + self, base_arch: str, extension: str, extra_extensions: Set[str] | None = None + ) -> str: + classification = self.example_generator.extension_classification + standard_exts = classification["standard"] + + # Extensions that should not appear in march strings + # These work with any base ISA and don't need explicit march flags + excluded_march_extensions = { + "s", # Supervisor mode (sfence.vma, etc.) 
- part of base privileged spec
+            "sm",  # Supervisor mode (empty group, parent of Sm* extensions)
+            "sdext",  # Debug extension (dret) - doesn't need march flag
+            "xmock",  # Test/mock extension - not real
+        }
+
+        extensions: Set[str] = set()
+
+        for part in extension.lower().split("-"):
+            extensions.update(_normalize_extension_token(part))
+
+        if extra_extensions:
+            for extra in extra_extensions:
+                extensions.update(_normalize_extension_token(extra))
+
+        # Filter out 'i' and excluded privileged extensions
+        extensions = {
+            ext
+            for ext in extensions
+            if ext and ext != "i" and ext not in excluded_march_extensions
+        }
+
+        standard_parts = sorted(
+            ext for ext in extensions if ext in standard_exts and len(ext) == 1
+        )
+        non_standard_parts = sorted(
+            ext for ext in extensions if ext not in standard_exts or len(ext) > 1
+        )
+
+        march = base_arch
+
+        if standard_parts:
+            march += "".join(standard_parts)
+
+        if non_standard_parts:
+            march += "_" + "_".join(non_standard_parts)
+
+        return march
+
+    def _create_disasm_pattern(
+        self, addr: int, name: str, example: str, no_aliases: bool = False
+    ) -> str:
+        """Create a regex pattern for expected disassembly output."""
+        line = example.strip()
+
+        while line.startswith("\t"):
+            line = line[1:]
+
+        if not line:
+            instr = name
+            operands = ""
+        else:
+            parts = re.split(r"\s+", line, maxsplit=1)
+            instr = parts[0]
+            operands = parts[1] if len(parts) > 1 else ""
+
+        # Format: address: hex_code instruction operands
+        pattern = f"[ \t]+[0-9a-f]+:[ \t]+[0-9a-f]+[ \t]+{re.escape(instr)}"
+
+        if operands:
+            operands_clean = re.sub(r"\s+", " ", operands.strip())
+
+            def _escape_with_whitespace(token: str) -> str:
+                # Escape regex metacharacters but let any internal run of
+                # whitespace match flexibly. (re.escape stopped escaping
+                # spaces in Python 3.7, and the previous replacement emitted
+                # a literal backslash instead of the \s+ pattern.)
+                return r"\s+".join(re.escape(piece) for piece in token.split())
+
+            if "," in operands_clean:
+                pieces = [
+                    _escape_with_whitespace(part.strip())
+                    for part in operands_clean.split(",")
+                ]
+                operands_pattern = r"\s*,\s*".join(pieces)
+            else:
+                operands_pattern = _escape_with_whitespace(operands_clean)
+
+            pattern += f"[ \t]+{operands_pattern}"
+
+        return pattern
+
+    def generate_all_tests(self, instructions: Dict[str, dict]) -> None:
+        groups = self.group_instructions_by_extension(instructions)
+
+        logging.info(f"Generating tests for {len(groups)} instruction groups")
+
+        for ext_name, group in groups.items():
+            logging.info(f"Generating tests for extension: {ext_name}")
+            self.generate_tests_for_group(group)
+
+        logging.info(f"Test generation complete. Files written to: {self.output_dir}")
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(
+        description="Generate GNU Assembler test files from RISC-V unified database"
+    )
+    parser.add_argument(
+        "--inst-dir",
+        default="../../../spec/std/isa/inst/",
+        help="Directory containing instruction YAML files",
+    )
+    parser.add_argument(
+        "--csr-dir",
+        default="../../../spec/std/isa/csr/",
+        help="Directory containing CSR YAML files",
+    )
+    parser.add_argument(
+        "--output-dir",
+        default="gas_tests",
+        help="Output directory for generated test files",
+    )
+    parser.add_argument(
+        "--extensions", help="Comma-separated list of enabled extensions (default: all)"
+    )
+    parser.add_argument(
+        "--verbose", "-v", action="store_true", help="Enable verbose logging"
+    )
+    parser.add_argument(
+        "--include-all",
+        "-a",
+        action="store_true",
+        help="Include all instructions, ignoring extension filtering",
+    )
+
+    return parser.parse_args()
+
+
+def main():
+    args = parse_args()
+
+    if args.verbose:
+        logging.getLogger().setLevel(logging.DEBUG)
+
+    if args.include_all or not args.extensions:
+        enabled_extensions = []
+        include_all = True
+        logging.info("Including all instructions")
+    else:
+        enabled_extensions = [
+            ext.strip() for ext in args.extensions.split(",") if ext.strip()
+        ]
+        include_all = False
+        logging.info(f"Enabled extensions: {', '.join(enabled_extensions)}")
+
+    if not os.path.isdir(args.inst_dir):
+        logging.error(f"Instruction directory not found: {args.inst_dir}")
+        sys.exit(1)
+
+    if not os.path.isdir(args.csr_dir):
+        logging.warning(
+            f"CSR directory not found: {args.csr_dir}. Using fallback CSR list."
+        )
+
+    generator = GasTestGenerator(args.output_dir, args.csr_dir, args.inst_dir)
+
+    instructions = generator.load_instructions(
+        args.inst_dir, enabled_extensions, include_all
+    )
+
+    if not instructions:
+        logging.error("No instructions found or all were filtered out.")
+        sys.exit(1)
+
+    generator.generate_all_tests(instructions)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/backends/generators/binutils/gas_test_generator_readme.md b/backends/generators/binutils/gas_test_generator_readme.md
new file mode 100644
index 0000000000..9a88c57b7d
--- /dev/null
+++ b/backends/generators/binutils/gas_test_generator_readme.md
@@ -0,0 +1,150 @@
+# GNU Assembler Test Generator for RISC-V
+
+This tool automatically generates binutils test files for the GNU Assembler (gas) from the RISC-V unified database (UDB). It creates assembly source files (`.s`), disassembly dump files (`.d`), and expected-error files (`.l`).
+
+## Overview
+
+The generator streamlines RISC-V extension testing by:
+
+- **Automatically discovering** extension patterns from the unified database
+- **Following binutils test-suite conventions** for file naming and layout
+- **Generating realistic assembly examples** with multiple operand combinations
+- **Creating comprehensive error cases** for negative testing
+- **Reducing manual test creation**, especially for new RISC-V extensions
+
+### Generated Test Files
+
+For each extension, the generator creates:
+
+1. **Assembly Source Files (`.s`)**: Contain actual assembly instructions with various operand combinations
+2. **Dump Files (`.d`)**: Define test parameters and expected disassembly output patterns
+3. **Error Files (`.l`)**: Expected error messages for negative test cases
+4. **Architecture-specific variants**: RV32/RV64-specific tests when applicable
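+
+For the base integer group, for instance, the main `.s`/`.d` pair comes out roughly as below. This is a sketch: the exact registers and instructions depend on the generator's configured example pools, and whitespace inside the match patterns is a space-or-tab character class. The source:
+
+```text
+target:
+	# Auto-generated pass test for `add` (assembly: xd, xs1, xs2)
+	# This source should assemble successfully.
+	add	a0, a1, a2
+```
+
+and the matching dump:
+
+```text
+#as: -march=rv32i
+#source: i.s
+#objdump: -d -M no-aliases
+
+.*:[ 	]+file format .*
+
+
+Disassembly of section .text:
+
+0+000 <target>:
+[ 	]+[0-9a-f]+:[ 	]+[0-9a-f]+[ 	]+add[ 	]+a0\s*,\s*a1\s*,\s*a2
+```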
+
+## Usage
+
+### Basic Usage
+
+Generate tests for all instructions in the unified database:
+
+```bash
+python3 gas_test_generator.py --include-all
+```
+
+### Generate Tests for Specific Extensions
+
+```bash
+python3 gas_test_generator.py --extensions "i,m,a,f,d,zba,zbb"
+```
+
+### Custom Output Directory
+
+```bash
+python3 gas_test_generator.py --include-all --output-dir my_riscv_gas_tests
+```
+
+### Verbose Output
+
+```bash
+python3 gas_test_generator.py --include-all --verbose
+```
+
+## Command Line Options
+
+- `--inst-dir`: Directory containing instruction YAML files (default: `../../../spec/std/isa/inst/`)
+- `--csr-dir`: Directory containing CSR YAML files (default: `../../../spec/std/isa/csr/`)
+- `--output-dir`: Output directory for generated test files (default: `gas_tests`)
+- `--extensions`: Comma-separated list of enabled extensions
+- `--include-all`: Include all instructions, ignoring extension filtering
+- `--verbose`: Enable verbose logging
+
+## Integration with Binutils Test Suite
+
+The generated files follow the same format and conventions as the existing binutils gas test suite and can be integrated directly:
+
+1. Copy generated files to `binutils-gdb/gas/testsuite/gas/riscv/`
+2. Update the test Makefile if needed
+3. Run tests with `make check`
+
+## Features
+
+### Assembly Generation
+
+- **Multiple Operand Combinations**: Generates realistic assembly examples with different register and immediate combinations
+- **Constraint-Aware Generation**: Respects instruction-specific constraints from encoding definitions
+- **Edge Case Testing**: Creates boundary value tests for immediate operands
+- **Memory Operand Variants**: Handles `offset(base)` memory operands with various offsets
+- **Register Type Awareness**: Uses appropriate register names (x/a/t/s for GPR, f/fa/ft/fs for FPR)
+- **Compressed Instruction Support**: Handles C extension register constraints properly
+
+### Error Case Generation
+
+- **Invalid Registers**: Tests with out-of-range register numbers
+- **Invalid Immediates**: Tests with out-of-bounds immediate values
+- **Malformed Assembly**: Common syntax error cases
+
+### Test Organization
+
+- **Extension Grouping**: Groups related instructions by defining extension
+- **Consistent Naming**: Follows existing binutils test naming conventions
+- **Regex Patterns**: Generates robust regex patterns for disassembly matching
+
+## Architecture
+
+The generator uses a clean, modular architecture with three main components (a short usage sketch follows these descriptions):
+
+### TestInstructionGroup
+Groups related instructions by extension and categorizes them:
+- Main instructions (architecture-neutral)
+- Compressed variants (C extension)
+- Architecture-specific instructions (RV32/RV64 only)
+- Error cases for negative testing
+
+### AssemblyExampleGenerator
+Creates realistic assembly examples using a data-driven approach:
+- Loads and classifies all extensions from the unified database
+- Parses assembly format strings from YAML definitions
+- Generates constraint-aware operand combinations
+- Creates realistic immediate values respecting encoding constraints
+- Handles different operand types (registers, immediates, memory, CSRs)
+- Manages compressed instruction register constraints
+
+### GasTestGenerator
+Main orchestrator implementing binutils conventions:
+- Loads instructions
+- Groups instructions by extension
+- Generates RV32-default tests matching binutils patterns
+- Creates architecture-specific variants when needed
+- Builds march strings
+- Manages a binutils-compatible output directory structure
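+
+For programmatic use, a minimal sketch of the pipeline (assuming the module is importable as `gas_test_generator` and the default UDB paths resolve from the working directory) looks like this:
+
+```python
+from gas_test_generator import GasTestGenerator
+
+# Construct the generator (this loads and classifies the UDB up front),
+# then load, filter, and emit tests for the base integer and M extensions.
+gen = GasTestGenerator(output_dir="gas_tests")
+instructions = gen.load_instructions(
+    "../../../spec/std/isa/inst/",
+    enabled_extensions=["i", "m"],
+    include_all=False,
+)
+gen.generate_all_tests(instructions)
+```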
+
+## Extending the Generator
+
+### Adding New Operand Types
+
+To support new operand types, extend the `_parse_assembly_operands` method in `AssemblyExampleGenerator`:
+
+```python
+elif part == "new_operand_type":
+    operand_info["type"] = "new_type"
+```
+
+### Custom Error Cases
+
+Add extension-specific error cases alongside `_generate_common_error_cases` in `GasTestGenerator`, for example:
+
+```python
+def _generate_custom_error_cases(self, group: TestInstructionGroup):
+    # Add custom error scenarios
+    group.add_error_case("instruction", "invalid_assembly", "error_message")
+```
+
+### Architecture-specific Logic
+
+Modify `_build_march_string` to handle new architecture requirements:
+
+```python
+def _build_march_string(self, base_arch: str, extension: str, extra_extensions=None) -> str:
+    # Custom march string logic
+    return f"{base_arch}_{extension}"
+```
diff --git a/backends/generators/binutils/insn_class_config.py b/backends/generators/binutils/insn_class_config.py
new file mode 100644
index 0000000000..97de08c9f4
--- /dev/null
+++ b/backends/generators/binutils/insn_class_config.py
@@ -0,0 +1,13 @@
+"""
+User configuration for instruction class names.
+
+Define custom names for extensions here, e.g.:
+    USER_DEFINED_INSN_NAMES = { 'Zbb': 'INSN_CLASS_ZBB' }
+"""
+
+USER_DEFINED_INSN_NAMES = {}
+
+
+def is_user_defined_class(class_name: str) -> bool:
+    return class_name in set(USER_DEFINED_INSN_NAMES.values())
diff --git a/backends/generators/binutils/naming_config.py b/backends/generators/binutils/naming_config.py
new file mode 100644
index 0000000000..d4ae1f2a1d
--- /dev/null
+++ b/backends/generators/binutils/naming_config.py
@@ -0,0 +1,21 @@
+"""
+Naming configuration for binutils generation.
+
+- USER_DEFINED_INSN_NAMES: map extension identifiers (e.g., 'Zbb', 'CustomTest')
+  to explicit INSN_CLASS_* names. If an extension is not listed, the generator
+  derives a name of the form INSN_CLASS_<EXTENSION>.
+
+- USER_DEFINED_OPERAND_PREFERENCES: for ambiguous or custom operands, map a
+  (operand_name, bit_range_string) pair to a binutils operand token (e.g., 'u').
+  Example: { ('imm', '31-12'): 'u' }
+"""
+
+USER_DEFINED_INSN_NAMES = {}
+
+# Example preference:
+# USER_DEFINED_OPERAND_PREFERENCES = { ('imm', '31-12'): 'u' }
+USER_DEFINED_OPERAND_PREFERENCES = {}
+
+
+def is_user_defined_class(class_name: str) -> bool:
+    return class_name in set(USER_DEFINED_INSN_NAMES.values())
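+
+# Example (hypothetical extension name, mirroring the docstring above):
+# USER_DEFINED_INSN_NAMES = {"CustomTest": "INSN_CLASS_CUSTOM_TEST"}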