Proposal to add ETC.a architecture support #2813

alex-s168 · 2025-10-31T23:30:44Z

Proposal to add ETC.a architecture support https://github.com/ETC-A/etca-spec/

This is still work in progress; not all extensions are implemented yet, and tests are still missing

Since ETC.a is not that popular (yet), I'd understand if you (maintainers) prefer this staying a fork.

If this will however be merged, I will do bug fixes and refactorings in the code;

My plan is to use this for an emulator

Rot127

This looks like a great start!

Having ETC.a as a new module would be really nice.
I added several suggestions.
Especially getting rid of the magic numbers is really important. Please remove them with enums, static inlines, and helper functions.

Two other things are a requirement for getting this merged:

Testing

We increased the test coverage a lot, and I really want to stick with this trend. Since your module is rather small I would require a test coverage of near 100% (with the exceptions of NULL error checks and similar hard to test things).

This means:

you would need to add test files in tests/MC and tests/detail for assembly testing and details.
Add the architecture to /suite/cstest and bindings/python/cstest_py.

Features

I briefly skimmed over the ISA, and it looks like the have a very small base ISA and many many extensions.
We recently added way better support for enabling ISA extensions. For this module I think it is important to support toggling them as well.

We call extensions features here (as LLVM does), but it works the same. If a feature/extension is enabled, certain instructions decompile or not.
Checkout the cs_mode enum and how it is used to test for enabled features.

If you find it too much work it is fine of course. We should add your plugin to a list of notable forks then, so people can find it easier.

arch/Etca/EtcaDisassembler.c

Rot127 · 2025-11-01T10:08:57Z

arch/Etca/EtcaDisassembler.c

+static bool parseCoreOp(DecodeIsntCtx *ctx, const uint8_t **code_p,
+			size_t *code_len_p, uint16_t *size)


Please use uint16, uint32, or uint64 to pass the instruction bytes. Not an array. It allows you to perform bit operations on it, which is much faster and easier to read as well.

There are helper functions to read from a buffer into an unsigned integer. See readBytes in utils.c.

It also gets read of the operations on code_len

Rot127 · 2025-11-01T10:12:07Z

arch/Etca/EtcaDisassembler.c

+		code += 3;
+		code_len -= 3;
+		(*size) += 3;
+	} else if (code_len >= 2 && code[0] == 0x2F && code[1] == 0x11) {


Regarding this code[0] == 0x2F && code[1] == 0x11 code pattern.

Please use masks and bit operations instead. E.g.: match_mask(bytes, opcode_mask, expected_bits) where match_mask in this case would do:

return bytes & some_opcode_mask == expected_bits

Using masks and such also gets rid of all the magic numbers. Since you can just make an enum with them.

hmm can I do something like

#define b4(a,b,c,d) ((a<<3)|(b<<2)|(c<<1)|d) #define b(a,b,c,d, e,f,g,h) ((b4(a,b,c,d) << 4) | b4(e,f,g,h))

Works as well. Better are inline functions though. Because they can be debugged, in contrast to macros.
But in any case, please name the macros descriptively. So field_b instead of b

I could also do this to potentially make the code more readable

#define match1_0(x) ((x) == 0) #define match1_1(x) ((x) != 0) #define match1_x(x) (1) #define match1__(x) (1) #define match1(x, a) match1_##a(x) // example: // if (match8(byte0, 0,0,1,x,0,x,x,x,x)) {} #define match4_impl(x, a, b, c, d) \ (match1(((x) >> 3) & 1, a) && match1(((x) >> 2) & 1, b) && \ match1(((x) >> 1) & 1, c) && match1((x) & 1, d)) #define match4(x, ...) match4_impl(x, __VA_ARGS__) #define match8_impl(x, a, b, c, d, e, f, g, h) \ (match4(((x) >> 4) & 0xF, a, b, c, d) && match4((x) & 0xF, e, f, g, h)) #define match8(x, ...) match8_impl(x, __VA_ARGS__) #define extract1_0(out, x) /* */ #define extract1_1(out, x) /* */ #define extract1__(out, x) /* */ #define extract1_x(out, x) out = (out << 1) | ((x) & 1) #define extract1(out, x, a) extract1_##a(out, x) #define extract4_impl(out, x, a, b, c, d) \ do { \ extract1(out, (x) >> 3, a); \ extract1(out, (x) >> 2, b); \ extract1(out, (x) >> 1, c); \ extract1(out, (x), d); \ } while (0) #define extract4(out, x, ...) extract4_impl(out, x, __VA_ARGS__) #define extract8_impl(out, x, a, b, c, d, e, f, g, h) \ do { \ extract1(out, (x) >> 7, a); \ extract1(out, (x) >> 6, b); \ extract1(out, (x) >> 5, c); \ extract1(out, (x) >> 4, d); \ extract1(out, (x) >> 3, e); \ extract1(out, (x) >> 2, f); \ extract1(out, (x) >> 1, g); \ extract1(out, (x), h); \ } while (0) #define extract8(out, x, ...) extract8_impl(out, x, __VA_ARGS__)

with that, the code would look like this:

if (code_len >= 3 && match8(code[0], 1, 1, 1, 0, _, _, _, _)) { uint16_t opc = 0; extract8(opc, code[0], _, _, _, _, x, x, x, x); extract8(opc, code[1], x, _, _, _, x, x, x, x); ctx->insn = parseExopOpcode(opc); if (ctx->insn == ETCA_INS_INVALID) return false; ctx->ss = 0; extract8(ctx->ss, code[1], _, _, x, x, _, _, _, _); if (match8(code[1], _, 1, _, _, _, _, _, _)) { parseRI(ctx, code[2], ctx->insn); } else { parseABM(ctx, code[2]); }

But we could also do this:

#define EXOP_B0(C, I, S) 1, 1, 1, 0, C, C, C, C #define EXOP_B1(C, I, S) C, I, S, S, C, C, C, C if (code_len >= 3 && match8(code[0], EXOP_B0(_, _, _))) { uint16_t opc = 0; extract8(opc, code[0], EXOP_B0(x, _, _)); extract8(opc, code[1], EXOP_B1(x, _, _)); ctx->insn = parseExopOpcode(opc); if (ctx->insn == ETCA_INS_INVALID) return false; ctx->ss = 0; extract8(ctx->ss, code[1], EXOP_B1(_, _, x)); if (match8(code[1], EXOP_B1(_, 1, _))) { parseRI(ctx, code[2], ctx->insn); } else { parseABM(ctx, code[2]); }

or even:

#define EXOP_B0_(C, I, S) 1, 1, 1, 0, C, C, C, C #define EXOP_B1_(C, I, S) C, I, S, S, C, C, C, C #define EXOP_B0 EXOP_B0_(_, _, _) #define EXOP_B1 EXOP_B1_(_, _, _) #define EXOP_B0_OPCODE EXOP_B0_(x, _, _) #define EXOP_B1_OPCODE EXOP_B1_(x, _, _) #define EXOP_B1_ISIMM EXOP_B1_(_, x, _) #define EXOP_B1_SS EXOP_B1_(_, _, x) if (code_len >= 3 && match8(code[0], EXOP_B0)) { uint16_t opc = 0; extract8(opc, code[0], EXOP_B0_OPC); extract8(opc, code[1], EXOP_B1_OPC); ctx->insn = parseExopOpcode(opc); if (ctx->insn == ETCA_INS_INVALID) return false; ctx->ss = 0; extract8(ctx->ss, code[1], EXOP_B1_SS); if (match8(code[1], EXOP_B1_ISIMM)) { parseRI(ctx, code[2], ctx->insn); } else { parseABM(ctx, code[2]); }

or, to prevent (unlikely) future errors:

#define EXOP_B0_(C, I, S) 1, 1, 1, 0, C, C, C, C #define EXOP_B1_(C, I, S) C, I, S, S, C, C, C, C #define EXOP_B0 EXOP_B0_(_, _, _) #define EXOP_B0_OPCODE EXOP_B0_(x, _, _) #define EXOP_B0_ISIMM EXOP_B0_(_, x, _) #define EXOP_B0_SS EXOP_B_(_, _, x) #define EXOP_B1 EXOP_B1_(_, _, _) #define EXOP_B1_OPCODE EXOP_B1_(x, _, _) #define EXOP_B1_ISIMM EXOP_B1_(_, x, _) #define EXOP_B1_SS EXOP_B1_(_, _, x) if (code_len >= 3 && match8(code[0], EXOP_B0)) { uint16_t opc = 0; extract8(opc, code[0], EXOP_B0_OPC); extract8(opc, code[1], EXOP_B1_OPC); ctx->insn = parseExopOpcode(opc); if (ctx->insn == ETCA_INS_INVALID) return false; ctx->ss = 0; extract8(ctx->ss, code[0], EXOP_B0_SS); extract8(ctx->ss, code[1], EXOP_B1_SS); if (match8(code[0], EXOP_B0_ISIMM) && match8(code[1], EXOP_B1_ISIMM)) { parseRI(ctx, code[2], ctx->insn); } else { parseABM(ctx, code[2]); }

I was thinking more about this:

Assume you pass the instruction bytes as uint as described here. And let's say the least significant bits are the first byte in the ISA:

typedef enum { ETCA_MASK_FORMAT = 0xc0, ///< Mask for the format marker. ETCA_MASK_BASE_OPCODE = 0x0f, ///< Opcode field for base instructions. Identified with CCCC in the ISA. ETCA_MASK_BASE_IMM = 0x1f00, ///< The immediate field IIIII of a base instruction with an immediate operand. ETCA_MASK_BASE_REG_A = 0xe000, ///< The register A field of a base instruction. ETCA_MASK_BASE_REG_B = 0x1c00, ///< The register B field of a base instruction. ETCA_MASK_BASE_DISP = 0xff00, ///< The Displacement field of a base instruction. } ETCAInsnFormatMask; typedef enum { ETCA_REG_REG_OP = 0x00, ETCA_REG_IMM_OP = 0x40, ETCA_COND_JMP_OP = 0x80, ETCA_VAR_LEN_OP = 0xc0, ///< Instruction is defined in extensions. } ETCAInsnFormat; // In some function decode_insn (uint32_t insn_bytes) { switch (insn_bytes & ETCA_MASK_FORMAT) { case ETCA_REG_REG_OP: return handle_base_reg_reg_op(insn_bytes); case ETCA_REG_IMM_OP: return handle_base_reg_imm_op(insn_bytes); case ETCA_COND_JMP_OP: return handle_base_cond_jmp_op(insn_bytes); case ETCA_VAR_LEN_OP: // All extension instructions. return handle_base_var_len_op(insn_bytes); }

Then in the handlers you check all the other fields (like the opcode fields) with similar switch statements, until you have identified and decoded an instruction.

extract4, extract8 etc.

We have the function get_insn_field in MathExtras.h for it. Feel free to extend it with other versions.

Are these the correct modes for etca?

CS_MODE_ETCA16 = (CS_MODE_LITTLE_ENDIAN | CS_MODE_16), CS_MODE_ETCA32 = (CS_MODE_LITTLE_ENDIAN | CS_MODE_32), CS_MODE_ETCA64 = (CS_MODE_LITTLE_ENDIAN | CS_MODE_64),

You can use the CS_MODE_16/32/64 and CS_MODE_LITTLE/BIG_ENDIAN as is. But of course creating alias like that.

But all the extensions need their own mode.

You can use all bits from bit 4 onward. So I would suggest to use two bits to signal the CUID.
But there are more than 28 possible extensions (so they won't fit in the 28bits the feature flags have).

So according to the table (https://github.com/ETC-A/etca-spec/tree/main/extensions#extensions) I suggest:

CS_MODE_ECTA_CPUID_1 = 1 << 4, ///< Bits for all mode flags for CPUID 1 CS_MODE_ECTA_CPUID_2 = 2 << 4, ///< Bits for all mode flags for CPUID 2 // Bits 6 onward are for the extension flags CS_MODE_ECTA_FI = 1 << 6 | CS_MODE_ECTA_CPUID_1, CS_MODE_ECTA_SAF = 1 << 7 | CS_MODE_ECTA_CPUID_1, CS_MODE_ECTA_INT = 1 << 8 | CS_MODE_ECTA_CPUID_1, ... CS_MODE_ECTA_DW = CS_MODE_32 | CS_MODE_ECTA_CPUID_1, ///< CS_MODE_ECTA_DW and CS_MODE_32 are basically the same. So reuse. CS_MODE_ECTA_QW = CS_MODE_64 | CS_MODE_ECTA_CPUID_1, ///< Same here CS_MODE_ECTA_DWAS = 1 << n, ... CS_MODE_ECTA_EXOP = 1 << 7 | CS_MODE_ECTA_CPUID_2,

arch/Etca/EtcaInstPrinter.c

arch/Etca/EtcaModule.c

cs.c

Makefile

include/capstone/etca.h

alex-s168 added 3 commits October 31, 2025 21:19

Full ETC.a support, except for MO1 and MO2

734f480

improve printing of jmp & call ops

0b96b04

improve printing of jmp & call ops again

e058ad6

github-actions bot added CS-core-files auto-sync Github-files Github related files labels Oct 31, 2025

Rot127 requested changes Nov 1, 2025

View reviewed changes

alex-s168 added 5 commits November 1, 2025 15:29

wip mo2 & other changes

74a1652

c

0f2f49b

clean up inst printer a bit

2b0a93e

remove etca from makefile

53db5dd

reformat with clang 17

750c8c6

		static bool parseCoreOp(DecodeIsntCtx ctx, const uint8_t *code_p,
		size_t code_len_p, uint16_t size)

Proposal to add ETC.a architecture support #2813

Are you sure you want to change the base?

Proposal to add ETC.a architecture support #2813

Conversation

alex-s168 commented Oct 31, 2025

Uh oh!

Rot127 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alex-s168 Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Rot127 Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Rot127 Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

alex-s168 Nov 3, 2025 •

edited

Loading

Rot127 Nov 3, 2025 •

edited

Loading

Rot127 Nov 4, 2025 •

edited

Loading