Skip to content

Conversation

@alex-s168
Copy link

Proposal to add ETC.a architecture support https://github.com/ETC-A/etca-spec/

This is still work in progress; not all extensions are implemented yet, and tests are still missing

Since ETC.a is not that popular (yet), I'd understand if you (maintainers) prefer this staying a fork.

If this will however be merged, I will do bug fixes and refactorings in the code;

My plan is to use this for an emulator

@github-actions github-actions bot added CS-core-files auto-sync Github-files Github related files labels Oct 31, 2025
Copy link
Collaborator

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great start!

Having ETC.a as a new module would be really nice.
I added several suggestions.
Especially getting rid of the magic numbers is really important. Please remove them with enums, static inlines, and helper functions.

Two other things are a requirement for getting this merged:

Testing

We increased the test coverage a lot, and I really want to stick with this trend. Since your module is rather small I would require a test coverage of near 100% (with the exceptions of NULL error checks and similar hard to test things).

This means:

  • you would need to add test files in tests/MC and tests/detail for assembly testing and details.
  • Add the architecture to /suite/cstest and bindings/python/cstest_py.

Features

I briefly skimmed over the ISA, and it looks like the have a very small base ISA and many many extensions.
We recently added way better support for enabling ISA extensions. For this module I think it is important to support toggling them as well.

We call extensions features here (as LLVM does), but it works the same. If a feature/extension is enabled, certain instructions decompile or not.
Checkout the cs_mode enum and how it is used to test for enabled features.

If you find it too much work it is fine of course. We should add your plugin to a list of notable forks then, so people can find it easier.

Comment on lines +251 to +252
static bool parseCoreOp(DecodeIsntCtx *ctx, const uint8_t **code_p,
size_t *code_len_p, uint16_t *size)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use uint16, uint32, or uint64 to pass the instruction bytes. Not an array. It allows you to perform bit operations on it, which is much faster and easier to read as well.

There are helper functions to read from a buffer into an unsigned integer. See readBytes in utils.c.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also gets read of the operations on code_len

code += 3;
code_len -= 3;
(*size) += 3;
} else if (code_len >= 2 && code[0] == 0x2F && code[1] == 0x11) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this code[0] == 0x2F && code[1] == 0x11 code pattern.

Please use masks and bit operations instead. E.g.: match_mask(bytes, opcode_mask, expected_bits) where match_mask in this case would do:

return bytes & some_opcode_mask == expected_bits

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using masks and such also gets rid of all the magic numbers. Since you can just make an enum with them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm can I do something like

#define b4(a,b,c,d) ((a<<3)|(b<<2)|(c<<1)|d)
#define b(a,b,c,d, e,f,g,h) ((b4(a,b,c,d) << 4) | b4(e,f,g,h))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works as well. Better are inline functions though. Because they can be debugged, in contrast to macros.
But in any case, please name the macros descriptively. So field_b instead of b

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could also do this to potentially make the code more readable

#define match1_0(x) ((x) == 0)
#define match1_1(x) ((x) != 0)
#define match1_x(x) (1)
#define match1__(x) (1)
#define match1(x, a) match1_##a(x)

// example:
//   if (match8(byte0, 0,0,1,x,0,x,x,x,x)) {}
#define match4_impl(x, a, b, c, d) \
	(match1(((x) >> 3) & 1, a) && match1(((x) >> 2) & 1, b) && \
	 match1(((x) >> 1) & 1, c) && match1((x) & 1, d))
#define match4(x, ...) match4_impl(x, __VA_ARGS__)
#define match8_impl(x, a, b, c, d, e, f, g, h) \
	(match4(((x) >> 4) & 0xF, a, b, c, d) && match4((x) & 0xF, e, f, g, h))
#define match8(x, ...) match8_impl(x, __VA_ARGS__)

#define extract1_0(out, x) /* */
#define extract1_1(out, x) /* */
#define extract1__(out, x) /* */
#define extract1_x(out, x) out = (out << 1) | ((x) & 1)
#define extract1(out, x, a) extract1_##a(out, x)

#define extract4_impl(out, x, a, b, c, d) \
	do { \
		extract1(out, (x) >> 3, a); \
		extract1(out, (x) >> 2, b); \
		extract1(out, (x) >> 1, c); \
		extract1(out, (x), d); \
	} while (0)
#define extract4(out, x, ...) extract4_impl(out, x, __VA_ARGS__)

#define extract8_impl(out, x, a, b, c, d, e, f, g, h) \
	do { \
		extract1(out, (x) >> 7, a); \
		extract1(out, (x) >> 6, b); \
		extract1(out, (x) >> 5, c); \
		extract1(out, (x) >> 4, d); \
		extract1(out, (x) >> 3, e); \
		extract1(out, (x) >> 2, f); \
		extract1(out, (x) >> 1, g); \
		extract1(out, (x), h); \
	} while (0)
#define extract8(out, x, ...) extract8_impl(out, x, __VA_ARGS__)

Copy link
Author

@alex-s168 alex-s168 Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with that, the code would look like this:

if (code_len >= 3 && match8(code[0], 1, 1, 1, 0, _, _, _, _)) {
  uint16_t opc = 0;
  extract8(opc, code[0], _, _, _, _, x, x, x, x);
  extract8(opc, code[1], x, _, _, _, x, x, x, x);

  ctx->insn = parseExopOpcode(opc);
  if (ctx->insn == ETCA_INS_INVALID)
    return false;

  ctx->ss = 0;
  extract8(ctx->ss, code[1], _, _, x, x, _, _, _, _);

  if (match8(code[1], _, 1, _, _, _, _, _, _)) {
    parseRI(ctx, code[2], ctx->insn);
  } else {
    parseABM(ctx, code[2]);
  }

But we could also do this:

#define EXOP_B0(C, I, S) 1, 1, 1, 0, C, C, C, C
#define EXOP_B1(C, I, S) C, I, S, S, C, C, C, C

if (code_len >= 3 && match8(code[0], EXOP_B0(_, _, _))) {
  uint16_t opc = 0;
  extract8(opc, code[0], EXOP_B0(x, _, _));
  extract8(opc, code[1], EXOP_B1(x, _, _));

  ctx->insn = parseExopOpcode(opc);
  if (ctx->insn == ETCA_INS_INVALID)
    return false;

  ctx->ss = 0;
  extract8(ctx->ss, code[1], EXOP_B1(_, _, x));

  if (match8(code[1], EXOP_B1(_, 1, _))) {
    parseRI(ctx, code[2], ctx->insn);
  } else {
    parseABM(ctx, code[2]);
  }

or even:

#define EXOP_B0_(C, I, S) 1, 1, 1, 0, C, C, C, C
#define EXOP_B1_(C, I, S) C, I, S, S, C, C, C, C

#define EXOP_B0 EXOP_B0_(_, _, _)
#define EXOP_B1 EXOP_B1_(_, _, _)
#define EXOP_B0_OPCODE  EXOP_B0_(x, _, _)
#define EXOP_B1_OPCODE EXOP_B1_(x, _, _)
#define EXOP_B1_ISIMM EXOP_B1_(_, x, _)
#define EXOP_B1_SS EXOP_B1_(_, _, x)

if (code_len >= 3 && match8(code[0], EXOP_B0)) {
  uint16_t opc = 0;
  extract8(opc, code[0], EXOP_B0_OPC);
  extract8(opc, code[1], EXOP_B1_OPC);

  ctx->insn = parseExopOpcode(opc);
  if (ctx->insn == ETCA_INS_INVALID)
    return false;

  ctx->ss = 0;
  extract8(ctx->ss, code[1], EXOP_B1_SS);

  if (match8(code[1], EXOP_B1_ISIMM)) {
    parseRI(ctx, code[2], ctx->insn);
  } else {
    parseABM(ctx, code[2]);
  }

or, to prevent (unlikely) future errors:

#define EXOP_B0_(C, I, S) 1, 1, 1, 0, C, C, C, C
#define EXOP_B1_(C, I, S) C, I, S, S, C, C, C, C

#define EXOP_B0 EXOP_B0_(_, _, _)
#define EXOP_B0_OPCODE  EXOP_B0_(x, _, _)
#define EXOP_B0_ISIMM EXOP_B0_(_, x, _)
#define EXOP_B0_SS EXOP_B_(_, _, x)

#define EXOP_B1 EXOP_B1_(_, _, _)
#define EXOP_B1_OPCODE EXOP_B1_(x, _, _)
#define EXOP_B1_ISIMM EXOP_B1_(_, x, _)
#define EXOP_B1_SS EXOP_B1_(_, _, x)

if (code_len >= 3 && match8(code[0], EXOP_B0)) {
  uint16_t opc = 0;
  extract8(opc, code[0], EXOP_B0_OPC);
  extract8(opc, code[1], EXOP_B1_OPC);

  ctx->insn = parseExopOpcode(opc);
  if (ctx->insn == ETCA_INS_INVALID)
    return false;

  ctx->ss = 0;
  extract8(ctx->ss, code[0], EXOP_B0_SS);
  extract8(ctx->ss, code[1], EXOP_B1_SS);

  if (match8(code[0], EXOP_B0_ISIMM) && match8(code[1], EXOP_B1_ISIMM)) {
    parseRI(ctx, code[2], ctx->insn);
  } else {
    parseABM(ctx, code[2]);
  }

Copy link
Collaborator

@Rot127 Rot127 Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking more about this:

Assume you pass the instruction bytes as uint as described here. And let's say the least significant bits are the first byte in the ISA:

typedef enum {
  ETCA_MASK_FORMAT = 0xc0, ///< Mask for the format marker.
  ETCA_MASK_BASE_OPCODE = 0x0f, ///< Opcode field for base instructions. Identified with CCCC in the ISA.
  ETCA_MASK_BASE_IMM = 0x1f00, ///< The immediate field IIIII of a base instruction with an immediate operand.
  ETCA_MASK_BASE_REG_A = 0xe000, ///< The register A field of a base instruction.
  ETCA_MASK_BASE_REG_B = 0x1c00, ///< The register B field of a base instruction.
  ETCA_MASK_BASE_DISP = 0xff00, ///< The Displacement field of a base instruction.
} ETCAInsnFormatMask;

typedef enum {
  ETCA_REG_REG_OP = 0x00,
  ETCA_REG_IMM_OP = 0x40,
  ETCA_COND_JMP_OP = 0x80,
  ETCA_VAR_LEN_OP = 0xc0, ///< Instruction is defined in extensions.
} ETCAInsnFormat;

// In some function
decode_insn (uint32_t insn_bytes) {
  switch (insn_bytes & ETCA_MASK_FORMAT) {
  case ETCA_REG_REG_OP:
    return handle_base_reg_reg_op(insn_bytes);
  case ETCA_REG_IMM_OP:
    return handle_base_reg_imm_op(insn_bytes);
  case ETCA_COND_JMP_OP:
    return handle_base_cond_jmp_op(insn_bytes);
  case ETCA_VAR_LEN_OP:
    // All extension instructions.
    return handle_base_var_len_op(insn_bytes);
}

Then in the handlers you check all the other fields (like the opcode fields) with similar switch statements, until you have identified and decoded an instruction.

extract4, extract8 etc.

We have the function get_insn_field in MathExtras.h for it. Feel free to extend it with other versions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these the correct modes for etca?

	CS_MODE_ETCA16 = (CS_MODE_LITTLE_ENDIAN | CS_MODE_16),
	CS_MODE_ETCA32 = (CS_MODE_LITTLE_ENDIAN | CS_MODE_32),
	CS_MODE_ETCA64 = (CS_MODE_LITTLE_ENDIAN | CS_MODE_64),

Copy link
Collaborator

@Rot127 Rot127 Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the CS_MODE_16/32/64 and CS_MODE_LITTLE/BIG_ENDIAN as is. But of course creating alias like that.

But all the extensions need their own mode.

You can use all bits from bit 4 onward. So I would suggest to use two bits to signal the CUID.
But there are more than 28 possible extensions (so they won't fit in the 28bits the feature flags have).

So according to the table (https://github.com/ETC-A/etca-spec/tree/main/extensions#extensions) I suggest:

CS_MODE_ECTA_CPUID_1 = 1 << 4, ///< Bits for all mode flags for CPUID 1
CS_MODE_ECTA_CPUID_2 = 2 << 4, ///< Bits for all mode flags for CPUID 2
// Bits 6 onward are for the extension flags
CS_MODE_ECTA_FI = 1 << 6 | CS_MODE_ECTA_CPUID_1,
CS_MODE_ECTA_SAF = 1 << 7 | CS_MODE_ECTA_CPUID_1,
CS_MODE_ECTA_INT = 1 << 8 | CS_MODE_ECTA_CPUID_1,
...
CS_MODE_ECTA_DW = CS_MODE_32 | CS_MODE_ECTA_CPUID_1, ///< CS_MODE_ECTA_DW and CS_MODE_32 are basically the same. So reuse.
CS_MODE_ECTA_QW = CS_MODE_64 | CS_MODE_ECTA_CPUID_1, ///< Same here
CS_MODE_ECTA_DWAS = 1 << n,
...
CS_MODE_ECTA_EXOP = 1 << 7 | CS_MODE_ECTA_CPUID_2,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CS-core-files auto-sync Github-files Github related files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants