5 changes: 5 additions & 0 deletions crypto/fipsmodule/ml_dsa/META.yml
@@ -0,0 +1,5 @@
name: mldsa-native
source: pq-code-package/mldsa-native.git
branch: main
commit: b61e84f0c73d4ed612ffcaea4282a9d682de3f46
imported-at: 2026-01-16T13:12:01-0800
149 changes: 149 additions & 0 deletions crypto/fipsmodule/ml_dsa/README.md
@@ -0,0 +1,149 @@
# ML-DSA

The source code in this directory implements ML-DSA as defined in
the [FIPS 204 Module-Lattice-Based Digital Signature Standard](https://csrc.nist.gov/pubs/fips/204/final).
It is imported from [mldsa-native](https://github.com/pq-code-package/mldsa-native)
using [importer.sh](importer.sh); see [META.yml](META.yml) for import details.

## Running the importer

To re-run the importer, do

```bash
rm -rf mldsa # Remove old mldsa source
./importer.sh
```

By default, the importer will not run if [mldsa](mldsa) already/still exists. To force removal of any existing [mldsa](mldsa), use `./importer.sh --force`.

The repository and branch to be used for the import can be configured through the environment variables `GITHUB_REPOSITORY` and `GITHUB_SHA`, respectively. The default is equivalent to

```bash
GITHUB_REPOSITORY=pq-code-package/mldsa-native.git GITHUB_SHA=main ./importer.sh
```

That is, by default importer.sh will clone and install the latest [main](https://github.com/pq-code-package/mldsa-native/tree/main) of mldsa-native.

After a successful import, [META.yml](META.yml) will reflect the source, branch, commit and timestamp of the import.

### Import Scope

mldsa-native has a C-only version as well as native 'backends' in AVX2 and
Neon for high performance. At present, [importer.sh](importer.sh) imports only
the C-only version.

mldsa-native offers its own FIPS-202 implementation, including fast
versions of batched FIPS-202. [importer.sh](importer.sh) does _not_ import those.
Instead, glue-code around AWS-LC's own FIPS-202 implementation is provided in
[fips202_glue.h](fips202_glue.h) and [fips202x4_glue.h](fips202x4_glue.h).
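
The rate constants hard-coded in the glue header (`SHAKE128_RATE`, `SHAKE256_RATE`, `SHA3_256_RATE`, `SHA3_512_RATE`) follow from the Keccak-f[1600] state width: the rate in bytes is (1600 - capacity) / 8. The following is a minimal sketch of that arithmetic and of how a block count turns into a byte count in the `squeezeblocks` wrappers. The `demo_` names are hypothetical and appear in neither AWS-LC nor mldsa-native.

```c
#include <stddef.h>

/* Keccak-f[1600] state width in bits. */
#define DEMO_KECCAK_STATE_BITS 1600u

/* Rate in bytes for a sponge with the given capacity in bits.
 * SHAKE128 uses capacity 256; SHAKE256 and SHA3-256 use 512;
 * SHA3-512 uses 1024. */
static size_t demo_keccak_rate_bytes(unsigned capacity_bits) {
  return (DEMO_KECCAK_STATE_BITS - capacity_bits) / 8u;
}

/* Number of output bytes produced by squeezing `nblocks` full blocks. */
static size_t demo_squeeze_len(size_t nblocks, unsigned capacity_bits) {
  return nblocks * demo_keccak_rate_bytes(capacity_bits);
}
```

For example, `demo_squeeze_len(5, 256)` corresponds to the `nblocks * SHAKE128_RATE` expression used by the SHAKE128 block-squeeze wrapper when `nblocks` is 5.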

## Configuration and compatibility layer

mldsa-native is used with a custom configuration file [mldsa_native_config.h](mldsa_native_config.h). This file includes
a compatibility layer between AWS-LC/OpenSSL and mldsa-native, covering:

* FIPS/PCT: If `AWSLC_FIPS` is set, `MLD_CONFIG_KEYGEN_PCT` is
  enabled to include a Pairwise Consistency Test (PCT) in key generation.
* FIPS/PCT: If `BORINGSSL_FIPS_BREAK_TESTS` is set,
  `MLD_CONFIG_KEYGEN_PCT_BREAKAGE_TEST` is set and `mld_break_pct` is
  defined via `boringssl_fips_break_test("MLDSA_PWCT")`, so that the PCT
  can be broken at runtime for testing purposes.
* CT: If `BORINGSSL_CONSTANT_TIME_VALIDATION` is set, then
  `MLD_CONFIG_CT_TESTING_ENABLED` is set to enable constant-time (CT)
  testing with valgrind.
* Zeroization: `MLD_CONFIG_CUSTOM_ZEROIZE` is set and `mld_zeroize` is
  mapped to `OPENSSL_cleanse` to use OpenSSL's zeroization function.
* Randombytes: `MLD_CONFIG_CUSTOM_RANDOMBYTES` is set and `mld_randombytes` is
  mapped to `RAND_bytes` to use AWS-LC's randombytes function.
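
The mappings above can be sketched as a small preprocessor layer. This is an illustrative fragment under stated assumptions, not the contents of `mldsa_native_config.h`: `demo_cleanse` is a hypothetical stand-in for `OPENSSL_cleanse`, `AWSLC_FIPS` is defined locally only so the fragment compiles on its own, and the exact form in which mldsa-native expects `mld_zeroize` to be defined is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OPENSSL_cleanse: the volatile pointer keeps the
 * compiler from eliding the writes. */
static void demo_cleanse(void *ptr, size_t len) {
  volatile uint8_t *p = (volatile uint8_t *)ptr;
  while (len--) {
    *p++ = 0;
  }
}

/* Assumption: normally supplied by the AWS-LC build, defined here only so
 * the sketch is self-contained. */
#define AWSLC_FIPS

/* FIPS builds enable the pairwise consistency test in key generation. */
#if defined(AWSLC_FIPS)
#define MLD_CONFIG_KEYGEN_PCT
#endif

/* Route zeroization through the library's own cleanse function. */
#define MLD_CONFIG_CUSTOM_ZEROIZE
#if defined(MLD_CONFIG_CUSTOM_ZEROIZE)
static void mld_zeroize(void *ptr, size_t len) { demo_cleanse(ptr, len); }
#endif

/* Hypothetical helper: reports whether the PCT option was compiled in. */
static int demo_pct_enabled(void) {
#if defined(MLD_CONFIG_KEYGEN_PCT)
  return 1;
#else
  return 0;
#endif
}
```

Keeping the mapping in one configuration header means the imported sources need no AWS-LC-specific changes.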

## Build process

At the core, mldsa-native is a 'single-level' implementation of ML-DSA:
A build of the main source tree provides an implementation of
exactly one of ML-DSA-44/65/87, depending on the `MLD_CONFIG_PARAMETER_SET`
parameter. All source files for a single build of mldsa-native are bundled in
[mldsa_native_bcm.c](mldsa/mldsa_native_bcm.c), which is also imported from
mldsa-native.

To build all security levels, [mldsa_native_bcm.c](mldsa/mldsa_native_bcm.c)
is included three times into [ml_dsa.c](ml_dsa.c), once per security level.
Level-independent code is included only once and shared across the levels;
this is controlled through the configuration options
`MLD_CONFIG_MULTILEVEL_WITH_SHARED` and `MLD_CONFIG_MULTILEVEL_NO_SHARED`,
which are set before each inclusion of [mldsa_native_bcm.c](mldsa/mldsa_native_bcm.c) in [ml_dsa.c](ml_dsa.c).

Note that the multilevel build process is entirely internal to `ml_dsa.c`,
and does not affect the AWS-LC build otherwise.
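
The idea of instantiating one code base once per level while sharing level-independent code can be sketched in a single file. In this sketch a macro stands in for re-including `mldsa_native_bcm.c` with a different parameter set. All `demo_` names are hypothetical. The public-key sizes follow the FIPS 204 encoding: a 32-byte seed plus `k` polynomials of 256 coefficients at 10 bits each.

```c
#include <stddef.h>

/* Level-independent code: emitted once and shared by all instances. */
static size_t demo_pk_bytes(unsigned k) {
  /* 32-byte seed plus k packed polynomials of 256 * 10 / 8 = 320 bytes. */
  return 32u + (size_t)k * 320u;
}

/* Level-specific code: one namespaced copy per parameter set. In the real
 * build, this role is played by including the same source file three times. */
#define DEMO_INSTANTIATE(level, k)                       \
  static size_t demo_mldsa##level##_pk_bytes(void) {     \
    return demo_pk_bytes(k);                             \
  }

DEMO_INSTANTIATE(44, 4)
DEMO_INSTANTIATE(65, 6)
DEMO_INSTANTIATE(87, 8)
```

Each instance gets its own symbol prefix, so the three levels can coexist in one translation unit without name clashes.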

## Formal Verification

All C-code imported by [importer.sh](importer.sh) is formally verified using the
C Bounded Model Checker ([CBMC](https://github.com/diffblue/cbmc/)) to be free of
various classes of undefined behaviour, including out-of-bounds memory accesses and
arithmetic overflow; the latter is of particular interest for ML-DSA because of
the use of lazy modular reduction for improved performance.

The heart of the CBMC proofs is a set of function contracts and loop
annotations in the C code. Function contracts are written as
`__contract__(...)` clauses attached to the function declaration, while loop
contracts are written as `__loop__(...)` clauses that follow the `for` statement.

The function contracts and loop annotations are kept in the source, but are
removed by the preprocessor as long as the `CBMC` macro is undefined. Keeping
them simplifies the import, and care has been taken to make them readable
to the non-expert, so that they serve as precise documentation of the
assumptions and guarantees upheld by the code.
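
To show how such annotations disappear in a normal build, here is a minimal sketch. The function `demo_poly_caddq` and the `DEMO_` constants are hypothetical. The clause names inside the macros (`requires`, `assigns`, `ensures`, `invariant`, and the bound helpers) are illustrative, and because `CBMC` is undefined here the preprocessor discards them without ever parsing them as C.

```c
#include <stddef.h>
#include <stdint.h>

/* Without CBMC, both annotation macros expand to nothing. */
#ifndef CBMC
#define __contract__(x)
#define __loop__(x)
#endif

#define DEMO_N 256
#define DEMO_Q 8380417 /* the ML-DSA modulus q */

/* Declaration carrying the contract: inputs lie in (-q, q), outputs in [0, q). */
static void demo_poly_caddq(int32_t a[DEMO_N])
__contract__(
  requires(memory_no_alias(a, sizeof(int32_t) * DEMO_N))
  requires(array_abs_bound(a, 0, DEMO_N, DEMO_Q))
  assigns(memory_slice(a, sizeof(int32_t) * DEMO_N))
  ensures(array_bound(a, 0, DEMO_N, 0, DEMO_Q))
);

static void demo_poly_caddq(int32_t a[DEMO_N]) {
  unsigned i;
  for (i = 0; i < DEMO_N; i++)
  __loop__(
    invariant(i <= DEMO_N)
  )
  {
    /* Add q to negative coefficients only; assumes arithmetic right shift
     * of negative values, as is common on mainstream compilers. */
    a[i] += (a[i] >> 31) & DEMO_Q;
  }
}
```

The annotated source compiles to the same code as the unannotated one, so the contracts cost nothing at runtime.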

## Testing

We test ML-DSA with Known Answer Test (KAT) vectors obtained from https://github.com/post-quantum-cryptography/KAT within `PQDSAParameterTest.KAT`. We select the KATs for the signing mode `hedged`, which derives the private random seed used during signing (rho'' in FIPS 204) pseudorandomly from the signer's private key, the message to be signed, and a 256-bit string `rnd` which is generated at random. The `pure` variant of these KATs was used, as it provides test vector inputs for "pure", i.e., non-pre-hashed, messages. The KAT files have been modified to insert line breaks between test vector sets.

We also run the ACVP test vectors obtained from https://github.com/usnistgov/ACVP-Server within the three functions `PerMLDSATest.ACVPKeyGen`, `PerMLDSATest.ACVPSigGen` and `PerMLDSATest.ACVPSigVer`. These correspond to the tests found at [ML-DSA-keyGen-FIPS204](https://github.com/usnistgov/ACVP-Server/tree/master/gen-val/json-files/ML-DSA-keyGen-FIPS204), [ML-DSA-sigGen-FIPS204](https://github.com/usnistgov/ACVP-Server/tree/master/gen-val/json-files/ML-DSA-sigGen-FIPS204), and [ML-DSA-sigVer-FIPS204](https://github.com/usnistgov/ACVP-Server/tree/master/gen-val/json-files/ML-DSA-sigVer-FIPS204).
To test ML-DSA pure, non-deterministic mode, we use `tgId = 19, 21, 23` of sigGen and `tgId = 7, 9, 11` of sigVer.
To test ML-DSA ExternalMu, non-deterministic mode, we use `tgId = 20, 22, 24` of sigGen and `tgId = 8, 10, 12` of sigVer.

The test suite includes:
Contributor

Does it include testing the deterministic functions? Are these included in the KATs? The ACVP tests are not KATs? They test the non-deterministic version with random data?

Contributor Author

@jakemas, Jan 13, 2026

This is referring to the deterministic variant of ML-DSA (as opposed to deterministically testing the APIs). We do not test the deterministic mode as we do not support the ML-DSA deterministic mode. Internally the API is available (we note this here:

/* Randomized variant of ML-DSA. If you need the deterministic variant,
* call crypto_sign_signature_internal directly with all-zero rnd. */
but it isn't exposed as an external API anywhere.)

Contributor

> This is referring to the deterministic variant of ML-DSA

Do you mean "non-deterministic"? If so, I'd like to know how it's tested in ACVP, with KATs or random data making sure it verifies the signature? If KATs, how are they set up to bypass randomness generation?

Contributor Author

There are two variants of ML-DSA, deterministic and non-deterministic; we only implement the non-deterministic variant. We test it in two ways: with the ACVP test vectors and with the KATs. In both cases, we use the internal APIs, which allow the tester to provide the randomness for each signature via the seed argument. So there is no need to bypass the random generation: we take the provided seeds and supply them directly to the API to get the expected known result.


* Known Answer Tests (KAT) for all three parameter sets (ML-DSA-44/65/87)
* Functional tests for key generation, signing, and verification
* ExtMu (External Mu) variant tests for pre-hash modes
* ACVP (Automated Cryptographic Validation Protocol) test vectors
* Pairwise Consistency Test (PCT) validation when FIPS mode is enabled
* Key consistency tests including public key derivation from secret key
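
Known-answer testing of a randomized signature scheme works because the randomness is an explicit input of the internal signing routine. The following minimal sketch illustrates that structure only: the `demo_` functions are hypothetical, and the "signature" is a toy FNV-1a hash, not ML-DSA. A public wrapper draws `rnd` and calls the internal routine, while a test calls the internal routine directly with the `rnd` value taken from a test vector.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_RNDBYTES 32 /* 256-bit rnd, as in the hedged signing mode */

/* Toy stand-in for an internal signing routine (NOT cryptographic).
 * The result depends only on (msg, rnd), so fixed inputs give fixed output. */
static uint32_t demo_sign_internal(const uint8_t *msg, size_t mlen,
                                   const uint8_t rnd[DEMO_RNDBYTES]) {
  uint32_t h = 2166136261u; /* FNV-1a offset basis */
  size_t i;
  for (i = 0; i < mlen; i++) {
    h = (h ^ msg[i]) * 16777619u;
  }
  for (i = 0; i < DEMO_RNDBYTES; i++) {
    h = (h ^ rnd[i]) * 16777619u;
  }
  return h;
}

/* Hypothetical randomness source used by the public wrapper. */
typedef void (*demo_rng_fn)(uint8_t *out, size_t len);

/* Public wrapper: draws rnd, then defers to the internal routine. */
static uint32_t demo_sign(const uint8_t *msg, size_t mlen, demo_rng_fn rng) {
  uint8_t rnd[DEMO_RNDBYTES];
  rng(rnd, sizeof(rnd));
  return demo_sign_internal(msg, mlen, rnd);
}

/* Fixed "RNG" used only to show that the wrapper adds nothing but rnd. */
static void demo_fixed_rng(uint8_t *out, size_t len) { memset(out, 0x42, len); }
```

A KAT-style test then compares the output of `demo_sign_internal` for a given `(msg, rnd)` pair against the expected value, without touching the random number generator.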

## Side-channels

mldsa-native's CI uses a patched version of valgrind to check, across various
compilers and compile flags, that there are no secret-dependent memory
accesses, branches, or divisions. The relevant assertions are kept
and used if `MLD_CONFIG_CT_TESTING_ENABLED` is set, which is the case
if and only if `BORINGSSL_CONSTANT_TIME_VALIDATION` is set.

mldsa-native uses value barriers to block
potentially harmful compiler reasoning and optimization. Where standard
gcc/clang inline assembly is not available, mldsa-native falls back to a
slower 'opt blocker' based on a volatile global -- both are described in
[ct.h](https://github.com/pq-code-package/mldsa-native/blob/main/mldsa/ct.h).
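
A value barrier of this kind can be sketched as follows. This is a minimal illustration rather than the code in `ct.h`: the `demo_` names are hypothetical, and the choice of a 32-bit type is an assumption. With gcc/clang, an empty inline-assembly statement makes the value opaque to the optimizer; otherwise the value is XORed with a volatile global that is always zero.

```c
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
/* Empty asm with a read-write operand: the compiler must assume b changed. */
static inline uint32_t demo_value_barrier_u32(uint32_t b) {
  __asm__("" : "+r"(b));
  return b;
}
#else
/* Fallback 'opt blocker': a volatile read the compiler cannot fold away. */
static volatile uint32_t demo_opt_blocker_u32 = 0;
static inline uint32_t demo_value_barrier_u32(uint32_t b) {
  return b ^ demo_opt_blocker_u32;
}
#endif

/* Branch-free select: returns a if cond != 0, else b. The barrier hides the
 * fact that the mask is either all-zeros or all-ones, which discourages the
 * compiler from turning the masking back into a branch. */
static inline uint32_t demo_ct_sel_u32(uint32_t a, uint32_t b, uint32_t cond) {
  uint32_t nz = (cond | (0u - cond)) >> 31; /* 1 if cond != 0, else 0 */
  uint32_t mask = demo_value_barrier_u32(0u - nz);
  return b ^ (mask & (a ^ b));
}
```

The barrier does not change the computed value; it only limits what the compiler can infer about it.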

## Comparison to reference implementation

mldsa-native is a fork of the ML-DSA [reference
implementation](https://github.com/pq-crystals/dilithium) (Dilithium).

The following gives an overview of the major changes:

- CBMC and debug annotations, and minor code restructurings or signature
changes to facilitate the CBMC proofs. For example, functions are structured
to make loop bounds and memory access patterns explicit for formal verification.
- Introduction of 4x-batched versions of some functions from the reference
implementation. This is to leverage 4x-batched Keccak-f1600 implementations
if present. The batching happens at the C level even if no native backend
for FIPS 202 is present.
- FIPS 204 compliance: Introduced an optional PCT (FIPS 204, Section 4.4, Pairwise
Consistency) and zeroization of stack buffers, as required by FIPS 204,
Section 3.6.3 (Destruction of intermediate values).
- Restructuring of files to separate level-specific from level-generic
functionality. This is needed to enable a multi-level build of mldsa-native
where level-generic code is shared between levels.
- More pervasive use of value barriers to harden constant-time primitives,
even when Link-Time-Optimization (LTO) is enabled. The use of LTO can lead
to insecure compilation in case of the reference implementation.
122 changes: 122 additions & 0 deletions crypto/fipsmodule/ml_dsa/fips202_glue.h
@@ -0,0 +1,122 @@
// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0 OR ISC

#ifndef MLD_AWSLC_FIPS202_GLUE_H
#define MLD_AWSLC_FIPS202_GLUE_H
#include <stddef.h>
#include <stdint.h>

#include "../sha/internal.h"

#define SHAKE128_RATE 168
#define SHAKE256_RATE 136
#define SHA3_256_RATE 136
#define SHA3_512_RATE 72

#define mld_shake128ctx KECCAK1600_CTX
#define mld_shake256ctx KECCAK1600_CTX

static MLD_INLINE void mld_shake128_init(mld_shake128ctx *state) {
  // Return code checks can be omitted
  // SHAKE_Init always returns 1 when called with correct block size value.
  (void) SHAKE_Init(state, SHAKE128_BLOCKSIZE);
}

static MLD_INLINE void mld_shake128_release(mld_shake128ctx *state) {
  (void) state;
}

static MLD_INLINE void mld_shake128_absorb_once(mld_shake128ctx *state,
                                                const uint8_t *input, size_t inlen) {
  // Return code check can be omitted
  // since mldsa-native adheres to call discipline
  (void) SHAKE_Absorb(state, input, inlen);
}

static MLD_INLINE void mld_shake128_absorb(mld_shake128ctx *state,
                                           const uint8_t *input, size_t inlen) {
  (void) SHAKE_Absorb(state, input, inlen);
}

static MLD_INLINE void mld_shake128_finalize(mld_shake128ctx *state) {
  // Finalization is implicit in AWS-LC's implementation
  // The state is ready for squeezing after absorb
  (void) state;
}

static MLD_INLINE void mld_shake128_squeeze(uint8_t *output, size_t outlen,
                                            mld_shake128ctx *state) {
  (void) SHAKE_Squeeze(output, state, outlen);
}

static MLD_INLINE void mld_shake128_squeezeblocks(uint8_t *output, size_t nblocks,
                                                  mld_shake128ctx *state) {
  // Return code check can be omitted
  // since mldsa-native adheres to call discipline
  (void) SHAKE_Squeeze(output, state, nblocks * SHAKE128_RATE);
}

static MLD_INLINE void mld_shake256_init(mld_shake256ctx *state) {
  // Return code checks can be omitted
  // SHAKE_Init always returns 1 when called with correct block size value.
  (void) SHAKE_Init(state, SHAKE256_BLOCKSIZE);
}

static MLD_INLINE void mld_shake256_release(mld_shake256ctx *state) {
  (void) state;
}

static MLD_INLINE void mld_shake256_absorb_once(mld_shake256ctx *state,
                                                const uint8_t *input, size_t inlen) {
  // Return code check can be omitted
  // since mldsa-native adheres to call discipline
  (void) SHAKE_Absorb(state, input, inlen);
}

static MLD_INLINE void mld_shake256_absorb(mld_shake256ctx *state,
                                           const uint8_t *input, size_t inlen) {
  // Return code check can be omitted
  // since mldsa-native adheres to call discipline
  (void) SHAKE_Absorb(state, input, inlen);
}

static MLD_INLINE void mld_shake256_finalize(mld_shake256ctx *state) {
  // Finalization is implicit in AWS-LC's implementation
  // The state is ready for squeezing after absorb
  (void) state;
}

static MLD_INLINE void mld_shake256_squeeze(uint8_t *output, size_t outlen,
                                            mld_shake256ctx *state) {
  (void) SHAKE_Squeeze(output, state, outlen);
}

static MLD_INLINE void mld_shake256_squeezeblocks(uint8_t *output, size_t nblocks,
                                                  mld_shake256ctx *state) {
  // Return code check can be omitted
  // since mldsa-native adheres to call discipline
  (void) SHAKE_Squeeze(output, state, nblocks * SHAKE256_RATE);
}

static MLD_INLINE void mld_shake256(uint8_t *output, size_t outlen,
                                    const uint8_t *input, size_t inlen) {
  // Return code check can be omitted
  // since mldsa-native adheres to call discipline
  (void) SHAKE256(input, inlen, output, outlen);
}

static MLD_INLINE void mld_sha3_256(uint8_t *output, const uint8_t *input,
                                    size_t inlen) {
  // Return code check can be omitted
  // since mldsa-native adheres to call discipline
  (void) SHA3_256(input, inlen, output);
}

static MLD_INLINE void mld_sha3_512(uint8_t *output, const uint8_t *input,
                                    size_t inlen) {
  // Return code check can be omitted
  // since mldsa-native adheres to call discipline
  (void) SHA3_512(input, inlen, output);
}

#endif // MLD_AWSLC_FIPS202_GLUE_H
106 changes: 106 additions & 0 deletions crypto/fipsmodule/ml_dsa/fips202x4_glue.h
@@ -0,0 +1,106 @@
// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0 OR ISC

//
// This is a shim establishing the FIPS-202 API required by
// mldsa-native from the API exposed by AWS-LC.
//

#ifndef MLD_AWSLC_FIPS202X4_GLUE_H
#define MLD_AWSLC_FIPS202X4_GLUE_H

#include <stddef.h>
#include <stdint.h>

#include "fips202_glue.h"

// Use AWS-LC's existing KECCAK1600_CTX_x4 structure for SHAKE128
#define mld_shake128x4ctx KECCAK1600_CTX_x4

// For SHAKE256 x4, we need a custom structure since AWS-LC only has batched SHAKE128
typedef struct mld_shake256x4ctx_s {
  KECCAK1600_CTX s[4];
} mld_shake256x4ctx;

static MLD_INLINE void mld_shake128x4_absorb_once(mld_shake128x4ctx *state,
                                                  const uint8_t *in0,
                                                  const uint8_t *in1,
                                                  const uint8_t *in2,
                                                  const uint8_t *in3, size_t inlen) {
  // Return code check can be omitted
  // since mldsa-native adheres to call discipline
  (void) SHAKE128_Absorb_once_x4(state, in0, in1, in2, in3, inlen);
}

static MLD_INLINE void mld_shake128x4_squeezeblocks(uint8_t *out0, uint8_t *out1,
                                                    uint8_t *out2, uint8_t *out3,
                                                    size_t nblocks,
                                                    mld_shake128x4ctx *state) {
  // Return code check can be omitted
  // since mldsa-native adheres to call discipline
  (void) SHAKE128_Squeezeblocks_x4(out0, out1, out2, out3, state, nblocks);
}

static MLD_INLINE void mld_shake128x4_init(mld_shake128x4ctx *state) {
  // Return code check can be omitted
  // since mldsa-native adheres to call discipline
  (void) SHAKE128_Init_x4(state);
}

static MLD_INLINE void mld_shake128x4_release(mld_shake128x4ctx *state) {
  (void) state;
}

// AWS-LC doesn't have SHAKE256 x4 batched operations like it does for SHAKE128
// We provide serial implementations that process each instance separately
static MLD_INLINE void mld_shake256x4_absorb_once(mld_shake256x4ctx *state,
                                                  const uint8_t *in0,
                                                  const uint8_t *in1,
                                                  const uint8_t *in2,
                                                  const uint8_t *in3, size_t inlen) {
  // Process four independent SHAKE256 operations serially
  mld_shake256_init(&state->s[0]);
  mld_shake256_absorb_once(&state->s[0], in0, inlen);
  mld_shake256_init(&state->s[1]);
  mld_shake256_absorb_once(&state->s[1], in1, inlen);
  mld_shake256_init(&state->s[2]);
  mld_shake256_absorb_once(&state->s[2], in2, inlen);
  mld_shake256_init(&state->s[3]);
  mld_shake256_absorb_once(&state->s[3], in3, inlen);
}

static MLD_INLINE void mld_shake256x4_squeezeblocks(uint8_t *out0, uint8_t *out1,
                                                    uint8_t *out2, uint8_t *out3,
                                                    size_t nblocks,
                                                    mld_shake256x4ctx *state) {
  // Process four independent squeeze operations serially
  mld_shake256_squeezeblocks(out0, nblocks, &state->s[0]);
  mld_shake256_squeezeblocks(out1, nblocks, &state->s[1]);
  mld_shake256_squeezeblocks(out2, nblocks, &state->s[2]);
  mld_shake256_squeezeblocks(out3, nblocks, &state->s[3]);
}

static MLD_INLINE void mld_shake256x4_init(mld_shake256x4ctx *state) {
  // Initialize four independent states
  mld_shake256_init(&state->s[0]);
  mld_shake256_init(&state->s[1]);
  mld_shake256_init(&state->s[2]);
  mld_shake256_init(&state->s[3]);
}

static MLD_INLINE void mld_shake256x4_release(mld_shake256x4ctx *state) {
  (void) state;
}

static MLD_INLINE void mld_shake256x4(uint8_t *out0, uint8_t *out1, uint8_t *out2,
                                      uint8_t *out3, size_t outlen, uint8_t *in0,
                                      uint8_t *in1, uint8_t *in2, uint8_t *in3,
                                      size_t inlen) {
  // Process four independent SHAKE256 operations serially
  mld_shake256(out0, outlen, in0, inlen);
  mld_shake256(out1, outlen, in1, inlen);
  mld_shake256(out2, outlen, in2, inlen);
  mld_shake256(out3, outlen, in3, inlen);
}

#endif // MLD_AWSLC_FIPS202X4_GLUE_H