nstilt1
Contributor

@nstilt1 nstilt1 commented Aug 14, 2025

This started as edits for rng.rs, then switched to showing how using only 64-bit counters would work and the issues that would cause (the only problems were with ChaChaCore; remaining_blocks() was already preventing issues with the ChaCha20 Ietf variants). Then generic counters were green-lit, and here we are.

Brief rundown of the changes:

  • All backends use a 64-bit counter and extract state[12] and state[13] (except neon.rs which extracts the whole 4th row)
  • ChaCha20 uses Ietf variant, ChaCha20Legacy uses Legacy variant, ChaCha20Rng uses Legacy variant, and XChaCha uses Ietf variant. We can change XChaCha to Legacy if desired
  • remaining_blocks() prevents IETF variant from overflowing (not exactly a change)
  • cipher backends use either a 32-bit counter or a 64-bit counter based on the Variant::Counter.
  • rng backends use a 64-bit counter.
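
A minimal sketch of how a 64-bit block counter spread across state[12] (low word) and state[13] (high word) might be read and advanced in a portable backend. The helper names `get_counter64` and `add_counter64` are hypothetical and are not taken from the crate; only the word positions come from the rundown above.

```rust
/// Hypothetical helper: read the 64-bit block counter from
/// word 12 (low half) and word 13 (high half) of the ChaCha state.
fn get_counter64(state: &[u32; 16]) -> u64 {
    (state[12] as u64) | ((state[13] as u64) << 32)
}

/// Hypothetical helper: advance the counter by `blocks`, carrying from
/// state[12] into state[13] and wrapping around at 2^64.
fn add_counter64(state: &mut [u32; 16], blocks: u64) {
    let ctr = get_counter64(state).wrapping_add(blocks);
    state[12] = ctr as u32; // low 32 bits
    state[13] = (ctr >> 32) as u32; // high 32 bits
}
```

Starting from state[12] == u32::MAX and state[13] == 0, advancing by one block leaves state[12] == 0 and state[13] == 1.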

Brief rundown of the alternative changes if only using a 64-bit counter (if remaining_blocks() was adjusted to count the total available blocks):

  1. Protect state[13] from overflowing (this is only mandatory if the final block of the keystream can be consumed by the IETF Variant. It is also mandatory if we wish for ChaChaCore to not have a bug where after exhausting the keystream, it is unable to return to block_pos == 0)
    • soft.rs uses either a 32-bit counter or a 64-bit counter based on Variant
    • SIMD backends all use 64-bit counter addition, but only update state[12] when the variant is Ietf, preventing state[13] from overflowing after reaching the end of the keystream. This works because the cipher's counter is not meant to wrap, so the extra generated blocks beyond block u32::MAX would not be used by the ciphers. If we wanted the cipher to wrap, we would have a problem with this implementation.
  2. remaining_blocks could be updated to allow use of the final block of the keystream
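
The trade-off around the final keystream block can be sketched roughly as follows. Both functions are hypothetical illustrations for a 32-bit counter, not the crate's actual remaining_blocks(); the `exhausted` flag in particular is an assumed piece of extra state.

```rust
/// Hypothetical: blocks left when the last block (counter == u32::MAX)
/// is never handed out, so the counter can never wrap.
fn remaining_blocks_conservative(block_pos: u32) -> u64 {
    (u32::MAX - block_pos) as u64
}

/// Hypothetical: blocks left when the final block may be consumed.
/// After the final block the 32-bit counter is back at 0, which looks
/// like a fresh keystream, so an extra `exhausted` flag is needed.
fn remaining_blocks_full(block_pos: u32, exhausted: bool) -> u64 {
    if exhausted {
        0
    } else {
        (1u64 << 32) - block_pos as u64
    }
}
```

The conservative version reports 2^32 - 1 usable blocks from position 0, while the full version reports 2^32 but has to carry one more bit of state.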

Known bugs:

  • ChaCha20 IETF cipher can't use the final block of the keystream. See chacha20: Can't use full keystream of ChaCha20 IETF variant #444. It can be fixed in chacha20's remaining_blocks() but it would make the cipher usable after exhausting the keystream. It seems that the only way to fix it is to keep track of a flag/boolean about whether the counter has wrapped or the keystream has been exhausted, or about whether it is the first block or not.
  • ChaCha20Legacy could wrap on a 32-bit machine if there were 2^32 remaining_blocks() and the user tried to call apply_keystream() on 2^32 + 1 blocks. However, that would eat up 256 GiB of RAM, and 32-bit machines are limited to 4 GiB of RAM.

Tasks:

  • Update paragraph marked TODO
  • Create new generate function that updates the 13th word in ChaCha's state
  • impl ChaChaXLegacyRng and ChaChaXLegacyCore
  • undo the previous step and impl one 64 bit RNG
  • Restrict block pos to be a multiple of 4 so that the generate function can operate smoothly with our backends, and without changing the backends
  • Organize code into macros to prevent duplication
  • Update getters and setters to allow for 64 bit block pos/counter and 64 bit stream ID
  • Use only one 64-bit counter RNG instead of 2 RNG variants
  • Add tests. Try to break stuff.
  • 64-bit counters in soft.rs
  • 64-bit counters in avx2.rs
  • 64-bit counters in sse2.rs
  • 64-bit counters in neon.rs
  • fix failing tests post implementation such as serde roundtrip
  • update docs
  • add 64-bit counter test vectors

closes #334

@nstilt1 nstilt1 changed the title added newtype that should be able to simulate a 64-bit counter chacha20: added newtype that should be able to simulate a 64-bit counter Aug 14, 2025
@nstilt1 nstilt1 changed the title chacha20: added newtype that should be able to simulate a 64-bit counter chacha20: added newtype to simulate a 64-bit counter Aug 14, 2025
@dhardy
Contributor

dhardy commented Aug 15, 2025

In my opinion the counter increment should happen in the backends. Ensuring that the counter is a multiple of 4 is a good idea however, and should help limit the complexity required.

(In part this depends on whether the ChaCha20Legacy cipher should support > 256 GiB of data.)

Regarding the names, I would prefer only one variant (64-bit counter + 64-bit stream id) be exposed as an RNG. I'm less sure about the ciphers; maybe those don't need to change except to increase the counter size for the legacy and xchacha variants.

@nstilt1
Contributor Author

nstilt1 commented Aug 15, 2025

@dhardy

> In my opinion the counter increment should happen in the backends. Ensuring that the counter is a multiple of 4 is a good idea however, and should help limit the complexity required.

I was just going off of the info in #334. I proposed two other options in there:

  1. solely use 64-bit counters in the backends and limit the IETF variants to a keystream with a 32-bit counter by panicking. If an RNG were to exist with a 32-bit counter, I could do something similar with this PR to achieve that.
  2. using generics to track the variant and its counter size. I have a branch with this implemented already. Would just need to copy over the code to a new branch.

Both of those options were implicitly "rejected" (they weren't explicitly rejected) in favor of creating a "newtype" that increments the upper part of the counter when needed. I don't know exactly what @tarcieri meant by that, but this PR is my interpretation of that, and it should function accordingly. Maybe it was supposed to involve more code in the backends. Do you want me to continue this PR, or revert to my other branch with generics, or make a new branch that only uses 64-bit counters in the backend? Or should I be adding code to the backends to implement the "newtype"?

I think option 1 is the simplest option out of all of them. All of the backends will behave the same, with no duplicated code as a result of supporting different counters, and no additional branches/if statements in the code.

@tarcieri
Member

Really the main thing I'd like to prevent is having to make every backend generic over both a 32-bit and 64-bit counter. Using a 64-bit counter is another way to achieve that.

> I think option 1 is the simplest option out of all of them. All of the backends will behave the same, with no duplicated code as a result of supporting different counters, and no additional branches/if statements in the code.

The branching doesn't go away: it moves to the 32-bit version to prevent a counter overflow.

If that logic fails, it's effectively nonce reuse, which is keystream reuse, which in the context of ChaCha20Poly1305 means authentication key reuse, which can lead to chosen ciphertext attacks, which can lead to full plaintext recovery.

IMO it's less dangerous to adapt a 32-bit counter into a 64-bit counter, than to use a 64-bit counter internally but with an additional branch to put a 32-bit cap on it.

@nstilt1
Contributor Author

nstilt1 commented Aug 15, 2025

> > I think option 1 is the simplest option out of all of them. All of the backends will behave the same, with no duplicated code as a result of supporting different counters, and no additional branches/if statements in the code.

> The branching doesn't go away: it moves to the 32-bit version to prevent a counter overflow.

The branching does go away. 64-bit counters behave exactly the same as 32-bit counters except at the moment they overflow. Doesn't the cipher code already check if it will overflow before generating blocks? If it does already do that, then it would be true that there is no additional branching by switching to only 64-bit counters in the backend. I'm specifically referring to branching in the backends directory.

@tarcieri
Member

tarcieri commented Aug 15, 2025

> The branching does go away. 64-bit counters behave exactly the same as 32-bit counters except at the moment they overflow.

...except the upper half is now incremented, which is the equivalent of using a different nonce.

> Doesn't the cipher code already check if it will overflow before generating blocks?

We're talking about implementing a 32-bit logical counter using a 64-bit counter in the backend. The cipher code won't be aware of any overflow in that case, since you're truncating a 64-bit counter to a 32-bit counter. The truncation needs a branch to ensure the counter hasn't overflowed, since that would lead to nonce reuse.
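
A rough sketch of the hazard described here, assuming the IETF layout from RFC 8439 (word 12 is the 32-bit counter, words 13 to 15 are the nonce). The function names are hypothetical and do not come from the crate.

```rust
/// Hypothetical backend-style addition that treats words 12 and 13
/// as one 64-bit counter, regardless of the variant in use.
fn add_blocks_64(state: &mut [u32; 16], blocks: u64) {
    let ctr = ((state[12] as u64) | ((state[13] as u64) << 32)).wrapping_add(blocks);
    state[12] = ctr as u32;
    state[13] = (ctr >> 32) as u32;
}

/// With the IETF layout, state[13] holds the first nonce word. Returns
/// that word after stepping one block past a counter of u32::MAX.
fn nonce_word_after_overflow(nonce0: u32) -> u32 {
    let mut state = [0u32; 16];
    state[12] = u32::MAX;
    state[13] = nonce0;
    add_blocks_64(&mut state, 1);
    state[13]
}

/// Hypothetical guard for a 32-bit logical counter: returns the advanced
/// counter, or None when the addition would overflow 32 bits.
fn checked_advance32(counter: u32, blocks: u32) -> Option<u32> {
    counter.checked_add(blocks)
}
```

Without a guard such as `checked_advance32`, the carry silently changes the nonce word, so the keystream continues under a different nonce instead of stopping.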

> I'm specifically referring to branching in the backends directory

There's no modifications to the backends in this PR?

@nstilt1
Contributor Author

nstilt1 commented Aug 15, 2025

> > I'm specifically referring to branching in the backends directory

> There's no modifications to the backends in this PR?

This PR added a "new type" (ChaChaXLegacyRng) in rng.rs that handles the 64-bit counter by using an extra generate() method for ChaChaCore. The RNG was the main topic of the recent discussions in #334, so I focused on that. This PR could be useful for adding support for 2 RNG variants, but as @dhardy mentioned, I think the RNG folks only want one 64-bit RNG instead of a 32-bit RNG.

But I can make a new PR that switches to only 64-bit counters in the backends by referencing my counter_support_2.0 branch, and only keeping the 64-bit counter code. It should require less than 10 lines per backend if I recall correctly, and then a little bit of code in rng.rs and legacy.rs, and potentially removing variants.rs. That is assuming that they still only want a 64-bit counter RNG. If they want both RNG variants, we will need to use this PR. But I personally see no reason to support both RNG variants if the getter and setter methods are still as flexible as they are now.

@nstilt1
Contributor Author

nstilt1 commented Aug 15, 2025

I just updated the backends to use a 64-bit counter instead of a 32-bit counter. neon.rs required more work than expected, but all of the tests pass except for the counter_wrapping test, which is expected since the counter is no longer wrapping. I will ignore that test and run fmt so that it passes. There are no if statements in the backends regarding the counter, but I have not tested each backend yet.

@nstilt1
Contributor Author

nstilt1 commented Aug 15, 2025

I was able to get all of the tests passing except for test_chacha_serde_roundtrip. I'm not entirely sure what's wrong.

@nstilt1 nstilt1 changed the title chacha20: added newtype to simulate a 64-bit counter chacha20: only 64-bit counters approach Aug 15, 2025
Member

@newpavlov newpavlov left a comment

Looks good to me now! I will give @tarcieri some time to review it, but otherwise it looks good to merge. It would be nice to add KATs for large offsets, but we can do it in a separate PR.

@nstilt1
Contributor Author

nstilt1 commented Aug 23, 2025

I think chacha20 already uses those test vectors with ChaCha20Rng rather than ChaCha12Rng. I think the best way to validate the functionality of the 64-bit counter using the RNG is to write a test that captures 4 blocks starting at the following block_poses:

  • u32::MAX as u64
  • [u32::MAX, 1]
  • [u32::MAX, 2]
  • [u32::MAX, u32::MAX]

and so on. That way, the counter carrying over will be captured at multiple upper-counter-word positions. The test would compare chacha20 against rand_chacha, since rand_chacha seems to be a known-good implementation, and by comparing 4-block outputs we will be able to confirm that the counter overflowed successfully in all of the backends multiple times. It would be a shame if rand_chacha was implemented incorrectly, or if both libraries were implemented incorrectly in the exact same way such that all of the tests pass on every backend. But the probability of that happening is near 0% if not 0%.

@nstilt1
Contributor Author

nstilt1 commented Aug 23, 2025

Do you want me to add those test vectors though? I can add them right quick. I don't think they cover the 64-bit aspect. There would need to be either overflowing or wrapping (or both), or the upper 32 bits of the block pos would need to be set to something.

@nstilt1
Contributor Author

nstilt1 commented Aug 23, 2025

Something like this would be the 64-bit counter wrapping and overflow test. It passes on all backends and with big endian, but it might be unnecessary:

    #[test]
    fn counter_64bit_overflow_and_wrap_x_rand_chacha() {
        // rand_chacha serves as the reference implementation here.
        use rand_chacha::ChaCha20Rng as OGChaChaRng;

        let seed = [
            0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
            24, 25, 26, 27, 28, 29, 30, 31,
        ];
        let mut rng1 = ChaCha20Rng::from_seed(seed);
        let mut rng2 = OGChaChaRng::from_seed(seed);
        for block_pos_upper_word in &[0, 1, 2, 3, 4, u32::MAX] {
            // Start at the last block before the low counter word carries into the upper word.
            let block_pos = u32::MAX as u64 | ((*block_pos_upper_word as u64) << 32);
            rng1.set_block_pos([u32::MAX, *block_pos_upper_word]);
            // rand_chacha is positioned in 32-bit words rather than in blocks.
            rng2.set_word_pos(block_pos as u128 * BLOCK_WORDS as u128);
            // 256 bytes is 4 blocks, so the output spans the carry.
            let mut output_1 = [0u8; 256];
            let mut output_2 = [0u8; 256];
            rng1.fill_bytes(&mut output_1);
            rng2.fill_bytes(&mut output_2);
            assert_eq!(output_1, output_2);
        }
    }

@tarcieri
Member

@nstilt1 the comment I linked to earlier had suggested adding upstream test vectors for the counter wrap: rust-random/rand#1654 (comment)

If we add them upstream there, we'll have official KATs to test against.

@nstilt1
Contributor Author

nstilt1 commented Aug 24, 2025

How does one go about finding these test vectors? I've been looking for a few minutes and have only found this so far. It contains a Python script that generates test vectors from the "cryptography" Python library, supposedly from PyCA:

https://cryptography.io/en/stable/development/custom-vectors/chacha20/

They have a github with the test vector already in a file, no need to run their code:

https://github.com/pyca/cryptography/blob/main/vectors/cryptography_vectors/ciphers/ChaCha20/counter-overflow.txt

@nstilt1
Contributor Author

nstilt1 commented Aug 24, 2025

The test vectors have been added. Not sure if they belong in kats.rs, along with the other true_values tests.

@nstilt1
Contributor Author

nstilt1 commented Aug 24, 2025

I see those are licensed under BSD and Apache. Do we have to treat them any special way?

@nstilt1 nstilt1 changed the title chacha20: only 64-bit counters approach chacha20: 64-bit counter support Aug 24, 2025
@tarcieri
Member

IANAL but I don't think we have to worry about copyright as it relates to test vectors, especially if you're extracting the raw values and transforming them into Rust syntax. They're just algorithm inputs and outputs, not creative works

@nstilt1
Contributor Author

nstilt1 commented Aug 26, 2025

I found the culprit regarding that big endian issue. Replaced the bit shifts with const multiplication because the compiler should be able to optimize it into bit shifts.

School is starting back up and I probably won't be able to spend as much time on this. Will try to respond to any queries that arise and fix any bugs. I was thinking about adjusting the counter overflow test one more time to have slightly different starting blocks, but I realized that the backends' parblocks all start from the same block (u32::MAX in this case) and use 64-bit addition for the upper 3 blocks. So I don't think there's a reason to start at u32::MAX - 1, -2, -3.
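
The reasoning about the parallel blocks can be illustrated with a small sketch, assuming a batch of four lanes that all derive their counters from one base block using 64-bit addition. `par_block_counters` is a hypothetical name, and the real backends do this with SIMD registers rather than a loop.

```rust
/// Hypothetical: counter words (low, high) for a batch of four parallel
/// blocks. Lane i uses base + i, computed with 64-bit wrapping addition.
fn par_block_counters(base: u64) -> [(u32, u32); 4] {
    let mut out = [(0u32, 0u32); 4];
    for (i, slot) in out.iter_mut().enumerate() {
        let ctr = base.wrapping_add(i as u64);
        *slot = (ctr as u32, (ctr >> 32) as u32);
    }
    out
}
```

Starting at u32::MAX, lane 0 still has a high word of 0 while lanes 1 to 3 already carry into the high word, so a single starting position exercises the carry in three of the four lanes.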

@tarcieri tarcieri merged commit 078b648 into RustCrypto:master Aug 27, 2025
27 checks passed
Development

Successfully merging this pull request may close these issues.

chacha20: 64-bit counter support
4 participants