Skip to content

Panic freedom#3

Draft
maximebuyse wants to merge 15 commits intomasterfrom
panic-freedom
Draft

Panic freedom#3
maximebuyse wants to merge 15 commits intomasterfrom
panic-freedom

Conversation

@maximebuyse
Copy link
Collaborator

@maximebuyse maximebuyse commented Sep 24, 2025

This is a first step for proving panic freedom. Extraction using hax and F* type checking works, but some parts are admitted (with comments). The most common reasons for admits are:

The current state works with hax branch tls-codec-panic-freedom

Copy link
Member

@franziskuskiefer franziskuskiefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

I think there are two different categories in here on overflow/underflow issues. This is definitely the typical class of issues in code like this. And I guess we didn't always review well.

  • When the function is returning an error anyway, these should really be caught and errors returned.
  • For non-failing functions it's a little tricky. Because we don't always want a Result. At the same time would it be nice to do better here. One easy option would be _checked variants that return a Result. Another option would be returning a tuple with a bool on whether the result is correct or not. Then the caller can decide what to do with it. This would be breaking though.

In some cases overflows may also not be possible and we may want to treat code differently, depending on whether that can happen or not.

/// The serialized len
#[inline(always)]
fn tls_serialized_byte_length(&$self) -> usize {
hax_lib::fstar!("admit ()"); // overflow
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah these kind of functions are difficult. We really don't want to return a Result but it's of course true that this may overflow.

In most cases this can't happen. But on a TlsVecU32 (or the slice version) on a 32 bit system this may indeed happen.

We could add a variant of the function with a _checked postfix and return a Result there.

// large and write it out.
let (tls_serialized_len, byte_length) = $self.get_content_lengths()?;

hax_lib::fstar!("admit ()"); // Need to prove unwrap for usize to U24
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe U24 should be implemented separately here to allow more checks?


#[inline]
fn index(&self, i: usize) -> &T {
hax_lib::fstar!("admit ()"); // index precondition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the length. Maybe we should have a checked version here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it is really the caller's responsibility to provide a valid index so it makes sense to have a precondition. I switched to this in the latest version.

fn from(value: U24) -> usize {
const LEN: usize = core::mem::size_of::<usize>();
let mut usize_bytes = [0u8; LEN];
hax_lib::assume!(LEN == 8); // https://github.com/cryspen/hax/issues/1702
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may also be 4 on 32 bit systems. Maybe we should just assume >= 4? We don't want to support smaller than 32 bit systems here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wanted to do LEN == 4 || LEN == 8. Unfortunately this doesn't work because of a hax bug (the issue that is linked in the comment).

@maximebuyse
Copy link
Collaborator Author

Nice!

I think there are two different categories in here on overflow/underflow issues. This is definitely the typical class of issues in code like this. And I guess we didn't always review well.

* When the function is returning an error anyway, these should really be caught and errors returned.

* For non-failing functions it's a little tricky. Because we don't always want a `Result`. At the same time would it be nice to do better here. One easy option would be `_checked` variants that return a `Result`. Another option would be returning a tuple with a `bool` on whether the result is correct or not. Then the caller can decide what to do with it. This would be breaking though.

In some cases overflows may also not be possible and we may want to treat code differently, depending on whether that can happen or not.

Thanks for the feedback. I'll have a look at the suggestions and push a new version with some more disruptive changes to fix some over/underflows.

@maximebuyse
Copy link
Collaborator Author

Nice!
I think there are two different categories in here on overflow/underflow issues. This is definitely the typical class of issues in code like this. And I guess we didn't always review well.

* When the function is returning an error anyway, these should really be caught and errors returned.

* For non-failing functions it's a little tricky. Because we don't always want a `Result`. At the same time would it be nice to do better here. One easy option would be `_checked` variants that return a `Result`. Another option would be returning a tuple with a `bool` on whether the result is correct or not. Then the caller can decide what to do with it. This would be breaking though.

In some cases overflows may also not be possible and we may want to treat code differently, depending on whether that can happen or not.

Thanks for the feedback. I'll have a look at the suggestions and push a new version with some more disruptive changes to fix some over/underflows.

I just pushed a new version where additions and subtractions are replaced by checked_add and checked_sub when they happen in a function that returns a Result. I mostly used Error::LibraryError in case of overflow, we might want to change that.
I also fixed a few other issues, so now we are down to 5 places with potential overflows (where we cannot return an error). We also have left the usize to U24 conversion issue, and the various issues that are caused by hax.

@maximebuyse
Copy link
Collaborator Author

@franziskuskiefer I made a try for having a checked version for tls_serialized_len alongside the unchecked version. Do you think this style would be ok? It solves the panic freedom issues we have been having but it is quite disruptive.

Copy link
Member

@franziskuskiefer franziskuskiefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _checked versions lgtm.
I'm a little concerned about performance of the unchecked versions because they produce significantly more code now.

type Error = Error;

fn try_from(value: usize) -> Result<Self, Self::Error> {
fn try_from(value: usize) -> Result<Self, Error> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't hax handle this? It would be better to not having to keep these two types in sync manually.

}
#[inline]
fn tls_serialized_len(&self) -> usize {
1usize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swapping the match and computation avoids computation in case of None.

#[inline]
fn tls_serialized_len(&self) -> usize {
1usize
.checked_add(match self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check the performance? All the additional checks may make things slower. Since the result will be wrong in either case, we should make sure not to slow down things here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@franziskuskiefer I ran the benchmarks, and for TLS Serailize (vector and slice), there is a performance regression. I takes about +38% of runtime. So what do you think we should do?

@franziskuskiefer
Copy link
Member

@maximebuyse @karthikbhargavan what are the next steps here?

@maximebuyse
Copy link
Collaborator Author

@maximebuyse @karthikbhargavan what are the next steps here?

We could schedule some time next week to finish this. The main question here is about the checked version of tls_serialized_len. I am not so convinced about this as it complexifies and still returns a wrong result for the normal version in case of overflow. If we want to keep it the main question is about performance, is there a benchmark somewhere that we can run to check the impact?

@karthikbhargavan
Copy link

@franziskuskiefer shall we meet to discuss how to get this home?

@franziskuskiefer
Copy link
Member

@franziskuskiefer shall we meet to discuss how to get this home?

Yes, please put something on the calendar 📆

@franziskuskiefer
Copy link
Member

Looking at this and trying to figure out the best way, I think we need to take a step back and plan this out better.

While I think we can make this work mostly as is in here, we should think about what it should look like, while keeping things working for existing consumer. So the question to me really is: what's a safe, but fast API, and how to add it in a way that consumers can transition to it without breaking what they are doing right now.

Observations so far

  • Checked arithmetic is too slow for use in release mode.
  • We can move the proofs to debug-only or hax mode.
  • These are serialisation issues, where we need to figure out the length of the bytes written out.

Serialization

The recommended flow, which is also implemented in the derive macros for serializing a struct is

  1. Get the length of the serialized struct using tls_serialized_len.
  2. Allocate the required amount of memory.
  3. Write the values into the allocated memory.

With this flow, we only need to check the additions in step 1. And in many cases we know that lengths can't overflow and may not want to take the hit of the additional checks even in step 1.

With that I think we want

  • a checked version of len and serialize
  • annotate and document the non-checked versions with the bounds in which they work correctly
    • ensure there are debug asserts for all of these
  • implement the checked versions in a separate trait to avoid breakage

Deserialization

When deserializing we don't need to allocate, but we need to move the pointer along the input bytes.
On 64 bit machines there should be no issue in summing up the lengths because usize is as big as the largest lengths allowed. On 32 bit machines though, the encoded length may be 8 bytes, such that we can't write that out.
So we may want a checked deserialize as well, which ensures that we can parse and store the incoming bytes. (We may want to add a no-copy version at some point, which would be able to handle this again.) However, this is again a check only on the length, not on the actually read bytes.

Variable length encodings

These encodings have explicit length checks at runtime already (because they are required). We should probably make use of them, or encode them better such that can make use of them in the proofs.

But these checks are the reason I was unable so far to trigger any length encoding issue.

Impact on MLS

MLS only uses fixed sized arrays and variable length quic encodings. However, MLS also restricts the variable length encodings to 4 bytes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants