-
Notifications
You must be signed in to change notification settings - Fork 77
MIR: Enforce disjointness from precondition allocations and statics #2678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Add the additional checks in LLVM enforceDisjointness to MIR enforceDisjointness. In particular, when verifying post-state allocations, ensure that they are disjoint from all pre-state allocations and all statics, and when matching pre-state allocations for overrides, ensure that they are disjoint from all statics. Fixes the MIR part of #2665.
-- Ensure that all regions are disjoint from each other and extras and | ||
-- statics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this only require that mutable regions be disjoint from each other? If I understand correctly, this is checking for all memory regions (both immutable and mutable), which may be responsible for some of the spurious test suite failures seen in CI (e.g., test_mir_verify_arrays
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This checks that the regions in question (either precondition allocs or postcondition allocs) are disjoint from each other, and from all extras and statics, but not that the extras and statics are disjoint from each other. As noted in #2665, for MIR we need to check disjointness from immutable statics as well. All 5 test cases, 2 of which involve only immutable statics, lead to unsoundness with the current master
build of SAW. #641 (comment) says that we don't need to disallow aliasing immutable regions for LLVM because the crucible-llvm
memory model forbids comparing const
pointers. However, as far as I'm aware this is not the case for the crucible-mir
memory model.
The test failures seem to be because input and output allocations are indeed aliasing with immutable statics, which includes when you take a reference to a constant like &42
. So I'm not really sure what to do here, because it seems like this is a pattern that does turn up in regular Rust code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted in #2665, for MIR we need to check disjointness from immutable statics as well.
Sorry, let me clarify: we should relax this check to allow immutable references to alias each other, while still requiring immutable raw pointers to be disjoint. (Note that all of the examples in #2665 use raw pointers rather than references.) For instance, test_mir_verify_arrays
only involves immutable references, so it really shouldn't be impacted by these changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we can do that... it seems like this still leads to unsoundness:
pub fn foo_out_static() -> &'static u32 {
&42
}
pub fn bar_out_static() -> bool {
let x: &u32 = &42;
x as *const u32 == foo_out_static() as *const u32
}
The raw pointers exist only inside the function, so we would need to do a check in crucible for the reference to pointer cast or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, you're saying that if you verified foo_out_static
and then used its proof as a compositional override when proving bar_out_static
, then the region that SAW allocates for foo_out_static()
would be disjoint from the region for x
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if you use mir_alloc
in the postcondition, SAW will create a new allocation when it is used as an override, disjoint from any statics. SAW (without the fix in this PR) verifies the following:
enable_experimental;
m <- mir_load_module "test.linked-mir.json";
let foo_out_static_spec = do {
mir_execute_func [];
y <- mir_alloc mir_u32;
mir_return y;
};
foo_out_static_ov <- mir_verify m "test::foo_out_static" [] false foo_out_static_spec z3;
let bar_out_static_spec0 = do {
mir_execute_func [];
mir_return (mir_term {{ False }});
};
mir_verify m "test::bar_out_static" [foo_out_static_ov] false bar_out_static_spec0 z3;
let bar_out_static_spec1 = do {
mir_execute_func [];
mir_return (mir_term {{ True }});
};
mir_verify m "test::bar_out_static" [] false bar_out_static_spec1 z3;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow! This is quite tricky.
First, a minor correction: I think you want your Rust code to look like this:
pub fn foo_out_static() -> &'static u32 {
&42
}
pub fn bar_out_static() -> bool {
let x: &u32 = &42;
std::ptr::eq(x, foo_out_static()) // This line is different
}
x as *const u32 == foo_out_static() as *const u32
leverages the PartialEq
impl for raw pointers, but this impl compares the values that the pointers point to, not the addresses of the pointers themselves. This doesn't detract from your overall point, however: Rust still allows this program, and your SAW script still unsoundly verifies bar_out_static
against two nearly identical specs that nevertheless return different answers. Bummer.
My first inclination is to wonder is how the LLVM backend (which explicitly allows immutable pointers to alias each other) deals with this. Here is a port of your program to C:
// test.c
#include <stdint.h>
const static uint32_t FORTY_TWO = 42;
const uint32_t* foo_out_static(void) {
return &FORTY_TWO;
}
int bar_out_static(void) {
const uint32_t* x = &FORTY_TWO;
return x == foo_out_static();
}
// test.saw
m <- llvm_load_module "test.bc";
let foo_out_static_spec = do {
llvm_execute_func [];
y <- llvm_alloc_readonly (llvm_int 32);
llvm_return y;
};
foo_out_static_ov <- llvm_verify m "foo_out_static" [] false foo_out_static_spec z3;
let bar_out_static_spec0 = do {
llvm_execute_func [];
llvm_return (llvm_term {{ 0 : [32] }});
};
llvm_verify m "bar_out_static" [foo_out_static_ov] false bar_out_static_spec0 z3;
let bar_out_static_spec1 = do {
llvm_execute_func [];
llvm_return (llvm_term {{ 1 : [32] }});
};
llvm_verify m "bar_out_static" [] false bar_out_static_spec1 z3;
SAW does reject using foo_out_static_ov
as a compositional override, but not for the reasons I would have expected:
$ ~/Software/saw-1.3/bin/saw test.saw
[12:29:42.026] Loading file "/home/rscott/Documents/Hacking/C/test.saw"
[12:29:42.031] Verifying foo_out_static ...
[12:29:42.032] Simulating foo_out_static ...
[12:29:42.033] Checking proof obligations foo_out_static ...
[12:29:42.033] Proof succeeded! foo_out_static
[12:29:42.089] Verifying bar_out_static ...
[12:29:42.089] Simulating bar_out_static ...
[12:29:42.089] Registering overrides for `foo_out_static`
[12:29:42.089] variant `Symbol "foo_out_static"`
[12:29:42.089] Matching 1 overrides of foo_out_static ...
[12:29:42.090] Branching on 1 override variants of foo_out_static ...
[12:29:42.090] Applied override! foo_out_static
[12:29:42.091] Checking proof obligations bar_out_static ...
[12:29:42.107] Subgoal failed: bar_out_static test.c:12:14: error: in bar_out_static
Const pointers compared for equality
Details:
(4, 0x0:[64])
(6, 0x0:[64])
HeapAlloc 6 0x4:[64] Immutable 4-byte-aligned /home/rscott/Documents/Hacking/C/test.saw:6:5 (Poststate)
StackAlloc 5 0x8:[64] Mutable 8-byte-aligned test.c:11:21
GlobalAlloc 4 0x4:[64] Immutable 4-byte-aligned [global variable ] FORTY_TWO
GlobalAlloc 3 0x0:[64] Immutable 1-byte-aligned [defined function ] bar_out_static
GlobalAlloc 2 0x0:[64] Immutable 1-byte-aligned [defined function ] foo_out_static
GlobalAlloc 1 0x0:[64] Immutable 1-byte-aligned [external function] llvm.dbg.declare
[12:29:42.108] SolverStats {solverStatsSolvers = fromList ["SBV->Z3"], solverStatsGoalSize = 29}
[12:29:42.108] ----------Counterexample----------
[12:29:42.108] <<All settings of the symbolic variables constitute a counterexample>>
[12:29:42.108] ----------------------------------
[12:29:42.108] Stack trace:
"llvm_verify" (/home/rscott/Documents/Hacking/C/test.saw:17:1-17:12)
Proof failed.
What an interesting error message! This arises from this code in the crucible-llvm
memory model:
eval (LLVM_PtrEq mvar (regValue -> x) (regValue -> y)) = do
mem <- getMem mvar
liftIO $ do
...
v3 <- G.notAliasable sym x y (memImplHeap mem)
...
unless (laxConstantEquality ?memOpts) $
do let allocs_doc = G.ppAllocs (G.memAllocs (memImplHeap mem))
let x_doc = G.ppPtr x
let y_doc = G.ppPtr y
-- TODO: Is this undefined behavior? If so, add to the UB module
assert bak v3 $
AssertFailureSimError
"Const pointers compared for equality"
(unlines [ show x_doc
, show y_doc
, show allocs_doc
])
ptrEq sym PtrWidth x y
This is interesting because, as far as my reading of the C standard goes, this is actually not undefined behavior. Namely, I believe it is fine to check two pointers for equality provided that they point to the same type and come from the same allocation region. This code was introduced in GaloisInc/crucible@58e2be3, whose commit message notes:
The LLVM memory model generally does not allow for different memory regions to alias each other: Pointers with different allocation block numbers will compare as definitely unequal. However, it does allow two immutable memory regions to alias each other. To make this sound, equality comparisons between pointers to different immutable memory regions must not evaluate to false. Therefore pointer equality comparisons assert that the pointers are not aliasable: they must not point to two different immutable blocks.
This is a pretty brutal solution to the problem. I suppose we could adopt a similar solution in Rust, but that would reject many seemingly valid uses of the std::ptr::eq
function, such as in the program seen above. That seems quite smelly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the topic of brutal solutions to problems, we might want to re-examine if we should permit things like this:
let foo_out_static_spec = do {
mir_execute_func [];
y <- mir_alloc mir_u32;
mir_return y;
};
This is rather fishy, since it is using a mir_alloc
in the post-condition to refer to a reference to a static value. This goes against the spirit of mir_alloc
(and similar functions) slightly. Usually, a mir_alloc
in a post-condition indicates that the function being verified is expected to perform the allocation, but that's not what's happening here, as a static reference will be allocated for the entire duration of the program. As such, one could make the argument that this spec should be rejected.
On the other hand, if we do reject things like this, we had better have a viable migration story. In the LLVM version of the spec, it suffices to change the details slightly:
let foo_out_static_spec = do {
llvm_execute_func [];
llvm_return (llvm_global "FORTY_TWO");
};
This only works because FORTY_TWO
has an identifiable name that we can easily look up, however. This isn't the case in the Rust program, however. It is true that &42
does correspond to a named static value in the MIR JSON, but it is a difficult-to-find name like test::foo_out_static::{{promoted}}[0]
. I would hate for SAW users to have to dig into this MIR JSON file just for the sake of figuring out what name to pas to mir_static
.
Perhaps we could devise a command that lets users look up static items by the parent functions/values, but I don't yet know what a good API for this command would be...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PartialEq
impl for raw pointers compares by address (I tested my example). The impl for references compares by value. std::ptr::eq
is just ==
for raw pointers inside a function, so that you can pass references to the function and they will automatically coerce to raw pointers, allowing you to avoid the explicit cast. But they do the same thing.
I agree that we shouldn't do what the LLVM backend does. In addition to what you mention, *const
pointers are easily convertible to *mut
(as we see in Vec
), so for the purposes of safety we should treat them identically in SAW/crucible. And we definitely do not want to ban all pointer equality checks.
I also agree that we should probably reject mir_alloc
for static values. I'm not sure even looking up by the parent function name would work though. The fact that bar_out_static()
returns true means the compiler has used the same static for both instances of &42
in different functions, and looking at the MIR it seems to be named test/de88ee43::{{alloc}}[0]
, so it is not under a particular function at all. Worse, I'm not sure if this optimization is guaranteed to happen. I can only find documentation that constant promotion happens, not that multiple identical constants are guaranteed to be promoted to the same static.
Maybe we can have something like mir_static_of : MIRType -> MIRValue -> MIRValue
where the user can look up an immutable static by its initializer value. So in this case you would write something like mir_static_of mir_u32 (mir_term {{ 42 : [32] }})
and SAW would search for an immutable static with value 42
. If it finds multiple such statics, it will error and the user will have to dig through the MIR themselves to find the right one. But hopefully this covers most cases, assuming the aforementioned optimization does happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
PartialEq
impl for raw pointers compares by address (I tested my example).
Ah, my mistake! (I was confused by the code used in the PartialEq
impl for raw pointers, which I incorrectly concluded was dereferencing the pointers like the PartialEq
impl for references does.)
Worse, I'm not sure if this optimization is guaranteed to happen. I can only find documentation that constant promotion happens, not that multiple identical constants are guaranteed to be promoted to the same static.
This is not guaranteed per this section of the Rust Reference (emphasis mine):
Constants are essentially inlined wherever they are used, meaning that they are copied directly into the relevant context when used. This includes usage of constants from external crates, and non-
Copy
types. References to the same constant are not necessarily guaranteed to refer to the same memory address.
Here is an example of this happening in practice. You need a Cargo workspace that looks like this:
$ tree .
.
├── a
│ ├── Cargo.toml
│ └── src
│ └── lib.rs
├── b
│ ├── Cargo.toml
│ └── src
│ └── lib.rs
├── c
│ ├── Cargo.toml
│ └── src
│ └── lib.rs
├── Cargo.lock
└── Cargo.toml
With the following contents:
# Cargo.toml
[workspace]
members = ["a", "b", "c"]
default-workspace-members = ["c"]
# a/Cargo.toml
[package]
name = "a"
version = "0.1.0"
edition = "2024"
[dependencies]
// a/src/lib.rs
pub fn f() -> &'static u32 {
&42
}
# b/Cargo.toml
[package]
name = "b"
version = "0.1.0"
edition = "2024"
[dependencies]
// b/src/lib.rs
pub fn f() -> &'static u32 {
&42
}
# c/Cargo.toml
[package]
name = "c"
version = "0.1.0"
edition = "2024"
[dependencies]
a = { path = "../a" }
b = { path = "../b" }
// c/src/lib.rs
use a;
use b;
pub fn g() -> u32 {
*a::f() + *b::f()
}
Compiling this with cargo saw-build
produces a c-*.linked-mir.json
with the following statics:
$ cargo saw-build
<snip>
linking 22 mir files into /Users/rscott/Documents/Hacking/Rust/cargo-sandbox/target/aarch64-apple-darwin/debug/deps/c-3d43989448917900.linked-mir.json
<snip>
$ jq .statics /Users/rscott/Documents/Hacking/Rust/cargo-sandbox/target/aarch64-apple-darwin/debug/deps/c-3d43989448917900.linked-mir.json
[
{
"kind": "constant",
"mutable": false,
"name": "a/61fe39a5::{{alloc}}[0]",
"rendered": {
"kind": "uint",
"size": 4,
"val": "42"
},
"ty": "ty::u32"
},
{
"kind": "constant",
"mutable": false,
"name": "b/ddb4d05b::{{alloc}}[0]",
"rendered": {
"kind": "uint",
"size": 4,
"val": "42"
},
"ty": "ty::u32"
}
]
This contains two promoted static items with the same types and values. Granted, I had to resort to putting the static items in different crates in order for this to happen, but it wouldn't surprise me if there were a clever way to observe something similar within a single crate.
Maybe we can have something like
mir_static_of : MIRType -> MIRValue -> MIRValue
where the user can look up an immutable static by its initializer value. So in this case you would write something likemir_static_of mir_u32 (mir_term {{ 42 : [32] }})
and SAW would search for an immutable static with value42
. If it finds multiple such statics, it will error and the user will have to dig through the MIR themselves to find the right one. But hopefully this covers most cases, assuming the aforementioned optimization does happen.
Indeed, I think we need something like mir_static_of
. I'm not sure about what to name it, though, given that it would likely be easy to confuse with the existing mir_static
function. I'm also not sure about rejecting cases where there exist multiple static items with the same types and values, as experience as shown that this does actually happen in practice (the example above is a minimized version of this). Perhaps we should also this function take a String
representing a prefix of the static item's identifier? For instance, mir_static_of "a" ...
would look up static items like a/61fe39a5::{{alloc}}[0]
, mir_static_of "b" ...
would look up static items like b/ddb4d05b::{{alloc}}[0]
, and so on.
This isn't perfect (there is still a possibility that rustc
could promote the same constant within a single crate to two distinct static items), but perhaps this would be good enough 99% of the time.
Adds the additional checks in LLVM's
enforceDisjointness
to MIR'senforceDisjointness
.In particular, when verifying post-state allocations, ensure that they are disjoint from all pre-state allocations and all statics, and when matching pre-state allocations for overrides, ensure that they are disjoint from all statics.
Fixes the MIR part of #2665.