-
-
Notifications
You must be signed in to change notification settings - Fork 779
doc/ExternalDependencies: carve out exception for Tock-maintained crates #4589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@bradjc Let's move the discussion of #4588 (comment) here, which I've seen only after filing this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that we don't control crates.io, and once we include a crate from crates.io we lose any traceability about that external dependency.
I'm comfortable including tock org dependencies via github with a commit hash.
|
(moving the comments from other thread here) @bradjc said
Is there a reason to use crates.io or can we link to the repo directly? It seems like the crates.io source can change arbitrarily (as I think we are demonstrating here), but it would be nice if a change to the project required a change in Tock. But, now that I write that, github would probably redirect automatically if we moved the tock-reg repo to some other org (e.g., TockUnsafeCorp). Can we specify a git commit hash (and require that)?
I think of this based on what is in cargo.toml (ie what we can inspect by looking at this repo). I think anything through crates.io is clearly external. I think in github.com/tock is not external or internal, it's like pub(crate). Maybe those are "organizational external dependencies". While I think we can generally treat "organizational external dependencies" like internal dependencies, I don't think of them as the same because I think realistically they will not have the same CI treatment nor the same code review scrutiny. I think with good reason we should create "organizational external dependencies", but, not by default or without review. I think tock-reg is an example where an "organizational external dependency" is appropriate. @ppannuto said IIUC, the concerns you're describing are literally the purpose of Cargo.Lock files. In theory, we include Lock files in our releases for these reasons; in practice, we don't look to have actually done that. More generally/usefully, I wonder whether this indicates that we should perhaps revisit our policy around Lock files — perhaps things have updated such that the churn would not be as painful? Messing around a little, I only get lock file changes when I switch to builds on/off the flux branch, which actually does change dependencies. |
But that would require us to actually review lock file changes? Github hides those diffs (I think), and I personally don't want to review lock files. |
| kernel repository itself. At the discretion of the Tock Core Working Group, | ||
| these crates may be depended upon by other Tock kernel crates, either through | ||
| pinned git revisions, tags, or through external repositories such as | ||
| <https://crates.io>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simultaneously get the value in not being overly prescriptive in policy (especially before we do much of this) and feel that we should just pick one instead of possibly ending up with multiple different approaches.
I suspect this will fall out organically as a result of the primary discussion on the thread.
My instinct is that we should try to keep moving towards "easing integration with the way the rest of the Rust ecosystem works" — i.e., we should use crates.io and provide Cargo.Lock files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why I deliberately made this text very permissive is that we as the Tock project are the only ones who can promote any crate to be a "Tock-project controlled and maintained" crate.
With this documented motivated as to give guidance for contributors on which types of dependencies is acceptable, that didn't seem to apply here in quite the same way: we exercise discretion on how and what types of "Tock-project controlled" dependencies are included, depending on what works best for a particular situation.
At the same time, I see value in having clear guidance, even for us (who can collectively change, and thus "overrule" this document). So if this conversation leads to us deciding one way or the other, we should clarify here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This document was created to provide guidance, as the policy before this was leaving everything up to the discretion of the tock project maintainers. Going back to that seems like undoing what we tried to make explicit with this document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, we can change this document to reflect the outcome of the discussion on this PR.
We removed Lock files from the repo in 2019 because of churn, they're I'll repeat [and update with time context] myself here:
|
|
We are certainly not the only people balancing Lock file churn, yet upstream changed the default to include Lock files in new projects in 2023. I am hoping this is indicative of tooling and support ergonomics improving. |
0d0f0dd to
0661f77
Compare
|
I do expect Tock to, due to its relatively stable crate structure and low number of relatively stable dependencies, have few if any lockfile churn. But it may well happen occasionally, and I acknowledge they're not very human-readable at all. |
|
I am no lockfile expert, but since 2.2 (8 months) we have created 14 new Cargo.toml files. I think that would mean we would be reviewing two lock files a month, but I don't know how often they change.
In the main tock repo, but not in others. In any case, is it not true that github hides lockfile diffs? I think by agreeing to lockfiles at least one person is committing to reviewing them? |
I think we should establish what exactly our threat model around lockfiles is. Is the fear that someone will lock a dependency to a deliberately incorrect or malicious version? Or just that an update to a dependency will be suck in, when we didn't intend that dependency to be updated? From that, we could explore mechanical solutions. For instance, you could imagine having a CI workflow that posts a summary of the lockfile changes to a comment on PRs changing them. For that, a tool like cargo-lockdiff looks promising. |
Lockfiles seem like a big hammer for our use case. In contrast, specifying a git project in our organization with a specific commit seems like precisely the tool we want for using our own external dependencies. We don't need the flexibility that comes with being able to use any crates.io repo from any underlying development source. We also don't need to be restricted to published releases, which was one hesitation for moving this repo outside tock/tock. Plus, what needs to be audit is trivially inspectable by humans, without relying on yet another external tool. Having github repos in Cargo.toml is well supported and widely used. It's not clear to me what the upside of lockfiles is, for our use case. For the more conventional use case of using crates.io dependencies liberally, sure. But that isn't what we do or intend to do. |
Huh, you are right. GitHub will detect generated files, a list which currently includes Interestingly, yarn lock files were removed from hidden-by-default policy because reviewing dependency information was considered important for security. If we opt to bring back committed lock files, GitHub does allow you to use |
Yes, adding a new crate will change the lockfile to at least include that new crate. The changes are rather minimal though: For doing things like adding Workspace-internal dependencies (e.g., adding And finally, adding an entirely new dependency or changing the version of one generates a diff accordingly: Full `Cargo.lock` diff:leons@iron ~/p/t/kernel (dev/external-deps-tock-crates-exception)> diff Cargo.lock.3 Cargo.lock.4
426c426
< version = "0.4.4"
---
> version = "0.5.1"
428c428
< checksum = "1583cc1656d7839fd3732b80cf4f38850336cdb9b8ded1cd399ca62958de3c99"
---
> checksum = "f0d8a4362ccb29cb0b265253fb0a2728f592895ee6854fd9bc13f2ffda266ff1"
1085c1085
< version = "0.5.3"
---
> version = "0.6.2"
1087c1087
< checksum = "8419d2b623c7c0896ff2d5d96e2cb4ede590fed28fcc34934f4c33c036e620a1"
---
> checksum = "9d1fe60d06143b2430aa532c94cfe9e29783047f06c0d7fd359a9a51b729fa25"
1461c1461
< version = "2.4.1"
---
> version = "2.6.1"
1463c1463
< checksum = "6bdef32e8150c2a081110b42772ffe7d7c9032b606bc226c8260fd97e0976601"
---
> checksum = "13c2bddecc57b384dee18652358fb23172facb8a2c51ccc10d74c157bdea3292"
1507c1507
< version = "0.4.1"
---
> version = "0.5.1"
1509c1509
< checksum = "9f214e8f697e925001e66ec2c6e37a4ef93f0f78c2eed7814394e10c62025b05"
---
> checksum = "fc1de2c688dc15305988b563c3854064043356019f97a4b46276fe734c4f07ea"
1511c1511
< "generic-array",
---
> "crypto-common",
|
This is somewhat tangential to this PR, but another problem with lockfiles is that they're "all or nothing". And right now, we're already pulling in crates from So, while I agree that lockfiles are a "big hammer", esp. for what we're proposing today, in the grand scheme of things I'm inclined to think that we actually want / need that "big hammer" to solve this and the other issues around dependency locking we have. |
Exploring this question for a bit this morning... one of the things that was historically painful is Tock's "many binaries" situation. I.e., every time you change which board you are building for, you change the dependency graph, which could result in changes to Now that everything is in one Cargo workspace, if I understand correctly, Cargo considers the full dependency graph of all possible configurations, regardless of which one you are building. I.e., just building a different board (or boards in a different order, etc etc) should never result in changes to Cargo lockfiles. This does come with the possibly-negative side effect of an 'overly broad' lockfile. I.e., a board that doesn't actually include the |
Indeed, the lockfile isn't indicative of the exact set of dependencies for any particular board. It locks a superset of these revisions (although will remove entries when they're no longer depended upon anywhere in the workspace), but you can extract the actual transitive dependencies from it: start from the board crate, and walk the dependency tree. I'm sure there must be a tool for that already. |
|
Oh, and one final note: for |
|
We only have one lock file, in the root, correct? That is certainly an improvement over the prior situation. Having all the dependencies is strange for boards but makes sense for workspaces. Honestly, a single lockfile that isn't constantly changing isn't so bad. What I find dissatisfying about using lockfiles, however, is how passive/cargo-only they are. There is a checksum, that cargo can check, but it's opaque (I think?) to everything else. They are autogenerated, and not (really) for human consumption. That seems like a weird middle ground to me: useless and annoying for users who don't care about dependency tracking, and oddly passive for users who do (ie us). Something like brew files makes more sense to me: the brew file itself contains the hash (eg https://github.com/Homebrew/homebrew-core/blob/0f44549c1203ae616c22ec21f120ce9f588f0a40/Formula/r/rustup.rb#L10). The developer makes a conscious decision about what is correct and actively inserts the valid hash in the file meant for inspection. It would be nice if we could have some other source of truth about what the correct dependency is. We could even let cargo handle all of the sub dependencies. For example, with a git commit hash in a cargo.toml I can go to the github and look at what that commit is and independently see if I think it is an ok commit to use as a dependency. Is there some way to do that with cargo? Can I download something from crates.io and generate that checksum? The obvious resolution to me is if cargo would let us put that checksum in cargo.toml. If that checksum changes it is obvious something changed, and we would have some way to check that what it changed to is ok. It doesn't make sense to me that cargo.lock should ever change without a change to cargo.toml. Hmm, but it can change if cargo decides to use a different version of the crate than specified to meet some dependency mismatch? My hesitation is definitely based on a healthy distrust of cargo in general. Outsourcing everything to it and its own autogenerated files seems like it carries some risk. Again, I think that risk would be mitigated if I, as a board author, can specify a hash in my cargo.toml and know that the guarantee is that cargo uses that version, or I get an error. No letting cargo make unintuitive decisions. We have been buoyed by the simple approach we have generally used: no external dependencies. Changing that, I think, means we need new workflows to handle risks we don't have today. Changing that to cargo is not my preference, as cargo has a track record of not supporting us:
It just seems so much more intuitive to me to drop crates.io entirely and just specify github repos and commit hashes for dependencies. |
Correct. I think we technically have two workspaces (one for
The main argument against this is transitive dependencies. Even if we pin We hypothetically could override that dependencies' sub-dependencies, and then those sub-dependencies' sub-sub-dependencies, but I think this will become tiring very fast. We're going to easily miss some dependencies, esp. when they're added or removed across updates. So, while I empathize with and share many of your frustrations around lockfiles and the state of the ecosystem right now, I don't feel like it'll be great for us to either
In summary, despite these concerns, I think lockfiles are the path of least resistance. Hopefully we can make them a little less painful with more automation and tooling. |
|
Ugh... in the spirit of "cargo doesn't give what I would want..." I think it is the case that we would like the default behavior for all tock builds to be I would like to express this in We could do the "build defaults to edit: Oh yeah, here's the five year old issue asking for the feature; not a lot of traction :( |
|
i.e., we could do: $ git diff
diff --git a/boards/Makefile.common b/boards/Makefile.common
index 4d370d5d3..0f06290cd 100644
--- a/boards/Makefile.common
+++ b/boards/Makefile.common
@@ -323,10 +323,10 @@ $(TOCK_ROOT_DIRECTORY)tools/build/sha256sum/target/debug/sha256sum:
.PHONY: $(TARGET_PATH)/release/$(PLATFORM)
$(TARGET_PATH)/release/$(PLATFORM):
- $(Q)$(CARGO) rustc $(VERBOSE_FLAGS) --bin $(PLATFORM) --release
+ $(Q)$(CARGO) rustc $(VERBOSE_FLAGS) --bin $(PLATFORM) --release --locked
$(Q)$(SIZE) $(SIZE_FLAGS) $@
.PHONY: $(TARGET_PATH)/debug/$(PLATFORM)
$(TARGET_PATH)/debug/$(PLATFORM):
- $(Q)$(CARGO) build $(VERBOSE_FLAGS) --bin $(PLATFORM)
+ $(Q)$(CARGO) build $(VERBOSE_FLAGS) --bin $(PLATFORM) --locked
$(Q)$(SIZE) $(SIZE_FLAGS) $@Which, for a fun (?) data point, did affect 20 crates: which lends some credence to me for Leon's most recent point that (1) we aren't doing a service to reproducibility with our current setup, and (2) managing (sub-)dependencies will be annoying in practice. |
|
I like the idea of a "default frozen" lockfile. But, do we actually care about downstream users making changes to lockfiles? I don't know what this would buy us. We already have a gatekeeper for ensuring we keep lockfiles current: CI. We should definitely add a check that, for whatever is merged, the lockfile matches what's in tree (i.e., do a For regular Cargo operations, it should also never on its own update dependency versions that are tracked in the lockfile. Only |
My motivation is more about "least surprise" for downstream users. I.e., if I understand how Cargo works, if I clone the Tock repo and run Whereas, and omit step 4. IIUC, by default, if I clone the Tock repo, and one of its dependencies happens to have updated, I'll do a build, and suddenly have a dirty git tree. That's a terrible user experience! Now, this kinda-sorta all goes away if we never use semver in Tock (or, do I mis-understand how cargo works?) |
Poking in to agree with this. It would be great to avoid users automatically, unintentionally updating lock files that are part of the build system. |
This should never happen, and if it does, I'd consider this a bug. The lock file should only ever change when adjusting the crate and dependency tree, or when running And on those changes that do require updates to the lockfile, we will have CI reject PRs that don't update it. |
|
It is, as always, very possible that I do not "UC" :) |
What is the difference between |
Okay AI, I actually found this helpful. I am cautiously optimistic that moving to workspaces fixed most of our lockfile ergonomics headaches. It used to suck because we used to change the dependency tree every time we changed boards. My scenario above was wrong because there will now be a So the common-case path is |
|
IIUC the LLM got one thing wrong: when you change a dependency in
In that sense |
|
I think we can add this to the list of things that cargo doesn't support us on. In my opinion, the correct behavior is:
So, What is the value if you think "tock robustly handles dependencies so i don't have to worry" and then cargo just decides to update the lockfile and build the kernel however it wants? If you happen to be using git, sure, you will see a changed file. If not using git, good luck. But why would a I hope I'm still missing something. |
|
@bradjc Cargo tries to strike a balance between not getting in your way for development (automatically adjusting your lockfile if you make any changes that mandate those updates), and providing a reliable way to build a release using the exact set of locked versions ( I think the Cargo behavior is a good tradeoff. |
|
For common-case development, I agree. For the security/correctness domain that Tock lives in, I think I'm of the same mind of Brad. I wish that we as a project could "opt in" to a stronger, if slightly-less-"developer-friendly" behavior. I want to set a I want the build to error if developers accidentally do something that changes dependencies—that's not a common need, and I think Tock should introduce a little bit of intentional friction to make sure users have really thought through what they are doing and understand the implications. |
I completely agree, and I think this is the best articulation so far. Our current approach is maximal friction: hard no. In comparison, having to run one command or update a hash is of, relatively speaking, minimal overhead. |
|
Sadly, as noted yesterday, Cargo doesn't support that workflow natively. I don't think that's a deal-breaker for turning back on lockfiles; I think I'd like to do it with the |
I'm concerned about swinging too far. I think Alex noted this concern as well. Going from "hard no, too risky" to "mostly checked, unlikely to be bad" is pretty drastic. Particularly because we have a very easy alternative: just do nothing. We don't strictly need to follow through with our split-out-tock-reg plan. We can still do the |
The problem here is that we're discussing a potential solution to two separate issues:
In that sense, I think it's a mischaracterization to say we currently have a "hard no, too risky" policy, at least in practice. The status quo is bad. In blunt terms, I think we screwed up with how we implemented the external dependency policy in practice. What we have right now is not "mostly checked, unlikely to be bad", it's "we blindly pull in whatever". The situation right now was the one we wanted to avoid. So, I strongly believe we should discuss lockfiles in that context, or we have to reconsider our external dependency story more broadly. On a less abstract level, I wouldn't mind if we had a |
Yeah, the discussion on this PR really deviated from its title, its actual changes, (and frankly its original purpose). We probably need a new PR [and probably resolve it first] that brings back |
This is fair, but the reason we didn't focus more on this is the ability to opt out is trivial. Boards can easily just not include the "special" capsules that have dependencies. Still, we can improve this. It seems like lockfiles are not as bad as they were, and for these dependencies I think "we blindly pull in whatever" to "mostly checked, unlikely to be bad" is a nice improvement. In contrast, no one can opt out of including For example, does the tock/tock ExternalDependencies.md apply to tock/tock-registers? I think today it clearly would not by default (we use dependencies in libtock-rs, for example). I assume we wouldn't want to add dependencies to tock-registers, but I also think we would need to somehow recreate the work we've done in ExternalDependencies.md to apply to kernel-affiliated tock repos. It's also easier for me at least to understand how we enforce our policies today via PR review in one place. Maybe this is too hypothetical, but if a PR changed the tock-registers version in tock/tock, is there any assumed trust (because presumably the change was reviewed before being merged to tock-registers) if the dependency tree changes? Or is the expectation that it would be entirely re-reviewed as a kernel change? Ok to get more pragmatic, I prefer the status quo. What we have today allows tock-registers to be on crates.io and to be included directly from the repo thanks to cargo. I don't think there is any specific, concrete limitation of our current approach. Perhaps certain dependency trees where registers are passed between tock/tock crates and other crates it causes, friction, but we believe that can be resolved. That being said, I think I am ok with an alternative, including:
I think that is consistent with the spirit of our no dependency policy. |
I've been thinking back some on why we came to this decision in the first place. Here are some of the motivating factors I can recall. I don't think there's any one "killer need" here, but more of a "pressure of factors" kind of situation. I think they all stem from a philosophical question on whether we, as the Tock project, see value in providing
*I do think external adoption has slowed, and I'm not sure that's a good thing. I think we do have a better MMIO approach, and people should use it... |
|
Based on commit history descriptions, there doesn't seem to have been any meaningful changes to tock-registers since 01/2024. https://github.com/tock/tock/commits/master/libraries/tock-register-interface The reasons are certainly compelling. In a perfect world cargo would make this easy for us and allow us to specify constraints on the dependency from kernel/cargo.toml. It would be cool to have more visibility around tock-registers 2.0. |
|
I'm sympathetic to distinguishing the I feel much less strongly about needing to split out A remaining question on my end: why would a lockfile only deliver "mostly checked, unlikely to be bad", vs. "fully checked, guaranteed to point to vetted versions"? It would seem to me that it actually allows us to have pretty strong guarantees about which particular revision of In particular, if a crate is once added to So, assuming we'd be adding back a lockfile for other, external dependencies, what's the missing piece for using this same mechanism for Tock-owned "external" crates? |
IIUC, because cargo does not default to
(n.b., I write this as answer to the question / to better understand cargo limitations, not as a critique of moving forward with lockfiles, since currently we offer no opinion to the tarball user about which versions to use) |
|
Or more generally,
I believe "unlikely" is every feasible scenario that prevents that "should" from being "must" :) |
AFAIK, even if you yank a version from
It'd be a good experiment to see what Cargo's behavior is when specifying a dependency version in the lockfile that doesn't exist at all (e.g., never published to crates.io). I'd hope it to error, and not to implicitly choose a different version. If it doesn't do that, I think we should file a bug. Otherwise, assuming we have CI that ensures that upstream lockfiles are always current, I don't think this should ever happen, and cannot conceive of how it would happen. But I full acknowledge and empathize with the concern that users can relatively easily do something locally (like change their That being said, Cargo also should never touch dependencies for which the user hasn't adjusted the semver string (except for an explicit |
I find it pretty convincing that if cargo felt the need to add If it turns out that I'm wrong, and that if you have a cargo.lock file, and whether you include What about:
Note: I don't actually know if this could happen today with cargo, but it doesn't seem to violate any promises cargo makes. Before the xz/ssh attack I didn't think convoluted stuff like this was possible. But that illusion has been shattered for me. If cargo can't guarantee that it will always enforce the lockfile (without |
I'm always happy to entertain weird hypotheticals, but I think we're getting pretty off track at this point. If your basic assumption starts with "an attacker gets your crate corrupted", it's a pretty easy jump to "your entire toolchain is compromised". Updating to a newer revision of a crate would be a pretty benign outcome in that scenario, all things considered. However, because I couldn't let this go untested, here's the result. As suspected, Cargo doesn't do any integrity checks of any locally stored crates. Once anybody has write access to your Console output:If, however, your lockfile contains a hash that does not correspond to the fetched crate contents, or if it points to a crate version that does not exit, Cargo will (as it should) refuse with an error and leave the lockfile untouched: As said previously, Cargo should only ever update the lockfile when the manifest files change (
So, from the above, the answer to this question is no, but I don't think it's as bad as this thread may suggest it to be: the lockfile will be updated when changes to the manifests ( For building a released version of Tock including a lockfile, neither of these things will happen. So downstream consumers can have solid assurance they're build a blessed release, with transitively locked dependencies. |
How strong is "require"? So if all dependencies specify the same version of a crate, then exactly that version will be used? But if one Cargo.toml specifies a newer semver compliant version, they will all use the new version (and cargo will update the lockfile)? If so, that's at least something.
But, why couldn't cargo check that the dependencies match the lockfile? We have long known that cargo doesn't have the same opinion of dependencies as Tock does, so I'm not surprised about what cargo does here.
Yeah, maybe. "No external dependencies" is a defense mechanism against putting trust in a variety of places outside our control. I'm not saying we have to be paranoid. But, this thread has convinced me that cargo is not focused on a very strict notion of dependency control, and isn't signaling that this is a priority going forward. If cargo had a policy (like we do with ExtDep.md) on their stance, and it was aligned with ours, and over time we could reasonably expect cargo to improve and add features to make this more robust, then I would say that is somewhere we could offload some of our trust. But that's not what I'm seeing. I still think we can add the external dependency for |

Pull Request Overview
Motivated by #4588, this carves out an exception for Tock-controlled and maintained dependencies in our policy around external dependencies.
Testing Strategy
N/A
TODO or Help Wanted
N/A
Documentation Updated
/docs, or no updates are required.Formatting
make prepush.