@karlbaumg
Contributor

@karlbaumg karlbaumg commented May 11, 2025

This is a very useful feature for cases where we'd like to give wildcard device permissions to the cgroup, e.g. --device-cgroup-rule 'c *:* rwm' which is possible in both Podman and Docker.

Found out the hard way that Linux.Resources.Devices are not processed at all.

If I understood the claim logic correctly, it is to prevent multiple plugins from overriding each other. But the cgroup rules array is a layered permission list, e.g. it starts with a deny all rule and allow rules are appended so different plugins may append different rules. That's why I didn't add a claimDeviceCgroup but happy to change.
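To make the layering argument concrete, here is a minimal sketch under a deliberately simplified model in which the list is walked in order and every matching entry overrides the decision of the entries before it. The `DeviceRule` type, the `Allowed` function, and the plugin names are hypothetical stand-ins for illustration; this is not NRI code and not how the kernel or any runtime actually evaluates device access.

```go
package main

import (
	"fmt"
	"strings"
)

// DeviceRule is a hypothetical stand-in for one cgroup device rule.
// A nil Major or Minor means "any" (the '*' wildcard).
type DeviceRule struct {
	Allow  bool
	Type   string // "c", "b", or "a" for all device types
	Major  *int64
	Minor  *int64
	Access string // subset of "rwm"
}

// matches reports whether a possibly-wildcard number covers v.
func matches(n *int64, v int64) bool { return n == nil || *n == v }

// Allowed walks the rules in order; each matching rule overrides the
// decision made by earlier ones (simplified model, assumed for this sketch).
func Allowed(rules []DeviceRule, typ string, major, minor int64, access byte) bool {
	allowed := false
	for _, r := range rules {
		if r.Type != "a" && r.Type != typ {
			continue
		}
		if !matches(r.Major, major) || !matches(r.Minor, minor) {
			continue
		}
		if strings.IndexByte(r.Access, access) < 0 {
			continue
		}
		allowed = r.Allow
	}
	return allowed
}

// i64 returns a pointer to v, for filling in optional fields.
func i64(v int64) *int64 { return &v }

func main() {
	rules := []DeviceRule{
		{Allow: false, Type: "a", Access: "rwm"},                             // initial deny-all rule
		{Allow: true, Type: "c", Major: i64(1), Minor: i64(3), Access: "rwm"}, // appended by one plugin
		{Allow: true, Type: "c", Major: i64(238), Access: "rwm"},              // appended by another plugin
	}
	fmt.Println(Allowed(rules, "c", 238, 7, 'r'))
	fmt.Println(Allowed(rules, "c", 1, 3, 'w'))
	fmt.Println(Allowed(rules, "b", 8, 0, 'r'))
}
```

In this model two plugins can each append an allow rule without overwriting the other, which is the point made above about the list being layered rather than a single value.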

I've validated this change with containerd 1.7.27 that has this patch on top of NRI 0.8.0.

@karlbaumg karlbaumg changed the title from "createContainer: include linux device cgroups in adjustment logic" to "Include linux device cgroups in adjustment logic" May 11, 2025
@klihub
Member

klihub commented May 12, 2025

> Found out the hard way that Linux.Resources.Devices are not processed at all.

For some context/background... Those device permission rules in Linux.Resources.Devices were one of the last additions to the revamped NRI API (v0.2.0 at the time), and added merely to allow near-identical emulation of the original v0.1.0 NRI API through the plugins/v010-adapter plugin. They are only input to a plugin at the moment.

That's why we don't have any corresponding Set/Add/Remove functions defined for manipulating them in pkg/api/adjustment.go, and why we don't process the rules when adjusting the OCI Spec (although we do add explicit entries there for any devices injected through NRI). It is also why the device rules are not listed as adjustable container parameters in the documentation (although I have to admit that this alone is not a reliable indicator, as the docs are not fully accurate yet).

> If I understood the claim logic correctly, it is to prevent multiple plugins from overriding each other. But the cgroup rules array is a layered permission list, e.g. it starts with a deny all rule and allow rules are appended so different plugins may append different rules.

Yes, in current HEAD the claims are only used for conflict detection. That is changing, however, with the pending introduction of pluggable validation (#163). There the claims, which also record which plugins made which changes, are passed on to any registered validating plugins. Those plugins then decide whether to accept or reject the changes, and that decision can depend on which plugins made them.

> That's why I didn't add a claimDeviceCgroup but happy to change.

Can you tell a bit more about what you are trying to do (basically your 'use case')? Things like:

  • do you now manually adjust the cgroup device rules in resources?
  • how do your containers gain the extra device nodes (not via NRI injection, because then you would also get the corresponding necessary access rules)?
    • bind mount?
    • mknod in the containers?

If we extend NRI to allow direct manipulation of cgroup device access rules, keeping the general patterns of NRI and #163 in mind, I think we'd need to at least

  • think through what a proper model for adjusting cgroup device rules (independently of adjusting devices) could be:
    • removing is easy, but
    • since order matters, how to add (without overcomplicating): prepend/append, something else, etc.
  • add proper adjustment functions based on what we decide
  • always record in the claims which plugins touched what rules
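One possible shape for such a model, sketched purely for discussion: an ordered list with explicit append, prepend, and remove operations that records which plugin added each entry. `RuleList`, its methods, and the plugin names are hypothetical and do not exist in NRI; the sketch only illustrates the trade-offs listed above.

```go
package main

import "fmt"

// DeviceRule is a hypothetical stand-in for one cgroup device rule.
// A nil Major or Minor means "any".
type DeviceRule struct {
	Allow  bool
	Type   string
	Major  *int64
	Minor  *int64
	Access string
}

// RuleList keeps rules in evaluation order and remembers their origin.
// Owners[i] names the plugin that added Rules[i].
type RuleList struct {
	Rules  []DeviceRule
	Owners []string
}

// Append adds a rule at the end of the list.
func (l *RuleList) Append(plugin string, r DeviceRule) {
	l.Rules = append(l.Rules, r)
	l.Owners = append(l.Owners, plugin)
}

// Prepend adds a rule at the front of the list.
func (l *RuleList) Prepend(plugin string, r DeviceRule) {
	l.Rules = append([]DeviceRule{r}, l.Rules...)
	l.Owners = append([]string{plugin}, l.Owners...)
}

// Remove drops every rule for which match returns true, keeping the
// relative order of the rest, and returns how many rules were removed.
func (l *RuleList) Remove(match func(DeviceRule) bool) int {
	keptRules, keptOwners := l.Rules[:0], l.Owners[:0]
	removed := 0
	for i, r := range l.Rules {
		if match(r) {
			removed++
			continue
		}
		keptRules = append(keptRules, r)
		keptOwners = append(keptOwners, l.Owners[i])
	}
	l.Rules, l.Owners = keptRules, keptOwners
	return removed
}

func main() {
	major := int64(238)
	var l RuleList
	l.Append("runtime", DeviceRule{Allow: false, Type: "a", Access: "rwm"})
	l.Append("plugin-b", DeviceRule{Allow: true, Type: "c", Major: &major, Access: "rwm"})
	l.Prepend("plugin-a", DeviceRule{Allow: false, Type: "b", Access: "m"})
	for i, r := range l.Rules {
		fmt.Printf("%d: owner=%s allow=%v type=%s access=%s\n", i, l.Owners[i], r.Allow, r.Type, r.Access)
	}
}
```

Keeping the owner next to each rule is one way to satisfy the "record which plugins touched what" requirement without deciding yet how conflicts should be resolved.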

@karlbaumg
Contributor Author

> Can you tell a bit more about what you are trying to do (basically your 'use case')?

Sure. I'm running a program that creates new devices at startup, and their minor numbers can't be known beforehand, so with podman or docker we pass in --device-cgroup-rule 'c 238:* rwm'. There is no equivalent in Kubernetes' Pod API, so we implemented an NRI plugin that adds those rules. Given that both tools have this flag, I don't think dynamically created devices are a rare use case. That makes sense, since a wildcard rule spares you the bookkeeping of individual devices.
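For readers unfamiliar with the flag syntax, the following sketch shows how a rule string of the form `TYPE MAJOR:MINOR ACCESS` could be turned into structured data, with `*` mapped to a nil (wildcard) number. `DeviceRule` and `ParseRule` are hypothetical helpers written for this illustration; they are not the parser used by Docker, Podman, or NRI.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// DeviceRule is a hypothetical stand-in for one cgroup device rule.
// A nil Major or Minor means "any" (the '*' wildcard).
type DeviceRule struct {
	Allow  bool
	Type   string
	Major  *int64
	Minor  *int64
	Access string
}

// parseNumber maps "*" to nil and anything else to a parsed int64.
func parseNumber(s string) (*int64, error) {
	if s == "*" {
		return nil, nil
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return nil, err
	}
	return &n, nil
}

// ParseRule parses "TYPE MAJOR:MINOR ACCESS", e.g. "c 238:* rwm",
// into an allow rule.
func ParseRule(s string) (DeviceRule, error) {
	f := strings.Fields(s)
	if len(f) != 3 {
		return DeviceRule{}, fmt.Errorf("want 3 fields, got %d", len(f))
	}
	if f[0] != "a" && f[0] != "b" && f[0] != "c" {
		return DeviceRule{}, fmt.Errorf("invalid device type %q", f[0])
	}
	majStr, minStr, ok := strings.Cut(f[1], ":")
	if !ok {
		return DeviceRule{}, fmt.Errorf("want MAJOR:MINOR, got %q", f[1])
	}
	major, err := parseNumber(majStr)
	if err != nil {
		return DeviceRule{}, fmt.Errorf("invalid major: %w", err)
	}
	minor, err := parseNumber(minStr)
	if err != nil {
		return DeviceRule{}, fmt.Errorf("invalid minor: %w", err)
	}
	// Access may only contain the letters r, w and m.
	if strings.Trim(f[2], "rwm") != "" {
		return DeviceRule{}, fmt.Errorf("invalid access %q", f[2])
	}
	return DeviceRule{Allow: true, Type: f[0], Major: major, Minor: minor, Access: f[2]}, nil
}

func main() {
	r, err := ParseRule("c 238:* rwm")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("type=%s major=%d minorIsWildcard=%v access=%s\n", r.Type, *r.Major, r.Minor == nil, r.Access)
}
```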

> how do your containers gain the extra device nodes

Using ioctl on binderfs.

@karlbaumg
Contributor Author

Hey @klihub! Is there a chance this will be merged? We've been using it via our fork in production and it's been working OK.

@klihub
Member

klihub commented Jul 31, 2025

> Hey @klihub! Is there a chance this will be merged? We've been using it via our fork in production and it's been working OK.

Sure, let's review it properly first, to see if there is anything to improve. I'll tag a few others to help in that.

/cc @mikebrow @chrishenzie

@chrishenzie
Contributor

Code generally LGTM.

I agree that plugins should have claims, but how we would define it is unclear to me. Are we claiming ownership of the rule, or of the device inside the rule?

For example, if we were saying claims were tied to the devices inside the rules, I think it would be unreasonable for a plugin to claim a rule with "wildcard" devices (e.g. no other plugins can then add or remove rules for any devices).

Member

@mikebrow mikebrow left a comment

LGTM

@mikebrow mikebrow merged commit fcd3e39 into containerd:main Aug 1, 2025
9 checks passed
@karlbaumg
Contributor Author

Thank you all!

@karlbaumg karlbaumg deleted the device-cgroup-fix branch August 2, 2025 09:01
@klihub
Member

klihub commented Aug 4, 2025

@karlbaumg I'm now a bit late with this comment/question (which is related to something I notice I had already commented earlier, about ). So in your above mentioned NRI plugin, do you do something like

func (p *plugin) CreateContainer(ctx context.Context, pod *api.PodSandbox, ctr *api.Container) (*api.ContainerAdjustment, []*api.ContainerUpdate, error) {
    // ...
    a := &api.ContainerAdjustment{
        Linux: &api.LinuxContainerAdjustment{
            Resources: &api.LinuxResources{
                Devices: []*api.LinuxDeviceCgroup{
                    {
                        Allow:  true,
                        Type:   "c",
                        Major:  api.OptionalInt64(238),
                        Access: "rwm",
                    },
                },
            },
        },
    }
    // ...
    return a, nil, nil
}

IOW, you add a new cgroup device rule like this (directly as data, without using a function for it) and then return that among the adjustments for the container? If you do (and I apologize for being hasty with my approval in the review), I think we should add a function for doing this, mostly for consistency with the rest of the code.

Also I think it probably would make sense to hook this in for conflict detection via the existing Claim/Clear* mechanism, probably using the type+major+minor as the key for the claim. So then in the above example, that would amount to claiming the "c-238-*" "DeviceCgroupRule". I can roll an implementation for this to better illustrate what I mean...
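As a rough illustration of the type+major+minor keying idea, the sketch below derives a key such as "c-238-*" from a rule and rejects a second plugin that tries to claim the same key. `DeviceRule`, `ClaimKey`, `Claims`, and the plugin names are hypothetical and only model the idea; the actual Claim/Clear* code in NRI may look quite different.

```go
package main

import (
	"fmt"
	"strconv"
)

// DeviceRule is a hypothetical stand-in for one cgroup device rule.
// A nil Major or Minor means "any" (the '*' wildcard).
type DeviceRule struct {
	Allow  bool
	Type   string
	Major  *int64
	Minor  *int64
	Access string
}

// part renders an optional device number, using "*" for a wildcard.
func part(n *int64) string {
	if n == nil {
		return "*"
	}
	return strconv.FormatInt(*n, 10)
}

// ClaimKey builds a key from type, major and minor, e.g. "c-238-*".
func ClaimKey(r DeviceRule) string {
	return r.Type + "-" + part(r.Major) + "-" + part(r.Minor)
}

// Claims maps a claim key to the plugin that owns it.
type Claims map[string]string

// Claim records plugin as the owner of key, or returns an error if a
// different plugin already owns that exact key.
func (c Claims) Claim(key, plugin string) error {
	if owner, ok := c[key]; ok && owner != plugin {
		return fmt.Errorf("device cgroup rule %q already claimed by plugin %q", key, owner)
	}
	c[key] = plugin
	return nil
}

func main() {
	major := int64(238)
	rule := DeviceRule{Allow: true, Type: "c", Major: &major, Access: "rwm"}
	key := ClaimKey(rule)

	claims := Claims{}
	fmt.Println(key, claims.Claim(key, "plugin-a"))
	fmt.Println(key, claims.Claim(key, "plugin-b"))
}
```

Because this sketch compares keys by exact match, a wildcard claim such as "c-238-*" does not block another plugin from claiming "c-238-5". Whether that is the desired semantics is exactly the ownership question raised earlier in the review.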

@karlbaumg
Contributor Author

Yes, here is exactly what I'm returning as a single element in the array:

{
  Allow:  true,
  Access: "rwm",
  Major:  api.Int64(-1),
  Minor:  api.Int64(-1),
  Type:   "c",
}

Which was already available on the response object btw, it was just not processed at all. I'm not really up to speed with the claim/clear mechanism, but from the sound of it, generic entries like this one may not fit perfectly well. I agree it'd be consistent with the rest, though.

@klihub
Member

klihub commented Aug 6, 2025

> Yes, here is exactly what I'm returning as a single element in the array:
>
> {
>   Allow:  true,
>   Access: "rwm",
>   Major:  api.Int64(-1),
>   Minor:  api.Int64(-1),
>   Type:   "c",
> }
>
> Which was already available on the response object btw, it was just not processed at all. I'm not really up to speed with the claim/clear mechanism, but from the sound of it, generic entries like this one may not fit perfectly well. I agree it'd be consistent with the rest, though.

@karlbaumg I can take a look at adding the missing claim bits and then tag you for review, if that helps...

@karlbaumg
Contributor Author

@klihub Sounds good, thanks!
