@laurazard (Member) commented Dec 11, 2025

Pull Request

What? (description)

Why? (reasoning)

Closes #8720

Acceptance

Please use the following checklist:

  • you linked an issue (if applicable)
  • you included tests (if applicable)
  • you ran conformance (make conformance)
  • you formatted your code (make fmt)
  • you linted your code (make lint)
  • you generated documentation (make docs)
  • you ran unit-tests (make unit-tests)

See make help for a description of the available targets.

@github-project-automation github-project-automation bot moved this to To Do in Planning Dec 11, 2025
@laurazard laurazard self-assigned this Dec 11, 2025
@laurazard laurazard force-pushed the poc-talos-debug branch 8 times, most recently from ce722ef to 97f4bb2 Compare December 17, 2025 12:17
@laurazard laurazard force-pushed the poc-talos-debug branch 3 times, most recently from 95c259b to 79cd19e Compare January 12, 2026 12:59
@laurazard laurazard moved this from To Do to In Progress in Planning Jan 12, 2026
@laurazard (Member, Author)

@smira when you get a sec can you TAL and see if you have any early feedback/ideas?

In the first attempt at the `DebugContainer` API, a single gRPC endpoint
was added (`rpc DebugContainer`) to receive the debug container spec,
create and configure the container, and run it + handle IO streams.

This patch splits this into two separate RPCs:
- `DebugContainerCreate`, and
- `DebugContainerRun`

This results in a cleaner API with better separation of concerns.

Signed-off-by: Laura Brehm <[email protected]>
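
To make the create/run split concrete, here is a minimal in-memory sketch of the two-phase flow. Everything in it is hypothetical (`DebugSpec`, `fakeDebugAPI`, the string container ID that links the two calls); the real RPCs in this patch are gRPC streams and may hand state between the phases differently.

```go
package main

import (
	"errors"
	"fmt"
)

// DebugSpec is a hypothetical stand-in for the debug container spec.
type DebugSpec struct {
	Image string
	Args  []string
}

// fakeDebugAPI is an in-memory stand-in for the two RPCs (not the real API).
type fakeDebugAPI struct {
	created map[string]DebugSpec
	nextID  int
}

func newFakeDebugAPI() *fakeDebugAPI {
	return &fakeDebugAPI{created: map[string]DebugSpec{}}
}

// Create validates the spec and registers the container without starting it.
func (a *fakeDebugAPI) Create(spec DebugSpec) (string, error) {
	if spec.Image == "" {
		return "", errors.New("image is required")
	}

	a.nextID++
	id := fmt.Sprintf("debug-%d", a.nextID)
	a.created[id] = spec

	return id, nil
}

// Run starts a previously created container and returns simulated output.
// Assumption: the ID returned by Create is how the two phases are linked.
func (a *fakeDebugAPI) Run(id, stdin string) (string, error) {
	spec, ok := a.created[id]
	if !ok {
		return "", fmt.Errorf("container %q was not created", id)
	}

	delete(a.created, id)

	return fmt.Sprintf("[%s] %s", spec.Image, stdin), nil
}

func main() {
	api := newFakeDebugAPI()

	id, err := api.Create(DebugSpec{Image: "example.invalid/debug:latest"})
	if err != nil {
		panic(err)
	}

	out, err := api.Run(id, "echo hello")
	if err != nil {
		panic(err)
	}

	fmt.Println(out)
}
```

The point of the split is that spec validation and container setup can fail early in `Create`, before any IO streams are attached in `Run`.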
@smira smira moved this from In Progress to In Review in Planning Jan 14, 2026
"/machine.MachineService/ImageList",
"/machine.MachineService/Kubeconfig",
"/machine.MachineService/List",
"/machine.MachineService/DebugContainer",
Member

you don't need this

rpc ImageList(ImageListRequest) returns (stream ImageListResponse);
// ImagePull pulls an image into the CRI.
rpc ImagePull(ImagePullRequest) returns (ImagePullResponse);
// DebugContainerCreate
Member

leaving some notes after the call:

we can probably move this out to a separate DebugService (I think machine.DebugService makes sense)?

Member

I don't remember, but I think apid_test should scream at you as you haven't specified RBAC for it.

If it doesn't scream, let's make sure it screams, and specify RBAC (os:admin).

Member

oops, I'm blind, I see the RBAC changes. But if we move it to a separate proto, let's double-check that the test picks it up
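
A rough sketch of the kind of check that would "scream" when a method has no RBAC rule. This is not the actual apid_test; `missingRBAC`, the `machine.DebugService` method names, and the rule table are hypothetical and only illustrate the idea of comparing every registered method against the rules.

```go
package main

import (
	"fmt"
	"sort"
)

// missingRBAC returns, in sorted order, every method that has no role
// assigned in rules. An empty result means full coverage.
func missingRBAC(methods []string, rules map[string]string) []string {
	var missing []string

	for _, m := range methods {
		if _, ok := rules[m]; !ok {
			missing = append(missing, m)
		}
	}

	sort.Strings(missing)

	return missing
}

func main() {
	// Hypothetical method list, as if collected from all registered services.
	methods := []string{
		"/machine.MachineService/ImageList",
		"/machine.DebugService/DebugContainerCreate",
		"/machine.DebugService/DebugContainerRun",
	}

	// Hypothetical rule table; the roles are illustrative, and the Run
	// method is deliberately left out.
	rules := map[string]string{
		"/machine.MachineService/ImageList":          "os:reader",
		"/machine.DebugService/DebugContainerCreate": "os:admin",
	}

	for _, m := range missingRBAC(methods, rules) {
		fmt.Println("no RBAC rule for", m)
	}
}
```

If the method list is built from every service descriptor rather than from one proto file, a new service such as a separate debug service is covered automatically.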

return proxy.One2One, nil, status.Error(codes.InvalidArgument, "one-2-many proxying is not supported for COSI methods")
}

if strings.HasPrefix(fullMethodName, "/machine.MachineService/DebugContainer") {
Member

turn this inside out: specify the methods which support one2many proxying (probably /machine.MachineService and a couple of others?), and everything else should return an error

}

// DebugContainerRun implements the machine.MachineServer interface.
func (s *Server) DebugContainerRun(srv machine.MachineService_DebugContainerRunServer) error { //nolint:gocyclo
Member

as we separated the protos, we can separate the implementation into its own DebugServer Go struct which we can now "mix in" into the maintenance/normal mode client.

We should probably double-assert here on os:admin RBAC, as in maintenance mode we assign "os:admin" to SideroLink activated connections, and there is no middleware for RBAC

func (s *Server) DebugContainerCreate(srv machine.MachineService_DebugContainerCreateServer) error { //nolint:gocyclo
ctx := srv.Context()

client, err := containerdapi.New(constants.SystemContainerdAddress,
Member

food for thought: should we pick up the containerd to use automatically? if the "CRI" one is up, use it always? and use "system" one only if CRI is not up yet?
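
The selection rule floated here could look roughly like the sketch below. The socket paths are placeholders, not the real constants, and `criIsUp` is a hypothetical probe injected by the caller (for example, a check that the CRI containerd service is healthy).

```go
package main

import "fmt"

// Placeholder addresses: assumptions for illustration only.
const (
	systemContainerdAddress = "/hypothetical/system/containerd.sock"
	criContainerdAddress    = "/hypothetical/cri/containerd.sock"
)

// pickContainerdAddress prefers the CRI containerd whenever it is up and
// falls back to the system containerd otherwise.
func pickContainerdAddress(criIsUp func() bool) string {
	if criIsUp() {
		return criContainerdAddress
	}

	return systemContainerdAddress
}

func main() {
	// Early boot: CRI is not running yet.
	fmt.Println(pickContainerdAddress(func() bool { return false }))

	// Later: CRI is up, so it is always preferred.
	fmt.Println(pickContainerdAddress(func() bool { return true }))
}
```

Injecting the probe keeps the decision testable and leaves open how "up" is actually determined.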

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

Ability to debug nodes with running debug container

2 participants