Skip to content

Conversation

@Dsanatar
Copy link
Collaborator

@Dsanatar Dsanatar commented Oct 6, 2025

What this PR does / why we need it:

Extension of #3764

If a clone gets stuck for whatever reason it is helpful to be able to see the original error from the PVC/DV that initiated the clone.

Leveraging the functionality from the above PR to copy events from tmp-pvcs during a clone to the their owner pvc.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

Events from Clone PVCs

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Oct 6, 2025
@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign awels for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coveralls
Copy link

coveralls commented Oct 6, 2025

Coverage Status

coverage: 58.995% (-0.08%) from 59.076%
when pulling 5832cc4 on Dsanatar:clone-pvc-events
into 4c3dc65 on kubevirt:main.

Copy link
Collaborator

@akalenyu akalenyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick first pass 👍

Do I understand correctly that the host assisted/upload controllers are already okay? It's just the csi/snap clone that's broken?

BTW, we should be able to unit test this
(we already use record.NewFakeRecorder in some unit test suites)

Comment on lines +2259 to +2262
// check if target PVC already has this equivalent event
if _, exists := eventMap[msg]; exists {
continue
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we feel about using the message as a sign of existence?
I can totally imagine some part of the message being dynamic

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on your answer (maybe this is somehow a non issue) we can come up with other ideas like a combo of reason and something else

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't hurt to prepend the reason for a more precise comparison, will address

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think there's a types.UID in the event object

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I wasn't sure we want to propagate duplicates but as long as our fired events will qualify for aggregation (.count increase instead of lots of CREATEs for every event) we should be good

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, any duplicate events that may get re-emitted just increment the event counter.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah as long as we don't do anything funky (reason/event source/etc are all the same) I guess we should be fine. If the message has a dynamic element, the coalescing should break, but, that's not on us - we just propagate existing events.

}

// use this to hash each message for quick lookup, value is unused
eventMap := make(map[string]bool)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a huge deal but normally in go you'd use an empty struct instead of the bool in this kind of maps, to spare the allocation:
https://stackoverflow.com/a/22770744

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok good to know thanks!

@Dsanatar
Copy link
Collaborator Author

Dsanatar commented Oct 8, 2025

Just a quick first pass 👍

Do I understand correctly that the host assisted/upload controllers are already okay? It's just the csi/snap clone that's broken?

BTW, we should be able to unit test this (we already use record.NewFakeRecorder in some unit test suites)

The CopyEvents func was originally used to get events from a prime PVC to it's associated "parent" PVC/DV in case something got stuck during the prime PVC reconciliation. A similar issue was reported for clones (specially host clone) where the newly created tmp PVC could encounter an error but no indication of this ever makes it back to it's owner PVC/DV. I figured since each of these clone methods use a tmp PVC that could potentially get stuck during reconcile, we should bubble up these events for each clone type.

@akalenyu
Copy link
Collaborator

akalenyu commented Oct 8, 2025

Just a quick first pass 👍
Do I understand correctly that the host assisted/upload controllers are already okay? It's just the csi/snap clone that's broken?
BTW, we should be able to unit test this (we already use record.NewFakeRecorder in some unit test suites)

The CopyEvents func was originally used to get events from a prime PVC to it's associated "parent" PVC/DV in case something got stuck during the prime PVC reconciliation. A similar issue was reported for clones (specially host clone) where the newly created tmp PVC could encounter an error but no indication of this ever makes it back to it's owner PVC/DV. I figured since each of these clone methods use a tmp PVC that could potentially get stuck during reconcile, we should bubble up these events for each clone type.

Interesting, does this not achieve that for host clone?

cc.AddAnnotation(claim, cc.AnnEventSource, fmt.Sprintf("%s/%s", p.Owner.GetNamespace(), p.Owner.GetName()))

@Dsanatar
Copy link
Collaborator Author

Dsanatar commented Oct 8, 2025

Just a quick first pass 👍
Do I understand correctly that the host assisted/upload controllers are already okay? It's just the csi/snap clone that's broken?
BTW, we should be able to unit test this (we already use record.NewFakeRecorder in some unit test suites)

The CopyEvents func was originally used to get events from a prime PVC to it's associated "parent" PVC/DV in case something got stuck during the prime PVC reconciliation. A similar issue was reported for clones (specially host clone) where the newly created tmp PVC could encounter an error but no indication of this ever makes it back to it's owner PVC/DV. I figured since each of these clone methods use a tmp PVC that could potentially get stuck during reconcile, we should bubble up these events for each clone type.

Interesting, does this not achieve that for host clone?

cc.AddAnnotation(claim, cc.AnnEventSource, fmt.Sprintf("%s/%s", p.Owner.GetNamespace(), p.Owner.GetName()))

I believe this annotation is only used to report specific events from within the controller. The issue seen in the bug report that causes these tmp PVCs to fail is an event failure from the provisioner. I am however using this annotation to get the owner PVC object, so I will need to add this Ann to the snap/csi clone as well.

@Dsanatar
Copy link
Collaborator Author

Just a quick first pass 👍

Do I understand correctly that the host assisted/upload controllers are already okay? It's just the csi/snap clone that's broken?

BTW, we should be able to unit test this (we already use record.NewFakeRecorder in some unit test suites)

After further investigating, it is actually slightly harder to unit test this using the FakeRecorders since the events we are looking for don't get emitted from the normal Reconcile loop, but instead from CopyEvents. This makes it complicated since the function calls client.List()` to get and compare events, which requires the client to be setup with the proper Indexers for the list function to work. I think I ran into this issue when I originally implemented this and opted for a functional test instead

DescribeTable("Events and Conditions from Prime PVC", func(url func() string, containsPrimeEvent bool) {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dco-signoff: yes Indicates the PR's author has DCO signed all their commits. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants