Changes from 500 VMs Hybrid work #733

afcollins · 2025-10-31T20:23:53Z

Adds VM recovery playbooks
Enable hugepages on the hypervisor and in the VM configuration.
Add a playbook to disable devices created for virtual functions.
Working changes to configure hugetlb.
Add an import that seems to be required; without it there was a complaint about a missing var.

Improve CSR approve and node Ready wait loop

Add interfaces that can be used as virtual functions. However, they seemed to generate a lot of iowait activity on the VMs, so I don't know whether something is wrong with them.

Some changes were generated using Cursor and the claude-4-sonnet model.

openshift-ci · 2025-10-31T20:24:00Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign akrzos for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mcornea · 2025-11-04T11:09:59Z

docs/troubleshooting.md

 Object value modified successfully
 ```

+## Failing ImagePull due to Pull Secret


Is this scenario for when the user can no longer update the pull-secret by oc -n openshift-config edit secret/pull-secret ?

Yes. I only encountered it when the credentials installed in the cluster belonged to an account that was deactivated and no nodes would pull images.

I wasn't happy placing it under this section of the Troubleshooting doc, but wasn't sure there was a better option. Open to suggestions on that too.

mcornea · 2025-11-04T11:22:56Z

ansible/roles/hv-vm-create/templates/kvm-def.xml.j2

  <features>
    <acpi/>
    <apic/>
+    <ioapic driver='qemu'/>


Why is this option needed? Is it required for the virtual functions?

Yes. That means it can also go behind a flag that switches virtual function configuration on or off.

mcornea · 2025-11-04T12:24:49Z

ansible/roles/hv-vm-create/templates/kvm-def.xml.j2

 {% endif %}
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
+{% for i in range(1, 6) %}


Can we have an extra option which controls the creation of these interfaces? From what I read in the description there may be some issues with them so I suggest not enabling them by default.

ansible/roles/hv-vm-destroy/tasks/main.yml

ansible/roles/hv-vm-start/tasks/main.yml

mcornea · 2025-11-04T12:37:26Z

ansible/roles/wait-hosts-discovered/tasks/set_hostname_role.yml

        "host_name": "{{ hostname }}",
        "host_role": "{{ host_role }}"
    }
+  ignore_errors: yes


Why do we need ignore_errors: yes here?

I ran into a case where this task was returning errors even though I couldn't find anything wrong. I can revert it if we would rather not ignore errors.

mcornea · 2025-11-04T12:42:57Z

ansible/roles/ocp-scale-out-csr/tasks/main.yml

+  shell: |
+    KUBECONFIG={{ bastion_cluster_config_dir }}/kubeconfig oc get nodes --no-headers -l node-role.kubernetes.io/worker | grep -c -v NotReady
+  register: oc_get_nodes_workers
+  until: oc_get_nodes_workers.stdout|int < current_worker_count+scale_out_count


Is this condition correct? Shouldn't it be oc_get_nodes_workers.stdout|int == current_worker_count+scale_out_count ?

Possibly. I thought I remembered this working and the prior code specifically not working, but I would have to test it more. I could also remove it from this PR since I don't have much confidence in it.
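To make the difference between the candidate conditions concrete, here is a minimal shell sketch of the comparison, assuming the goal is to keep waiting until every expected worker reports Ready. The function names and the sample node listing are hypothetical and not taken from the PR. The sketch uses `-ge` (greater than or equal) rather than either `<` or `==`; that is a substitution made here for illustration, not something proposed in the thread.

```shell
#!/usr/bin/env bash

# Count lines that do not contain "NotReady", mirroring `grep -c -v NotReady`.
# Expects `oc get nodes --no-headers` style output on stdin.
count_ready_workers() {
  # grep -c prints 0 but exits 1 when nothing matches; `|| true` masks that status.
  grep -c -v NotReady || true
}

# Exit 0 once the Ready count has reached current + scale_out, else exit 1.
scale_out_done() {
  local ready=$1 current=$2 scale_out=$3
  [ "$ready" -ge $((current + scale_out)) ]
}

# Hypothetical listing: 2 existing workers, scaling out by 2,
# with one of the new nodes still NotReady.
sample='worker-0 Ready worker 10d v1.31.0
worker-1 Ready worker 10d v1.31.0
worker-2 Ready worker 1m v1.31.0
worker-3 NotReady worker 1m v1.31.0'

ready=$(printf '%s\n' "$sample" | count_ready_workers)

if scale_out_done "$ready" 2 2; then
  echo "done: $ready workers Ready"
else
  echo "waiting: $ready of 4 workers Ready"
fi
```

Ansible's `until` stops retrying as soon as its expression is true. With the `<` comparison from the diff, the sample above (3 Ready out of 4 expected) would already satisfy the expression, so the loop would end while a node is still NotReady.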

ansible/vars/hv.yml

mcornea · 2025-11-04T12:51:59Z

ansible.cfg

@@ -1,3 +1,6 @@
 [defaults]


Do we need these changes?

They improved my quality of life. I was getting Python interpreter warnings on every step, so I added auto_silent and deprecation_warnings.

Writing the log to a file makes it possible to trace back through a run, and it helps when we need a second set of eyes on a playbook run.

I am not sure display_args_to_stdout works properly, so that one can go.

Signed-off-by: Andrew Collins <[email protected]>
Apply suggestion from @mcornea
Co-authored-by: Marius Cornea <[email protected]>

openshift-ci bot requested review from josecastillolema and jtaleric October 31, 2025 20:23

akrzos requested a review from mcornea November 3, 2025 14:36

mcornea reviewed Nov 4, 2025

ansible/roles/hv-vm-destroy/tasks/main.yml (outdated, resolved)

mcornea reviewed Nov 4, 2025

ansible/roles/hv-vm-start/tasks/main.yml (outdated, resolved)

mcornea reviewed Nov 4, 2025

ansible/vars/hv.yml (outdated, resolved)

mcornea reviewed Nov 4, 2025

openshift-merge-robot added the needs-rebase label Nov 5, 2025

afcollins force-pushed the hv-hugepages branch from c04998a to 69b8036 Compare November 7, 2025 02:13

openshift-merge-robot removed the needs-rebase label Nov 7, 2025

3 participants