Add CI workflows to provision azure VM and run mshv unit tests #213

Open · gamora12 wants to merge 2 commits into main from aastha/mshv_ci

Conversation

@gamora12 gamora12 commented Apr 21, 2025

Summary of the PR

This PR introduces two CI workflows to automate the testing of the mshv crate in a virtualized environment:

  1. VM Provisioning Workflow (Reusable)

    • Provisions an Azure VM using managed identity
    • Sets VM name dynamically based on GitHub run ID
    • Prepares the environment for running tests
  2. Build and Test Workflow

    • Triggers the provisioning workflow
    • Builds the mshv crate & runs unit tests via SSH on the provisioned remote VM

Adds a reusable GitHub Actions workflow for setting up an Azure VM
to run unit tests. This workflow is intended to be called via
`workflow_call` from other workflows. It is used to provision the
environment before executing tests on it.

Signed-off-by: Aastha Rawat <[email protected]>
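
For illustration, a minimal sketch of how the reusable workflow is wired up via `workflow_call` (the file names and the SKU value here are assumptions, not the PR's actual contents; the SKU input itself appears later in the diff):

# infra-setup.yaml -- the reusable workflow declares a workflow_call trigger
on:
  workflow_call:
    inputs:
      SKU:
        type: string
        required: true

# build-and-test.yaml -- the caller invokes it as a job
jobs:
  infra-setup:
    uses: ./.github/workflows/infra-setup.yaml
    with:
      SKU: Standard_D4s_v5   # hypothetical VM size
    secrets: inherit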
Comment on lines 97 to 112
echo "Uploading VHD to Azure Storage Account"
STORAGE_ACCOUNT_PATH=$(echo "$STORAGE_ACCOUNT_PATHS" | jq -r 'to_entries[0].value')
BLOB_URL="${STORAGE_ACCOUNT_PATH}/${GITHUB_RUN_ID}/x86.vhd"
azcopy copy ${SOURCE_PATH_X86} ${BLOB_URL}
echo "VHD upload complete"

# Create Image from the VHD
echo "Creating Image"
az image create \
  --resource-group ${RG} \
  --name ${IMAGE_NAME} \
  --source $BLOB_URL \
  --os-type Linux \
  --location ${LOCATION} \
  --hyper-v-generation V2 || true
echo "Image creation complete"
Collaborator:

Isn't it possible to have an image which is created once and then used by this pipeline? Let's have another pipeline which triggers an update for this image whenever a new VHD is released from our internal pipelines. This way we can save one step here and we don't have to perform azcopy all the time.

Author:

Yes, we can have a separate pipeline for VHD preparation.

  1. VHD Preparation: runs only when there are changes to the VHD; it builds the image from that VHD and publishes it as an artifact or output.
  2. Infra Setup: fetches the latest built image from the first pipeline.

Comment on lines 85 to 92
echo "Installing AzCopy if not already installed"
if ! command -v azcopy &> /dev/null; then
wget -O azcopy.tar.gz https://aka.ms/downloadazcopy-v10-linux
tar -xvf azcopy.tar.gz
sudo mv azcopy*/azcopy /usr/local/bin/
sudo chmod +x /usr/local/bin/azcopy
fi
azcopy --version
Collaborator:

Instead of re-installing AzCopy every time, it would be better to cache it, something along these lines:

jobs:
  azcopy:
    runs-on:
       - self-hosted
       - Linux

    steps:
      - name: Cache AzCopy
        id: cache-azcopy
        uses: actions/cache@v4
        with:
          path: ~/azcopy
          key: azcopy-v10

      - name: Download AzCopy if not cached
        if: steps.cache-azcopy.outputs.cache-hit != 'true'
        run: |
          wget -O azcopy.tar.gz https://aka.ms/downloadazcopy-v10-linux
          mkdir -p ~/azcopy
          tar -xvf azcopy.tar.gz -C ~/azcopy --strip-components=1

      - name: Add AzCopy to PATH
        run: echo "$HOME/azcopy" >> $GITHUB_PATH

      - name: Check AzCopy version
        run: azcopy --version

Collaborator:

Another nit: parametrize the version (v10) into an environment variable.
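
For example, something along these lines (a sketch building on the snippet above, so the cache key and download URL stay in sync):

env:
  AZCOPY_VERSION: v10

steps:
  - name: Cache AzCopy
    id: cache-azcopy
    uses: actions/cache@v4
    with:
      path: ~/azcopy
      key: azcopy-${{ env.AZCOPY_VERSION }}

  - name: Download AzCopy if not cached
    if: steps.cache-azcopy.outputs.cache-hit != 'true'
    run: wget -O azcopy.tar.gz "https://aka.ms/downloadazcopy-${AZCOPY_VERSION}-linux"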

Author:

Sounds good.

RUNNER_RG: ${{ secrets.MSHV_RUNNER_RG }}
RUNNER: ${{ secrets.MSHV_RUNNER }}
SKU: ${{ inputs.SKU }}
VM_NAME: x86_${{ github.run_id }}
Collaborator:

Tests work on both ARM64 and x86, so let's start creating VMs on both architectures.
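
For illustration, a per-architecture matrix might look roughly like this (a sketch only; as the thread below concludes, ARM64 VMs are not feasible yet):

jobs:
  infra-setup:
    strategy:
      matrix:
        arch: [x86_64, arm64]
    env:
      VM_NAME: ${{ matrix.arch }}_${{ github.run_id }}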

Contributor:

Some of these tests need to be run in dom0, right? We cannot create arm64 dom0 VMs yet though.

Collaborator:

No, those tests are disabled on ARM64 right now; only the hypercall tests are active on ARM64.

Contributor:

So we just need a regular (i.e. not dom0) arm64 VM for these hypercalls tests?

Collaborator:

No, in order to do hypercalls we would need the dom0 /dev/mshv device.

Contributor @anirudhrb commented Apr 23, 2025:

So we can't do that on arm64 because arm64 dom0 stuff is not merged yet. Only x86 for now.

Collaborator:

Why? We already have a Dom0 VHD for ARM64 which can support these test cases.

Contributor:

I don't think that VHD actually works. It doesn't have my arm64 bring-up changes. Also, where will we deploy that VHD? We can't deploy on Azure because there is no nested virt on arm64.

Comment on lines 123 to 124
USERNAME: ${{ secrets.MSHV_USERNAME }}
PASSWORD: ${{ secrets.MSHV_PASSWORD }}
Collaborator:

From a security standpoint it does not make sense to have a persistent username and password; instead we can create an SSH key on the fly and use that for SSH. Something like this:

      - name: Generate SSH Key
        run: |
          mkdir -p ~/.ssh
          if [ ! -f ~/.ssh/azure_key ]; then
            ssh-keygen -t rsa -b 4096 -f ~/.ssh/azure_key -N ""
          fi

      - name: Create Azure VM with SSH key auth
        run: |
          az vm create \
            --resource-group $RG \
            --name $VM_NAME \
            --subnet $SUBNET_ID \
            --size $SKU \
            --location $LOCATION \
            --image vhd-image \
            --os-disk-size-gb $OS_DISK_SIZE \
            --public-ip-sku Standard \
            --storage-sku Premium_LRS \
            --public-ip-address "" \
            --admin-username $USERNAME \
            --ssh-key-values ~/.ssh/azure_key.pub \
            --security-type Standard \
            --output json

      - name: Get VM Private IP
        id: get-ip
        run: |
          PRIVATE_IP=$(az vm show -g $RG -n $VM_NAME -d --query privateIps -o tsv)
          if [ -z "$PRIVATE_IP" ]; then
            echo "ERROR: Failed to get VM private IP"
            exit 1
          fi
          echo "PRIVATE_IP=$PRIVATE_IP" >> $GITHUB_ENV

      - name: Wait for SSH availability
        run: |
          echo "Sleeping 30s to allow VM boot and SSH service to start..."
          sleep 30

      - name: Remove Old Host Key
        run: |
          ssh-keygen -R $PRIVATE_IP

      - name: SSH into VM and install dependencies
        run: |
          ssh -i ~/.ssh/azure_key -o StrictHostKeyChecking=no ${USERNAME}@${PRIVATE_IP} << 'EOF'
            set -e
            echo "Logged in successfully."
            sudo tdnf install -y git clang llvm pkg-config make gcc glibc-devel
            git clone https://github.com/rust-vmm/mshv.git
            echo "mshv cloned successfully"
          EOF

@@ -0,0 +1,91 @@
name: MSHV Build & Tests
Collaborator:

We should also test on the linux-musl target.
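
For reference, exercising the musl target from the VM could look like this (a sketch assuming the x86_64 triple and that a musl toolchain is available on the VM):

rustup target add x86_64-unknown-linux-musl
cargo build --target x86_64-unknown-linux-musl
cargo test --target x86_64-unknown-linux-musl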

Author:

Sure.

for LOCATION in "${SUPPORTED_LOCATIONS[@]}"; do
  echo "Location: $LOCATION"
  STORAGE_ACCOUNT_PATH=$(echo ${STORAGE_ACCOUNT_PATHS} | jq -r ".\"$LOCATION\"")
  BLOB_URL="${STORAGE_ACCOUNT_PATH}/$LOCATION/x86_64.vhd"

Can we use the RUN/JOB ID as we discussed? Otherwise the VHD can be overwritten by parallel runs of the flow.

Author:

We'll be using the same image for every run. A scheduled pipeline will build this image from the latest VHD every day.
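
A rough sketch of that scheduled rebuild (the image name vhd-image matches the earlier `az vm create` example; RG and LATEST_VHD_URL are stand-ins, not names from this PR):

on:
  schedule:
    - cron: '0 0 * * *'   # daily
jobs:
  refresh-image:
    runs-on: [self-hosted, Linux]
    steps:
      - name: Rebuild image from the latest VHD
        run: |
          az image create \
            --resource-group "$RG" \
            --name vhd-image \
            --source "$LATEST_VHD_URL" \
            --os-type Linux \
            --hyper-v-generation V2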

echo "Logged in successfully."
export PATH="\$HOME/.cargo/bin:\$PATH"
cd mshv
git fetch origin pull/${PR_NUMBER}/merge
Contributor:

Can we clone the repo here itself? That way we can clone just this branch with `git clone -b <branch_name>`. Also use `--depth 1`.
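
Something like this, where BRANCH_NAME is a stand-in for the PR branch:

git clone --depth 1 -b "$BRANCH_NAME" https://github.com/rust-vmm/mshv.git
cd mshv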

RG: MSHV-${{ github.run_id }}
run: |
  if [ "$(az group exists --name ${RG})" = "true" ]; then
    az group delete --name ${RG} --yes --no-wait
Collaborator:

This can fail, right? Do we have another service which periodically goes through all the stale resource groups and cleans them up?

Author @gamora12 commented Jun 12, 2025:

The cleanup job will delete the resource group at the end for each run. It will always run even if the previous jobs fail or get cancelled.

As of now we don't have a separate service to ensure the cleanup.

Collaborator:

I mean the cleanup executes `az group delete`, but this operation can fail for multiple reasons, for example an auth issue; then the created resource group will not be deleted by the pipeline. So we would need an external cleanup agent which checks for stale resources.

Author @gamora12 commented Jun 12, 2025:

Agreed, we can add a separate cleanup workflow (cron) for this in a follow-up PR.
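
A rough sketch of such a cron cleanup workflow (names are assumptions; a real version would also need to skip resource groups belonging to in-flight runs, e.g. via a creation-time tag):

name: Stale RG Cleanup
on:
  schedule:
    - cron: '0 2 * * *'   # nightly
jobs:
  cleanup:
    runs-on: [self-hosted, Linux]
    steps:
      - name: Delete leftover MSHV-* resource groups
        run: |
          az group list --query "[?starts_with(name, 'MSHV-')].name" -o tsv |
          while read -r rg; do
            az group delete --name "$rg" --yes --no-wait
          done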

@@ -0,0 +1,224 @@
name: MSHV Infra Setup
Collaborator:

Just one meta question: all these runs are cancellable, right? So if I update the PR, the previous run gets cancelled, the resource group gets deleted, and everything.

Author @gamora12 commented Jun 12, 2025:

Yes, these runs can be cancelled. The cleanup job runs at the end by default even if the workflow is cancelled by the latest PR update. So, if a new commit is pushed, GitHub will automatically cancel the previous run.
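
For reference, the two GitHub Actions pieces that make this behavior work (a sketch; job names are assumptions, the resource group name matches the diff above):

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true   # a new push cancels the previous run

jobs:
  cleanup:
    needs: [build-and-test]
    if: ${{ always() }}      # runs even if earlier jobs fail or are cancelled
    runs-on: [self-hosted, Linux]
    steps:
      - name: Delete resource group
        run: az group delete --name "MSHV-${{ github.run_id }}" --yes --no-wait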

@gamora12 force-pushed the aastha/mshv_ci branch 2 times, most recently from cdea2f1 to 9d4fec7 on June 12, 2025 at 08:39
default: 'main'
jobs:
infra-setup:
name: MSHV Infra Setup
Contributor:

Suggested change:
- name: MSHV Infra Setup
+ name: MSHV Infra Setup (x86_64)

Perhaps we should name it like this?

Author @gamora12 commented Jun 12, 2025:

I think it's good to have a generic name since we'll support other archs in the future as well. We already have:
infra-setup: name: ${{ inputs.ARCH }} VM Provision

Contributor:

I was suggesting that because we're passing ARCH: x86_64 below. How will it be when we introduce a new arch? Won't it be a separate step anyway (which we can name with an "arm64" suffix)?

Author:

Yeah, it will be separate. We can name it with this convention.

@gamora12 force-pushed the aastha/mshv_ci branch 2 times, most recently from fc67750 to 4ca827f on June 13, 2025 at 04:42
Adds a GitHub Actions workflow to build and run unit tests for
the mshv crate. It calls the infra setup workflow to provision
a VM, then logs in via SSH and runs the tests. It ensures tests run in
a consistent VM setup.

Signed-off-by: Aastha Rawat <[email protected]>
Collaborator @jinankjain left a comment:

LGTM. I think we can improve on the cleanup part in a follow-up PR. Currently it is getting critical to have some basic tests running in our CI.

Collaborator @jinankjain:

@gamora12 Please check the errors that are being thrown by the CI run.
