-
Notifications
You must be signed in to change notification settings - Fork 16
Add CI workflows to provision azure VM and run mshv unit tests #213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Adds a reusable Github Actions workflow for setting up Azure vm to run unit tests. This workflow is intended to be called using `workflow_call` from other workflows. It is used to provision the environment before executing tests on it. Signed-off-by: Aastha Rawat <[email protected]>
.github/workflows/mshv-infra.yaml
Outdated
echo "Uploading VHD to Azure Storage Account" | ||
STORAGE_ACCOUNT_PATH=$(echo "$STORAGE_ACCOUNT_PATHS" | jq -r 'to_entries[0].value') | ||
BLOB_URL="${STORAGE_ACCOUNT_PATH}/${GITHUB_RUN_ID}/x86.vhd" | ||
azcopy copy ${SOURCE_PATH_X86} ${BLOB_URL} | ||
echo "VHD upload complete" | ||
|
||
# Create Image from the VHD | ||
echo "Creating Image" | ||
az image create \ | ||
--resource-group ${RG} \ | ||
--name ${IMAGE_NAME} \ | ||
--source $BLOB_URL \ | ||
--os-type Linux \ | ||
--location ${LOCATION} \ | ||
--hyper-v-generation V2 || true | ||
echo "Image creation complete" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it possible to have a image which is created that would be used by this pipeline? Let's have another pipeline which triggers update for this image whenenver a new VHD is released from our internal pipelines. This way we can save one step here and we don't have to perform azcopy all the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can have a separate pipeline for VHD preparation.
- VHD Preparation- This will run only when there are changes in the VHD, it will build the image from that VHD and publish it as an artifact or output.
- Infra-setup- This will fetch the latest built image from the first one.
.github/workflows/mshv-infra.yaml
Outdated
echo "Installing AzCopy if not already installed" | ||
if ! command -v azcopy &> /dev/null; then | ||
wget -O azcopy.tar.gz https://aka.ms/downloadazcopy-v10-linux | ||
tar -xvf azcopy.tar.gz | ||
sudo mv azcopy*/azcopy /usr/local/bin/ | ||
sudo chmod +x /usr/local/bin/azcopy | ||
fi | ||
azcopy --version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of re-installing az-copy all the time, it would be better to cache it, something along these lines:
jobs:
azcopy:
runs-on:
- self-hosted
- Linux
steps:
- name: Cache AzCopy
id: cache-azcopy
uses: actions/cache@v4
with:
path: ~/azcopy
key: azcopy-v10
- name: Download AzCopy if not cached
if: steps.cache-azcopy.outputs.cache-hit != 'true'
run: |
wget -O azcopy.tar.gz https://aka.ms/downloadazcopy-v10-linux
mkdir -p ~/azcopy
tar -xvf azcopy.tar.gz -C ~/azcopy --strip-components=1
- name: Add AzCopy to PATH
run: echo "$HOME/azcopy" >> $GITHUB_PATH
- name: Check AzCopy version
run: azcopy --version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another nit parametrized the version v10 inside environment variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sound good.
.github/workflows/mshv-infra.yaml
Outdated
RUNNER_RG: ${{ secrets.MSHV_RUNNER_RG }} | ||
RUNNER: ${{ secrets.MSHV_RUNNER }} | ||
SKU: ${{ inputs.SKU }} | ||
VM_NAME: x86_${{ github.run_id }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests works on both ARM64 and x86, so let's start creating VMs on both architecture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of these tests need to be run in dom0 right? We cannot create arm64 dom0 VMs yet though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, those tests are disabled on ARM64 right now, only hypercalls tests are active right now on ARM64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we just need a regular (i.e. not dom0) arm64 VM for these hypercalls tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, in order to do hypercalls we would need dom0 /dev/mshv device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we can't do that on arm64 because arm64 dom0 stuff is not merged yet. Only x86 for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we already have Dom0 VHD for ARM64 which can support these test cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that VHD actually works. It doesn't have my arm64 bring up changes. Also, where will we deploy that VHD? We can't deploy on Azure because there is no nested virt on arm64.
.github/workflows/mshv-infra.yaml
Outdated
USERNAME: ${{ secrets.MSHV_USERNAME }} | ||
PASSWORD: ${{ secrets.MSHV_PASSWORD }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a security stand point it does not make sense to have persistent username and password, instead we can create a ssh-key on the fly and use that for ssh. Something like this:
- name: Generate SSH Key
run: |
mkdir -p ~/.ssh
if [ ! -f ~/.ssh/azure_key ]; then
ssh-keygen -t rsa -b 4096 -f ~/.ssh/azure_key -N ""
fi
- name: Create Azure VM with SSH key auth
run: |
az vm create \
--resource-group $RG \
--name $VM_NAME \
--subnet $SUBNET_ID \
--size $SKU \
--location $LOCATION \
--image vhd-image \
--os-disk-size-gb $OS_DISK_SIZE \
--public-ip-sku Standard \
--storage-sku Premium_LRS \
--public-ip-address "" \
--admin-username $USERNAME \
--ssh-key-values ~/.ssh/azure_key.pub \
--security-type Standard \
--output json
- name: Get VM Private IP
id: get-ip
run: |
PRIVATE_IP=$(az vm show -g $RG -n $VM_NAME -d --query privateIps -o tsv)
if [ -z "$PRIVATE_IP" ]; then
echo "ERROR: Failed to get VM private IP"
exit 1
fi
echo "PRIVATE_IP=$PRIVATE_IP" >> $GITHUB_ENV
- name: Wait for SSH availability
run: |
echo "Sleeping 30s to allow VM boot and SSH service to start..."
sleep 30
- name: Remove Old Host Key
run: |
ssh-keygen -R $PRIVATE_IP
- name: SSH into VM and install dependencies
run: |
ssh -i ~/.ssh/azure_key -o StrictHostKeyChecking=no ${USERNAME}@${PRIVATE_IP} << 'EOF'
set -e
echo "Logged in successfully."
sudo tdnf install -y git clang llvm pkg-config make gcc glibc-devel
git clone https://github.com/rust-vmm/mshv.git
echo "mshv cloned successfully"
EOF
.github/workflows/mshv-tests.yaml
Outdated
@@ -0,0 +1,91 @@ | |||
name: MSHV Build & Tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also test on linux-musl target
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
2fc90c1
to
19668f3
Compare
.github/workflows/mshv-vhd-prep.yaml
Outdated
for LOCATION in "${SUPPORTED_LOCATIONS[@]}"; do | ||
echo "Location: $LOCATION" | ||
STORAGE_ACCOUNT_PATH=$(echo ${STORAGE_ACCOUNT_PATHS} | jq -r ".\"$LOCATION\"") | ||
BLOB_URL="${STORAGE_ACCOUNT_PATH}/$LOCATION/x86_64.vhd" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use RUN/JOB id as we discussed. This can override the VHD for parallel run of the flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll be using the same image for every run. Scheduled pipeline will build this image from the latest vhd everyday.
.github/workflows/mshv-tests.yaml
Outdated
echo "Logged in successfully." | ||
export PATH="\$HOME/.cargo/bin:\$PATH" | ||
cd mshv | ||
git fetch origin pull/${PR_NUMBER}/merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we clone the repo here itself? That way we can clone just this branch with git clone -b <branch_name>
. Also use --depth 1
.
RG: MSHV-${{ github.run_id }} | ||
run: | | ||
if az group exists --name ${RG}; then | ||
az group delete --name ${RG} --yes --no-wait |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can fail right? Do we have another service which periodically goes through all the stale resource groups and cleans them up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean cleanup executes az group delete
but this operation can fail due to multiple reasons, for example auth issue. then that created resource group will not be deleted by pipeline. So we would need an external clean up agent which checks for locked up resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, we can add a separate cleanup workflow (cron) for this. We can have a separate PR for this.
@@ -0,0 +1,224 @@ | |||
name: MSHV Infra Setup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one meta question, all these runs are cancellable right? So I update the PR previous run gets cancelled and resource group gets deleted and everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, these runs can be cancelled. The cleanup job runs at the end by default even if the workflow is cancelled by the latest PR update. So, if a new commit is pushed, automatically Github will cancel the previous run.
cdea2f1
to
9d4fec7
Compare
.github/workflows/mshv-tests.yaml
Outdated
default: 'main' | ||
jobs: | ||
infra-setup: | ||
name: MSHV Infra Setup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name: MSHV Infra Setup | |
name: MSHV Infra Setup (x86_64) |
Perhaps we should name it like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good to have a genric name since we'll support other archs in future as well. We already have-
infra-setup: name: ${{ inputs.ARCH }} VM Provision
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was suggesting that because we're passing ARCH: x86_64
below. How will it be when we introduce a new arch? Won't it be a separate step anyway (which we can name with "arm64" suffix)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it will be separate. We can name it with this convention.
fc67750
to
4ca827f
Compare
Adds a Github Actions workflow to build and run unit tests for the mshv crate. It calls teh infra setup workflow to provision then logs in via SSH and runs the tests. It ensures tests run in a consistent VM setup. Signed-off-by: Aastha Rawat <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I think we can improve on the cleanup part in a follow up PR. Currently it is getting critical to have some basic tests running in our CI.
@gamora12 Please check the errors that are being thrown by CI run. |
Summary of the PR
This PR introduces two CI workflows to automate the testing of the mshv crate in a virtualized environment:
VM Provisioning Workflow (Reusable)
Build and Test Workflow