Add AKS deployment #45
base: main
Conversation
vofish commented on Aug 22, 2025
- Added AKS support for Gemma-2b
- Selected NVIDIA T4 GPU for deployment, as L4 GPUs are not available in Azure.
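For reference, here is a minimal sketch of how a t4.yaml GPU patch could be wired into an AKS overlay with kustomize. The use of kustomize, the directory layout, and the base path are assumptions for illustration only and are not taken from this PR:

```yaml
# kustomization.yaml (hypothetical AKS overlay; paths are assumptions)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base          # base Gemma-2b inference manifests (assumed path)
patches:
  - path: t4.yaml       # GPU patch adding the nodeSelector and nvidia.com/gpu request
    target:
      kind: Deployment
```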
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: vofish. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
FYI, the output of
```yaml
template:
  spec:
    nodeSelector:
      kubernetes.azure.com/accelerator: nvidia
    containers:
    - name: inference-server
      resources:
        requests:
          nvidia.com/gpu: 1
        limits:
          nvidia.com/gpu: 1
```
I'm a bit unclear on how this GPU patch restricts the deployment to only use T4-type GPU instances. Looking at the t4.yaml file, the nodeSelector is set to kubernetes.azure.com/accelerator: nvidia. This seems to select any node with an NVIDIA GPU, rather than specifically targeting T4 instances.
Could you clarify how the T4 type is enforced?
Also, have you tested this on AKS? If so, did you use a manual node pool (perhaps one provisioned with only T4 instances) or an automatic node pool? If it was an automatic node pool, how is this resource request bound specifically to T4 GPUs?
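For what it's worth, one way to pin the workload to T4 nodes (rather than to any NVIDIA GPU node) would be to also select on the node's VM size via the well-known `node.kubernetes.io/instance-type` label, which AKS populates with the node pool's VM size. The SKU below is just an example T4-backed size, not something taken from this PR:

```yaml
template:
  spec:
    nodeSelector:
      kubernetes.azure.com/accelerator: nvidia
      # Pin to a specific T4-backed VM size (example SKU; adjust to the node pool actually used).
      node.kubernetes.io/instance-type: Standard_NC4as_T4_v3
    containers:
    - name: inference-server
      resources:
        requests:
          nvidia.com/gpu: 1
        limits:
          nvidia.com/gpu: 1
```

Alternatively, a custom label applied to a T4-only node pool at creation time could be used in the nodeSelector the same way.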