What happened: the GPU memory usage reported by the NVIDIA driver (`nvidia-smi`) and by HAMi differ.
What you expected to happen: HAMi and the driver should report the same memory usage.
How to reproduce it (as minimally and precisely as possible):
Deploy GPT-2 with vLLM; each container consumes 15k MiB of GPU DRAM.
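For reference, a minimal sketch of the kind of pod spec used to reproduce this (the `nvidia.com/gpumem` resource key follows HAMi's documented convention for a per-container memory limit in MiB; the image tag and server arguments are illustrative, not the exact ones used):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vllm-gpt2
spec:
  containers:
  - name: vllm
    image: vllm/vllm-openai:latest          # illustrative image tag
    command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
    args: ["--model", "gpt2"]
    resources:
      limits:
        nvidia.com/gpu: 1                   # one vGPU slice via HAMi
        nvidia.com/gpumem: 15000            # assumed HAMi memory limit key, in MiB
```

After the pod is running, compare `nvidia-smi -a` inside the container against the same command on the host to observe the discrepancy.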
Anything else we need to know?:
- The output of `nvidia-smi -a` in the vLLM container:
- The output of `nvidia-smi -a` on the host:
Environment:
- HAMi version: 2.6.0
- NVIDIA driver or other AI device driver version:
- Docker version from `docker version`:
- Docker command, image, and tag used:
- Kernel version from `uname -a`:
- Others: