
Difference between memory usage reported by driver and inside container #1196

@thien-lm

Description

What happened: the memory consumption reported by the NVIDIA driver and the memory consumption reported by HAMi are different

What you expected to happen: the memory consumption reported by HAMi and by the driver should be the same

How to reproduce it (as minimally and precisely as possible):
Deploy GPT-2 with vLLM; each container consumes 15k MiB of GPU DRAM.
Anything else we need to know?:

  • The output of nvidia-smi -a in the vLLM container: (screenshot attached)
  • The output of nvidia-smi -a on the host: (screenshot attached)
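To make the discrepancy concrete without screenshots, here is a minimal sketch that parses the `memory.used` value from `nvidia-smi --query-gpu=memory.used --format=csv` output captured inside the container and on the host, and prints the delta. The sample strings are placeholder values, not the actual numbers from the attached screenshots.

```python
import re

def used_mib(smi_output: str) -> int:
    """Extract the memory.used value in MiB from nvidia-smi CSV output."""
    m = re.search(r"(\d+)\s*MiB", smi_output)
    if not m:
        raise ValueError("no MiB value found in nvidia-smi output")
    return int(m.group(1))

# Placeholder captures standing in for the real outputs (assumed values):
container_out = "memory.used [MiB]\n15000 MiB"  # inside the vLLM container
host_out = "memory.used [MiB]\n15360 MiB"       # on the host

delta = used_mib(host_out) - used_mib(container_out)
print(f"container={used_mib(container_out)} MiB, "
      f"host={used_mib(host_out)} MiB, delta={delta} MiB")
```

Pasting the two real outputs (or the parsed numbers) into the issue text would make the size of the gap reproducible for reviewers even after the images go stale.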

Environment:

  • HAMi version: 2.6.0
  • nvidia driver or other AI device driver version:
  • Docker version from docker version
  • Docker command, image and tag used
  • Kernel version from uname -a
  • Others:


    Labels

    kind/bug (Something isn't working)
