AI: -nvidia "all" is not obvious what it does #3370

@Titan-Node

Description

Describe the bug
When the -nvidia "all" flag is used for AI inference, the number of entries in aiModels.json must match the number of GPUs installed; otherwise the node tries to run the first model in the list on every GPU.

Not sure if this is the expected behavior.
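
Based on the behavior above, the assignment appears to be index-based with a fallback to the first entry. Below is a minimal Go sketch of that suspected logic; it is a hypothetical illustration, not the actual go-livepeer code, and the names ModelConfig and assignModels are made up for this sketch:

package main

import "fmt"

// ModelConfig mirrors the aiModels.json fields that matter here.
type ModelConfig struct {
	Pipeline string
	ModelID  string
}

// assignModels pairs each GPU index with a model config; GPUs beyond the
// end of the list reuse the first config, which would reproduce the
// reported behavior. (Hypothetical sketch of the suspected logic.)
func assignModels(gpus []int, models []ModelConfig) map[int]ModelConfig {
	assignments := make(map[int]ModelConfig)
	for i, gpu := range gpus {
		if i < len(models) {
			assignments[gpu] = models[i]
		} else {
			// Suspected fallback: run the first model on the extra GPU,
			// even if that GPU lacks the VRAM the model needs.
			assignments[gpu] = models[0]
		}
	}
	return assignments
}

func main() {
	gpus := []int{0, 1} // what -nvidia "all" resolves to on a 2-GPU machine
	models := []ModelConfig{
		{Pipeline: "llm", ModelID: "meta-llama/Meta-Llama-3.1-8B-Instruct"},
	}
	for gpu, m := range assignModels(gpus, models) {
		fmt.Printf("GPU %d -> %s (%s)\n", gpu, m.ModelID, m.Pipeline)
	}
	// Both GPU 0 and GPU 1 get the 8B LLM, so the smaller card fails.
}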

To Reproduce

  1. Install 2 GPUs in a machine
  2. Set -nvidia "all"
  3. Set your aiModels.json to include only the LLM model
[
    {
        "pipeline": "llm",
        "model_id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "price_per_unit": 80000000,
        "pixels_per_unit": 1000000,
        "warm": true
    }
]
  4. If the second GPU has less than 24GB of VRAM, it will fail to launch the container and time out.

Expected behavior
If we specify the GPUs explicitly, e.g. -nvidia 0,1, and aiModels.json has two models loaded, it works as expected, i.e.:

[
    {
        "pipeline": "llm",
        "model_id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "price_per_unit": 80000000,
        "pixels_per_unit": 1000000,
        "warm": true
    },
    {
        "pipeline": "text-to-image",
        "model_id": "ByteDance/SDXL-Lightning",
        "price_per_unit": 4768371,
        "warm": true
    }
]
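
Running this two-entry config through the same hypothetical assignModels sketch from above pairs each GPU with its own model, matching the working behavior:

gpus := []int{0, 1}
models := []ModelConfig{
    {Pipeline: "llm", ModelID: "meta-llama/Meta-Llama-3.1-8B-Instruct"},
    {Pipeline: "text-to-image", ModelID: "ByteDance/SDXL-Lightning"},
}
for gpu, m := range assignModels(gpus, models) {
    fmt.Printf("GPU %d -> %s\n", gpu, m.ModelID)
}
// GPU 0 -> meta-llama/Meta-Llama-3.1-8B-Instruct
// GPU 1 -> ByteDance/SDXL-Lightning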

I think the documentation needs to be updated, but I am unsure whether this is actually the intended logic for the "all" value.

Set Up
Slot 0 - RTX 3090 (24GB VRAM)
Slot 1 - RTX 2080 Ti (11GB VRAM)
