Feature Request
Motivation
vLLM's benchmark suite currently tracks throughput and latency, but not energy consumption. As sustainable AI becomes increasingly important, energy-per-token metrics would help users make informed deployment decisions.
Proposal
Add optional energy consumption tracking to vLLM's benchmark scripts using NVIDIA NVML, reporting:
- Total energy (Joules) per benchmark run
- Energy per output token (J/token)
- Average GPU power draw (W)
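For concreteness, the three proposed metrics reduce to simple ratios over a benchmark run. A minimal sketch (the function name and signature are illustrative, not an existing vLLM API):

```python
def energy_metrics(total_energy_j: float, output_tokens: int, duration_s: float) -> dict:
    """Derive the three proposed reporting metrics from per-run totals.

    Hypothetical helper for illustration: total energy comes from power
    sampling (see Implementation below), tokens/duration from the
    existing benchmark bookkeeping.
    """
    return {
        "total_energy_j": total_energy_j,            # Joules for the whole run
        "energy_per_token_j": total_energy_j / output_tokens,  # J/token
        "avg_power_w": total_energy_j / duration_s,  # W = J/s
    }
```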
Evidence
Systematic benchmarking across 12 model-precision configurations on NVIDIA RTX 4090D (Ada Lovelace) and RTX 5090 (Blackwell) shows that:
- Quantization does not always reduce energy — NF4 increases energy by 25–56% for models below 3B parameters
- Batch size has an 84–96% effect on per-token energy, often outweighing precision choice
- INT8 mixed-precision adds 17–33% energy overhead vs FP16
- These effects vary significantly across GPU architectures
Data
- Full dataset (200+ measurements): Zenodo
- Profiling toolkit: EcoCompute-AI
- Interactive dashboard: https://hongping-zh.github.io/ecocompute-dynamic-eval/
Implementation
I have an open-source NVML-based energy profiling toolkit (EcoCompute-AI) and would be happy to contribute a PR implementing this if there is interest.
The core approach:
- Use `pynvml` to sample GPU power at 10 Hz during benchmark runs
- Compute total energy via trapezoidal integration
- Report energy metrics alongside existing throughput/latency numbers
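The sampling-plus-integration approach could be sketched roughly as below. Class and helper names are hypothetical, and `pynvml` is imported lazily so benchmarks run unchanged when NVML is unavailable (consistent with the tracking being optional):

```python
import threading
import time


def trapezoid_energy(samples):
    """Integrate (timestamp_s, watts) samples to Joules via the trapezoidal rule."""
    return sum(
        0.5 * (p0 + p1) * (t1 - t0)
        for (t0, p0), (t1, p1) in zip(samples, samples[1:])
    )


class PowerSampler:
    """Background thread sampling GPU power via NVML at a fixed rate.

    Hypothetical sketch of the proposal, not vLLM code.
    """

    def __init__(self, gpu_index: int = 0, sample_hz: float = 10.0):
        self.gpu_index = gpu_index
        self.interval = 1.0 / sample_hz
        self.samples = []  # (monotonic timestamp in s, power in W)
        self._stop = threading.Event()
        self._thread = None

    def _run(self, pynvml, handle):
        while not self._stop.is_set():
            # nvmlDeviceGetPowerUsage reports milliwatts
            watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
            self.samples.append((time.monotonic(), watts))
            time.sleep(self.interval)

    def __enter__(self):
        import pynvml  # optional dependency, only needed when tracking is enabled

        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(self.gpu_index)
        self._thread = threading.Thread(
            target=self._run, args=(pynvml, handle), daemon=True
        )
        self._thread.start()
        return self

    def __exit__(self, *exc):
        import pynvml

        self._stop.set()
        self._thread.join()
        pynvml.nvmlShutdown()

    def total_energy_joules(self) -> float:
        return trapezoid_energy(self.samples)
```

Usage would wrap an existing benchmark run, e.g. `with PowerSampler() as p: run_benchmark()`, then report `p.total_energy_joules()` alongside throughput and latency.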
Related
- MLPerf Inference Benchmark focuses on throughput/latency only
- CodeCarbon provides system-wide tracking but not per-model GPU-specific metrics
- Related PRs: huggingface/transformers#44407, huggingface/optimum#2410