
Labels

  • Something isn't working
  • Help or insights needed from the community
  • Pull requests initiated by the community
  • <NV>Specialized/modified CUDA kernels in TRTLLM for LLM ops, beyond standard TRT. Dev & perf.
  • <NV>Token sampling algorithms in TRTLLM for text gen (top-k, top-p, beam search).
  • Pull requests that update a dependency file
  • <NV>Deploying with separated, distributed components (params, kv-cache, compute). Arch & perf.
  • <NV>TRTLLM's textual/illustrative materials: API refs, guides, tutorials. Improvement & clarity.
  • This issue or pull request already exists
  • Suggestions for improving, or complaints about, TRTLLM ease of use
  • New feature or request, including support for new models, dtypes, or functionality
  • <NV>Frontend of the LLM workflow
  • <NV>Broad performance issues not specific to a particular component
  • Extra attention is needed
  • <NV>General operational aspects of TRTLLM execution not in other categories.
  • <NV>Automated tests, build checks, GitHub Actions, system stability & efficiency.
  • Setting up and building TRTLLM: compilation, pip install, dependencies, env config, CMake.
  • KV-cache management for efficient LLM inference
  • <NV>High-level LLM Python API & tools (e.g., trtllm-llmapi-launch) for TRTLLM inference/workflows.
  • Parameter-Efficient Fine-Tuning (PEFT) like LoRA/P-tuning in TRTLLM: adapter use & perf.
  • Lower-precision formats (INT8/INT4/FP8) for TRTLLM quantization (AWQ, GPTQ).
  • Memory utilization in TRTLLM: leak/OOM handling, footprint optimization, memory profiling.
  • <NV>Adding support for new model architectures or variants
  • <NV>Model-specific performance optimizations and tuning
  • Further information is required from the requester before developers can help