Labels
- Something isn't working
- Help or insights needed from the community
- PRs initiated by the community
- <NV>Specialized or modified CUDA kernels in TRTLLM for LLM ops, beyond standard TRT: development & performance.
- <NV>Token sampling algorithms in TRTLLM for text generation (top-k, top-p, beam search).
- Pull requests that update a dependency file
- <NV>Deploying with separated, distributed components (params, kv-cache, compute): architecture & performance.
- <NV>TRTLLM's textual/illustrative materials: API refs, guides, tutorials. Improvement & clarity.
- This issue or pull request already exists
- Suggestions for improving, or complaints about, TRTLLM's ease of use
- New feature or request, including support for new models, dtypes, or functionality
- <NV>Frontend of the LLM workflow
- <NV>Broad performance issues not specific to a particular component
- Extra attention is needed
- <NV>General operational aspects of TRTLLM execution not in other categories.
- <NV>Automated tests, build checks, GitHub Actions; system stability & efficiency.
- Setting up and building TRTLLM: compilation, pip install, dependencies, env config, CMake.
- KV-cache management for efficient LLM inference
- <NV>High-level LLM Python API & tools (e.g., trtllm-llmapi-launch) for TRTLLM inference/workflows.
- Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA/P-tuning in TRTLLM: adapter use & performance.
- Lower-precision formats (INT8/INT4/FP8) for TRTLLM quantization (AWQ, GPTQ).
- Memory utilization in TRTLLM: leak/OOM handling, footprint optimization, memory profiling.
- <NV>Adding support for new model architectures or variants
- <NV>Model-specific performance optimizations and tuning
- Further information is required from the requester for developers to help