
Labels

  • Something isn't working
  • Help or insights needed from the community
  • Pull requests initiated by the community
  • <NV>Specialized/modified CUDA kernels in TRTLLM for LLM ops, beyond standard TRT. Dev & perf.
  • <NV>Token sampling algorithms in TRTLLM for text gen (top-k, top-p, beam search).
  • Pull requests that update a dependency file
  • <NV>Deploying with separated, distributed components (params, kv-cache, compute). Arch & perf.
  • <NV>TRTLLM's textual/illustrative materials: API refs, guides, tutorials. Improvement & clarity.
  • This issue or pull request already exists
  • Suggestions for improving, or complaints about, TRTLLM ease of use
  • New feature or request, including support for new models, dtypes, or functionality
  • <NV>Frontend of the LLM workflow
  • <NV>Broad performance issues not specific to a particular component
  • Extra attention is needed
  • <NV>General operational aspects of TRTLLM execution not in other categories.
  • <NV>Automated tests, build checks, GitHub Actions, system stability & efficiency.
  • Setting up and building TRTLLM: compilation, pip install, dependencies, env config, CMake.
  • KV-cache management for efficient LLM inference
  • <NV>High-level LLM Python API & tools (e.g., trtllm-llmapi-launch) for TRTLLM inference/workflows.
  • Parameter-Efficient Fine-Tuning (PEFT) like LoRA/P-tuning in TRTLLM: adapter use & perf.
  • Lower-precision formats (INT8/INT4/FP8) for TRTLLM quantization (AWQ, GPTQ).
  • Memory utilization in TRTLLM: leak/OOM handling, footprint optimization, memory profiling.
  • <NV>Adding support for new model architectures or variants
  • <NV>Model-specific performance optimizations and tuning
  • Further information is required from the requester before developers can help