@alexnorell (Contributor)

Summary

  • Adds detailed logging throughout the TensorRT compilation pipeline for better visibility into each phase of the process
  • Tracks and reports compilation timing (which can take several minutes)
  • Logs runtime environment, configuration details, and progress for each phase

Changes

engine_builder.py

  • Added ONNX parsing progress logs (start, parsing, completion)
  • Enhanced engine compilation logging with:
    • Visual separators for clear section boundaries
    • Configuration details (precision, input size, batch settings, compatibility flags)
    • Platform capability checks for FP16/INT8 support
    • Optimization profile configuration status
    • Clear "Building TensorRT engine - this may take several minutes..." message before the slow compilation step
    • Build time tracking with elapsed seconds
    • Engine size in MB
    • Completion confirmation (a sketch of this logging pattern follows the list)
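
For context, here is a minimal sketch of what this style of logging might look like with the standard `tensorrt` Python API. The `build_engine` wrapper, logger setup, and error handling are illustrative, not the actual engine_builder.py code:

```python
import logging
import time

import tensorrt as trt

logger = logging.getLogger(__name__)
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


def build_engine(onnx_path: str, engine_path: str, use_fp16: bool = True) -> None:
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, TRT_LOGGER)

    logger.info("Starting ONNX parsing from: %s", onnx_path)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                logger.error("ONNX parse error: %s", parser.get_error(i))
            raise RuntimeError("Failed to parse ONNX model")
    logger.info("ONNX parsing completed successfully")

    config = builder.create_builder_config()
    # Platform capability check before enabling FP16
    if use_fp16 and builder.platform_has_fast_fp16:
        logger.info("FP16 is supported on this platform")
        config.set_flag(trt.BuilderFlag.FP16)

    logger.info("Building TensorRT engine - this may take several minutes...")
    start = time.monotonic()
    serialized = builder.build_serialized_network(network, config)
    elapsed = time.monotonic() - start
    if serialized is None:
        raise RuntimeError("TensorRT engine build failed")
    logger.info("TensorRT engine built successfully in %.2f seconds", elapsed)

    engine_bytes = bytes(serialized)
    with open(engine_path, "wb") as f:
        f.write(engine_bytes)
    logger.info("Engine size: %.2f MB", len(engine_bytes) / (1024 * 1024))
```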

core.py

  • Added high-level compilation orchestration logging:
    • Compilation start message with model directory
    • Runtime environment capture notification
    • GPU availability and device information (CUDA version, TensorRT version)
    • ONNX model loading notification
    • Detection and logging of pre-existing engines (compilation skip)
    • TRT configuration save location
    • Engine builder initialization with workspace size
    • Total compilation time in both seconds and minutes (a sketch follows the list)
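
A rough sketch of the orchestration-level logging described above, assuming `torch` is available for GPU introspection; `compile_model` and its argument are hypothetical stand-ins for the actual core.py entry point:

```python
import logging
import time

import tensorrt as trt
import torch

logger = logging.getLogger(__name__)


def compile_model(model_dir: str) -> None:
    start = time.monotonic()
    logger.info("Starting TRT compilation for model in: %s", model_dir)

    logger.info("Capturing runtime environment information...")
    gpu_available = torch.cuda.is_available()
    logger.info("GPU Available: %s", gpu_available)
    if gpu_available:
        devices = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
        logger.info("GPU Devices: %s", devices)
        logger.info("CUDA Version: %s", torch.version.cuda)
    logger.info("TensorRT Version: %s", trt.__version__)

    # ... ONNX loading, pre-existing engine detection, and the engine build
    # (see the engine_builder.py sketch above) would happen here ...

    total = time.monotonic() - start
    logger.info(
        "Total compilation time: %.2f seconds (%.2f minutes)", total, total / 60
    )
```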

Example Output

Starting TRT compilation for model in: /path/to/model
Capturing runtime environment information...
GPU Available: True
GPU Devices: ['NVIDIA GeForce RTX 3090']
CUDA Version: 11.8
TensorRT Version: 8.6.1
Loading ONNX model from: /path/to/model/weights.onnx
Starting ONNX parsing from: /path/to/model/weights.onnx
Parsing ONNX model graph...
ONNX parsing completed successfully
============================================================
Starting TensorRT Engine Compilation
============================================================
Output path: /path/to/model/engine-fp16.plan
Precision: FP16
Input size: 640x640
Using static batch size
Configuring builder flags...
FP16 is supported on this platform
Creating optimization profile...
Building TensorRT engine - this may take several minutes...
TensorRT engine built successfully in 123.45 seconds
Engine size: 45.23 MB
============================================================
TensorRT Compilation Complete
============================================================
Total compilation time: 125.67 seconds (2.09 minutes)

Test plan

  • Verify logs appear during TensorRT compilation
  • Confirm timing information is accurate
  • Check that all compilation phases are logged
  • Validate that pre-existing engines are detected and that the skipped compilation is logged

This update adds detailed logging throughout the TensorRT compilation
pipeline to provide better visibility into the compilation process,
which can take several minutes to complete.

Changes:
- Added timing tracking for total compilation time
- Log runtime environment details (GPU, CUDA, TensorRT versions)
- Added progress indicators for each compilation phase
- Log ONNX parsing start and completion
- Display TensorRT engine configuration details (precision, input size,
  batch settings, compatibility flags)
- Show platform capability checks for FP16/INT8 support
- Log the engine building phase with clear "this may take several
  minutes" message
- Report engine build time and final engine size
- Added visual separators for better log readability

The logs now provide users with:
- Real-time feedback on compilation progress
- Time estimates for long-running operations
- System configuration being used
- Clear indication when compilation is skipped (engine exists)

Co-Authored-By: Claude <[email protected]>
alexnorell and others added 2 commits on November 18, 2025:
This extends the TensorRT compilation logging to the main inference
package (not just inference_experimental). The ONNX Runtime TensorRT
execution provider compiles models on first run, which can take several
minutes without any feedback.

Changes:
- Added INFO-level logging when TensorRT execution provider is detected
- Log TensorRT engine cache path and configuration details
- Log "compilation may occur now" message before the slow InferenceSession
  initialization step
- Report session creation time when using TensorRT
- Added specific logging for RF-DETR model with its advanced TRT config
  (1GB workspace, FP16, etc.)

Files modified:
- inference/core/models/roboflow.py: Base ONNX model class
- inference/models/rfdetr/rfdetr.py: RF-DETR specific TRT configuration

This complements the earlier changes to inference_experimental compilation
and provides visibility into ONNX Runtime's implicit TRT compilation.

Co-Authored-By: Claude <[email protected]>
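
A hedged sketch of how this kind of logging can wrap ONNX Runtime's implicit TRT compilation. `create_session` and its parameters are illustrative (the real configuration lives in roboflow.py and rfdetr.py), though `TensorrtExecutionProvider` and the `trt_*` option keys are standard ONNX Runtime names:

```python
import logging
import time

import onnxruntime as ort

logger = logging.getLogger(__name__)


def create_session(model_path: str, cache_dir: str) -> ort.InferenceSession:
    providers = [
        (
            "TensorrtExecutionProvider",
            {
                "trt_engine_cache_enable": True,
                "trt_engine_cache_path": cache_dir,
                "trt_fp16_enable": True,
                "trt_max_workspace_size": 1 << 30,  # 1GB, as in the RF-DETR config
            },
        ),
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ]
    logger.info("TensorRT execution provider selected; engine cache path: %s", cache_dir)
    logger.info("TensorRT compilation may occur now - this can take several minutes...")
    start = time.monotonic()
    # First run triggers TRT engine compilation inside ONNX Runtime
    session = ort.InferenceSession(model_path, providers=providers)
    logger.info("Session created in %.2f seconds", time.monotonic() - start)
    return session
```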
During ONNX Runtime TensorRT compilation, the process can be silent
for several minutes, making it unclear if progress is being made.
This adds periodic progress logging every 30 seconds.

Implementation:
- Created `_log_progress_periodically()` helper function that runs in
  a background thread
- Logs progress messages every 30 seconds during compilation
- Messages show elapsed time (e.g., "TensorRT compilation still in
  progress... (60s elapsed)")
- Thread is cleanly stopped when session creation completes
- Applied to both base OnnxRoboflowInferenceModel and RF-DETR model

This provides users with reassurance that the system is still working
during the long TensorRT compilation process.

Co-Authored-By: Claude <[email protected]>
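
A minimal sketch of this periodic-progress pattern, reusing the `_log_progress_periodically` name from the commit message; the `run_with_progress_logging` wrapper is a hypothetical illustration of how the thread is started and cleanly stopped around the slow step:

```python
import logging
import threading
import time

logger = logging.getLogger(__name__)


def _log_progress_periodically(stop_event: threading.Event, interval: float = 30.0) -> None:
    """Emit an elapsed-time message every `interval` seconds until stop_event is set."""
    start = time.monotonic()
    while not stop_event.wait(interval):  # wait() returns True (and the loop exits) once set
        elapsed = int(time.monotonic() - start)
        logger.info("TensorRT compilation still in progress... (%ds elapsed)", elapsed)


def run_with_progress_logging(slow_operation):
    """Run slow_operation() while a background thread logs periodic progress."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=_log_progress_periodically, args=(stop_event,), daemon=True
    )
    thread.start()
    try:
        return slow_operation()  # e.g. the InferenceSession creation
    finally:
        stop_event.set()  # cleanly stop the progress logger
        thread.join()
```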