Skip to content

Conversation

@alexnorell
Copy link
Contributor

Problem

Current Jetson 6.2.0 Docker image using l4t-jetpack:r36.4.0 base is 14.2 GB, which:

  • Takes longer to pull and deploy
  • Uses more storage on edge devices
  • Includes unnecessary pre-installed packages that conflict with our requirements
  • Bundles older CUDA (12.2) instead of latest (12.6)

Solution

Use minimal l4t-cuda:12.6.11-runtime base image for final runtime stage while keeping l4t-jetpack only for compilation.

This prototype demonstrates a 2-stage build approach:

  • Stage 1 (Builder): l4t-jetpack:r36.4.0 - Has nvcc, CUDA dev tools, cuDNN, TensorRT for compilation
  • Stage 2 (Runtime): l4t-cuda:12.6.11-runtime - Minimal CUDA runtime determining final image size

Benefits

  • 31.5% smaller: 9.73 GB vs 14.2 GB (4.47 GB savings)
  • Newer CUDA: 12.6 vs 12.2
  • No pre-installed conflicts: Clean dependency management
  • Equivalent performance: RF-DETR 316ms vs 328ms (2.5% faster), YOLOv8 13ms/76 RPS

Testing

E2E Testing on Jetson Orin

  • YOLOv8 benchmark: 13ms latency, 76 RPS (TensorRT acceleration verified)
  • RF-DETR benchmark: 316ms latency, 3.2 RPS (matches/exceeds jetpack performance)
  • ✅ Inference server functional, CLI working
  • ✅ GPU acceleration confirmed (PyTorch CUDA, onnxruntime TensorRT)

Performance Comparison

Metric l4t-jetpack l4t-cuda (prototype) Delta
Image Size 14.2 GB 9.73 GB -31.5%
RF-DETR 328ms 316ms +2.5% faster
YOLOv8 13ms, 76 RPS 13ms, 76 RPS Equal

Files Changed

  • docker/dockerfiles/Dockerfile.onnx.jetson.6.2.0.cuda-base - New prototype Dockerfile
  • docker/BUILD_COMPARISON.md - Detailed comparison documentation

Migration Path

This is a PROTOTYPE for evaluation. See docker/BUILD_COMPARISON.md for full analysis.

Related

Addresses #1695 and ongoing Jetson image size concerns.

alexnorell and others added 30 commits November 7, 2025 09:28
Instead of downgrading rasterio, compile GDAL 3.8.5 from source to meet
rasterio 1.4.0's requirement for GDAL >= 3.5.

Changes:
- Compile GDAL 3.8.5 from source in builder stage
- Copy GDAL libraries and data to runtime stage
- Install required GDAL dependencies
- Set GDAL environment variables (GDAL_CONFIG, GDAL_DATA, LD_LIBRARY_PATH)

This provides a forward-compatible solution while maintaining compatibility
with rasterio 1.4.0 and keeping packages up to date.

Jetpack r36.4.0 ships with GDAL 3.4.1, which is incompatible with
rasterio 1.4.x (requires >= 3.5). Building from source solves this.
Runtime stage only needs the runtime libraries to run GDAL, not the
development headers and static libraries. This reduces image size.

Changed from:
- libproj-dev → libproj25
- libsqlite3-dev → libsqlite3-0
- libtiff-dev → libtiff5
- libcurl4-openssl-dev → libcurl4
- etc.

Builder stage keeps -dev packages (needed for compilation).
Changed GDAL build from Make to Ninja for faster parallel compilation:
- Added -GNinja to cmake to generate Ninja build files
- Use 'ninja' instead of 'make -j$(nproc)'
- Use 'ninja install' instead of 'make install'

Ninja is faster and more efficient for parallel builds.
ninja-build package is already installed in dependencies.
Updated from 3.8.5 to 3.11.5 (latest as of Nov 4, 2025).

Benefits of 3.11.x:
- Latest bug fixes and security updates
- Improved performance
- New format support
- Still meets rasterio 1.4.0 requirement (GDAL >= 3.5)
Changed libproj25 to libproj22 (correct package name for Ubuntu 22.04 Jammy).

Build was failing with:
E: Unable to locate package libproj25
- Bump pylogix from 1.0.5 to 1.1.3 (latest version)
- Add file package to both builder and runtime stages
- file command is required by Arena API for binary architecture validation

This ensures compatibility with the latest pylogix version and enables proper Arena SDK functionality.
The Dockerfile incorrectly specified torch>=2.8.0 which doesn't exist,
causing pip to fall back to CPU-only PyTorch from PyPI instead of using
the GPU-enabled version from jetson-ai-lab.io.

Changed to torch>=2.0.1,<2.7.0 to match requirements.sam.txt and ensure
GPU-enabled PyTorch is installed from the Jetson AI Lab index.

This fixes the critical bug where the container had no PyTorch GPU support.
This fulfills the TODO in requirements.sam.txt to update to PyTorch 2.8.0
now that pre-built flash-attn is available on jetson-ai-lab.io.

Changes:
- PyTorch: >=2.0.1,<2.7.0 → >=2.8.0
- torchvision: >=0.15.2 → >=0.23.0 (latest)
- Added: flash-attn>=2.8.2 for SAM2 support

This enables full GPU acceleration for SAM2 and other transformer models
with flash attention support on Jetson Orin.
Updated requirements.transformers.txt to match requirements.sam.txt:
- torch: >=2.0.1,<2.7.0 → >=2.8.0
- torchvision: >=0.15.0 → >=0.23.0
- Added: flash-attn>=2.8.2

This resolves the dependency conflict that was causing builds to fail.
…icts

- Remove libgdal-dev from apt-get to prevent conflict with compiled GDAL 3.11.5
- Add GDAL version verification to ensure correct version is available
- Pin flash-attn to 2.8.2 to match pre-built wheel on jetson-ai-lab.io

Fixes GDAL version detection issue where rasterio was finding system GDAL 3.4.1 instead of compiled 3.11.5, and flash-attn build failures when uv tried to compile 2.8.3 from source.
PyTorch 2.8.0 from jetson-ai-lab.io was compiled with NumPy 1.x and
crashes when NumPy 2.x is installed. This adds a Jetson-specific
constraint to ensure NumPy 1.26.x is used instead of 2.x.
- Changed _requirements.txt to allow NumPy 1.26+ instead of requiring 2.0+
- requirements.jetson.txt enforces NumPy <2.0 for PyTorch 2.8.0 compatibility
- This allows Jetson builds to use NumPy 1.x while other builds can use 2.x
The numpy package is already specified in _requirements.txt and
requirements.jetson.txt with proper version constraints. Having it
as a standalone argument causes uv to try to install the latest version
(2.x) which conflicts with the Jetson requirement of <2.0.0.
The previous build used cached requirements files with the old NumPy
constraint. This comment forces Docker to invalidate the cache and
copy the updated requirements files with numpy>=1.26.0,<2.3.0.
Changes:
- requirements.sdk.http.txt: Change numpy>=2.0.0,<2.3.0 to numpy>=1.26.0,<2.3.0
- Dockerfile.onnx.jetson.6.2.0: Add ARG CACHE_BUST to force cache invalidation

This resolves the unsatisfiable dependency conflict where PyTorch 2.8.0
from jetson-ai-lab.io requires NumPy 1.x but requirements.sdk.http.txt
specified NumPy 2.x.
This prototype explores using nvcr.io/nvidia/l4t-cuda:12.6.11-runtime as the
base image instead of the full l4t-jetpack:r36.4.0 stack.

Benefits:
- Smaller base image (CUDA runtime vs full JetPack)
- No pre-installed package conflicts
- Full control over dependency versions
- Cleaner dependency management

Comparison:
- Base: l4t-cuda:12.6.11-runtime vs l4t-jetpack:r36.4.0
- CUDA: 12.6.11 vs 12.2
- Same: PyTorch 2.8.0, GDAL 3.11.5, Python dependencies

See docker/BUILD_COMPARISON.md for detailed comparison methodology.
- Added curl to builder stage packages (needed for uv installer)
- Added uv --version verification step to ensure installation succeeds
Changes:
- Added cudnn-source stage to extract cuDNN from l4t-jetpack:r36.4.0
- Copy cuDNN libraries to /usr/local/cuda/lib64/ in runtime stage
- Copy cuDNN headers to /usr/local/cuda/include/
- Update LD_LIBRARY_PATH to include /usr/local/cuda/lib64
- Update label to document cuDNN source

This fixes PyTorch import error (libcudnn.so.9 missing) while maintaining
the 63% size reduction compared to full l4t-jetpack base.
PyTorch requires libcupti.so.12 and libnvToolsExt which are not included
in the l4t-cuda base image. Copy these from the jetpack source stage.
Results:
- Image size: 8.28 GB (41.7% smaller than 14.2 GB jetpack version)
- Build time: ~10 minutes on Jetson Orin MAXN mode
- All components verified working (PyTorch, CUDA, cuDNN, GPU)

Recommendation: Adopt l4t-cuda base for production use.
Critical missing dependency for inference server to work.

Changes:
- Build onnxruntime 1.20.0 from source with CUDA 12.6 and TensorRT
- Copy TensorRT libraries from jetpack to builder and runtime stages
- Use parallel=12 for faster compilation on MAXN mode

This enables full ONNX model support with GPU acceleration.
The onnxruntime build script imports PyTorch which requires libcupti.so.12.
Copy these libs to builder stage and set LD_LIBRARY_PATH before building.
onnxruntime compilation requires nvcc and CUDA development tools which are
only available in the -devel image, not -runtime.
Architecture change:
- Stage 1 (Builder): l4t-jetpack:r36.4.0
  - Has nvcc, CUDA 12.6, cuDNN, TensorRT for compilation
  - Compile GDAL, onnxruntime, install all Python packages

- Stage 2 (Runtime): l4t-cuda:12.6.11-runtime
  - Minimal CUDA runtime (THIS determines final image size)
  - Copy compiled binaries + libraries from builder
  - Copy cuDNN/TensorRT libs from builder

This is cleaner than 3-stage build and matches how existing Dockerfile works.
The gpu_http.py script defines the app but doesn't run uvicorn.
Change entrypoint to match working jetpack version.
uvicorn is installed as a Python module but not in PATH as a standalone
executable.
Build and install inference packages (core, gpu, cli, sdk) to provide
the 'inference' command-line tool for benchmarking.
The inference command is installed in /usr/local/bin by the inference-cli
package and needs to be copied to the runtime image.
The inference CLI script uses #\!/usr/bin/python shebang which requires
a python symlink to python3.
Enable TensorRT FP16, engine caching, OpenBLAS ARM optimization, and
increase concurrent workflow steps to match jetpack configuration.
@PawelPeczek-Roboflow
Copy link
Collaborator

I want to raise a veto for Bundles older CUDA (12.2) instead of latest (12.6)

@alexnorell
Copy link
Contributor Author

Superseded by #1718 which has a cleaner implementation and better benchmark results (41.7% size reduction vs 31.5%).

@alexnorell alexnorell closed this Nov 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants