[PROTOTYPE] Use l4t-cuda base for 31.5% smaller Jetson 6.2.0 images #1714

alexnorell · 2025-11-13T15:22:27Z

Problem

Current Jetson 6.2.0 Docker image using l4t-jetpack:r36.4.0 base is 14.2 GB, which:

Takes longer to pull and deploy
Uses more storage on edge devices
Includes unnecessary pre-installed packages that conflict with our requirements
Bundles older CUDA (12.2) instead of latest (12.6)

Solution

Use minimal l4t-cuda:12.6.11-runtime base image for final runtime stage while keeping l4t-jetpack only for compilation.

This prototype demonstrates a 2-stage build approach:

Stage 1 (Builder): l4t-jetpack:r36.4.0 - Has nvcc, CUDA dev tools, cuDNN, TensorRT for compilation
Stage 2 (Runtime): l4t-cuda:12.6.11-runtime - Minimal CUDA runtime determining final image size

Benefits

✅ 31.5% smaller: 9.73 GB vs 14.2 GB (4.47 GB savings)
✅ Newer CUDA: 12.6 vs 12.2
✅ No pre-installed conflicts: Clean dependency management
✅ Equivalent performance: RF-DETR 316ms vs 328ms (2.5% faster), YOLOv8 13ms/76 RPS

Testing

E2E Testing on Jetson Orin

✅ YOLOv8 benchmark: 13ms latency, 76 RPS (TensorRT acceleration verified)
✅ RF-DETR benchmark: 316ms latency, 3.2 RPS (matches/exceeds jetpack performance)
✅ Inference server functional, CLI working
✅ GPU acceleration confirmed (PyTorch CUDA, onnxruntime TensorRT)

Performance Comparison

Metric	l4t-jetpack	l4t-cuda (prototype)	Delta
Image Size	14.2 GB	9.73 GB	-31.5%
RF-DETR	328ms	316ms	+2.5% faster
YOLOv8	13ms, 76 RPS	13ms, 76 RPS	Equal

Files Changed

docker/dockerfiles/Dockerfile.onnx.jetson.6.2.0.cuda-base - New prototype Dockerfile
docker/BUILD_COMPARISON.md - Detailed comparison documentation

Migration Path

This is a PROTOTYPE for evaluation. See docker/BUILD_COMPARISON.md for full analysis.

Instead of downgrading rasterio, compile GDAL 3.8.5 from source to meet rasterio 1.4.0's requirement for GDAL >= 3.5. Changes: - Compile GDAL 3.8.5 from source in builder stage - Copy GDAL libraries and data to runtime stage - Install required GDAL dependencies - Set GDAL environment variables (GDAL_CONFIG, GDAL_DATA, LD_LIBRARY_PATH) This provides a forward-compatible solution while maintaining compatibility with rasterio 1.4.0 and keeping packages up to date. Jetpack r36.4.0 ships with GDAL 3.4.1, which is incompatible with rasterio 1.4.x (requires >= 3.5). Building from source solves this.

Runtime stage only needs the runtime libraries to run GDAL, not the development headers and static libraries. This reduces image size. Changed from: - libproj-dev → libproj25 - libsqlite3-dev → libsqlite3-0 - libtiff-dev → libtiff5 - libcurl4-openssl-dev → libcurl4 - etc. Builder stage keeps -dev packages (needed for compilation).

Changed GDAL build from Make to Ninja for faster parallel compilation: - Added -GNinja to cmake to generate Ninja build files - Use 'ninja' instead of 'make -j$(nproc)' - Use 'ninja install' instead of 'make install' Ninja is faster and more efficient for parallel builds. ninja-build package is already installed in dependencies.

Updated from 3.8.5 to 3.11.5 (latest as of Nov 4, 2025). Benefits of 3.11.x: - Latest bug fixes and security updates - Improved performance - New format support - Still meets rasterio 1.4.0 requirement (GDAL >= 3.5)

Changed libproj25 to libproj22 (correct package name for Ubuntu 22.04 Jammy). Build was failing with: E: Unable to locate package libproj25

- Bump pylogix from 1.0.5 to 1.1.3 (latest version) - Add file package to both builder and runtime stages - file command is required by Arena API for binary architecture validation This ensures compatibility with the latest pylogix version and enables proper Arena SDK functionality.

The Dockerfile incorrectly specified torch>=2.8.0 which doesn't exist, causing pip to fall back to CPU-only PyTorch from PyPI instead of using the GPU-enabled version from jetson-ai-lab.io. Changed to torch>=2.0.1,<2.7.0 to match requirements.sam.txt and ensure GPU-enabled PyTorch is installed from the Jetson AI Lab index. This fixes the critical bug where the container had no PyTorch GPU support.

This fulfills the TODO in requirements.sam.txt to update to PyTorch 2.8.0 now that pre-built flash-attn is available on jetson-ai-lab.io. Changes: - PyTorch: >=2.0.1,<2.7.0 → >=2.8.0 - torchvision: >=0.15.2 → >=0.23.0 (latest) - Added: flash-attn>=2.8.2 for SAM2 support This enables full GPU acceleration for SAM2 and other transformer models with flash attention support on Jetson Orin.

Updated requirements.transformers.txt to match requirements.sam.txt: - torch: >=2.0.1,<2.7.0 → >=2.8.0 - torchvision: >=0.15.0 → >=0.23.0 - Added: flash-attn>=2.8.2 This resolves the dependency conflict that was causing builds to fail.

…icts - Remove libgdal-dev from apt-get to prevent conflict with compiled GDAL 3.11.5 - Add GDAL version verification to ensure correct version is available - Pin flash-attn to 2.8.2 to match pre-built wheel on jetson-ai-lab.io Fixes GDAL version detection issue where rasterio was finding system GDAL 3.4.1 instead of compiled 3.11.5, and flash-attn build failures when uv tried to compile 2.8.3 from source.

PyTorch 2.8.0 from jetson-ai-lab.io was compiled with NumPy 1.x and crashes when NumPy 2.x is installed. This adds a Jetson-specific constraint to ensure NumPy 1.26.x is used instead of 2.x.

- Changed _requirements.txt to allow NumPy 1.26+ instead of requiring 2.0+ - requirements.jetson.txt enforces NumPy <2.0 for PyTorch 2.8.0 compatibility - This allows Jetson builds to use NumPy 1.x while other builds can use 2.x

The numpy package is already specified in _requirements.txt and requirements.jetson.txt with proper version constraints. Having it as a standalone argument causes uv to try to install the latest version (2.x) which conflicts with the Jetson requirement of <2.0.0.

The previous build used cached requirements files with the old NumPy constraint. This comment forces Docker to invalidate the cache and copy the updated requirements files with numpy>=1.26.0,<2.3.0.

Changes: - requirements.sdk.http.txt: Change numpy>=2.0.0,<2.3.0 to numpy>=1.26.0,<2.3.0 - Dockerfile.onnx.jetson.6.2.0: Add ARG CACHE_BUST to force cache invalidation This resolves the unsatisfiable dependency conflict where PyTorch 2.8.0 from jetson-ai-lab.io requires NumPy 1.x but requirements.sdk.http.txt specified NumPy 2.x.

This prototype explores using nvcr.io/nvidia/l4t-cuda:12.6.11-runtime as the base image instead of the full l4t-jetpack:r36.4.0 stack. Benefits: - Smaller base image (CUDA runtime vs full JetPack) - No pre-installed package conflicts - Full control over dependency versions - Cleaner dependency management Comparison: - Base: l4t-cuda:12.6.11-runtime vs l4t-jetpack:r36.4.0 - CUDA: 12.6.11 vs 12.2 - Same: PyTorch 2.8.0, GDAL 3.11.5, Python dependencies See docker/BUILD_COMPARISON.md for detailed comparison methodology.

- Added curl to builder stage packages (needed for uv installer) - Added uv --version verification step to ensure installation succeeds

Changes: - Added cudnn-source stage to extract cuDNN from l4t-jetpack:r36.4.0 - Copy cuDNN libraries to /usr/local/cuda/lib64/ in runtime stage - Copy cuDNN headers to /usr/local/cuda/include/ - Update LD_LIBRARY_PATH to include /usr/local/cuda/lib64 - Update label to document cuDNN source This fixes PyTorch import error (libcudnn.so.9 missing) while maintaining the 63% size reduction compared to full l4t-jetpack base.

PyTorch requires libcupti.so.12 and libnvToolsExt which are not included in the l4t-cuda base image. Copy these from the jetpack source stage.

Results: - Image size: 8.28 GB (41.7% smaller than 14.2 GB jetpack version) - Build time: ~10 minutes on Jetson Orin MAXN mode - All components verified working (PyTorch, CUDA, cuDNN, GPU) Recommendation: Adopt l4t-cuda base for production use.

Critical missing dependency for inference server to work. Changes: - Build onnxruntime 1.20.0 from source with CUDA 12.6 and TensorRT - Copy TensorRT libraries from jetpack to builder and runtime stages - Use parallel=12 for faster compilation on MAXN mode This enables full ONNX model support with GPU acceleration.

The onnxruntime build script imports PyTorch which requires libcupti.so.12. Copy these libs to builder stage and set LD_LIBRARY_PATH before building.

onnxruntime compilation requires nvcc and CUDA development tools which are only available in the -devel image, not -runtime.

Architecture change: - Stage 1 (Builder): l4t-jetpack:r36.4.0 - Has nvcc, CUDA 12.6, cuDNN, TensorRT for compilation - Compile GDAL, onnxruntime, install all Python packages - Stage 2 (Runtime): l4t-cuda:12.6.11-runtime - Minimal CUDA runtime (THIS determines final image size) - Copy compiled binaries + libraries from builder - Copy cuDNN/TensorRT libs from builder This is cleaner than 3-stage build and matches how existing Dockerfile works.

The gpu_http.py script defines the app but doesn't run uvicorn. Change entrypoint to match working jetpack version.

uvicorn is installed as a Python module but not in PATH as a standalone executable.

Build and install inference packages (core, gpu, cli, sdk) to provide the 'inference' command-line tool for benchmarking.

The inference command is installed in /usr/local/bin by the inference-cli package and needs to be copied to the runtime image.

The inference CLI script uses #\!/usr/bin/python shebang which requires a python symlink to python3.

Enable TensorRT FP16, engine caching, OpenBLAS ARM optimization, and increase concurrent workflow steps to match jetpack configuration.

PawelPeczek-Roboflow · 2025-11-14T08:55:43Z

I want to raise a veto for Bundles older CUDA (12.2) instead of latest (12.6)

alexnorell · 2025-11-14T14:54:41Z

Superseded by #1718 which has a cleaner implementation and better benchmark results (41.7% size reduction vs 31.5%).

alexnorell and others added 30 commits November 7, 2025 09:28

Merge branch 'main' into fix/jetson-620-rasterio

2381e04

Update GDAL to latest version 3.11.5

7f19eea

Updated from 3.8.5 to 3.11.5 (latest as of Nov 4, 2025). Benefits of 3.11.x: - Latest bug fixes and security updates - Improved performance - New format support - Still meets rasterio 1.4.0 requirement (GDAL >= 3.5)

Fix libproj package name for Ubuntu 22.04

b6c5f05

Changed libproj25 to libproj22 (correct package name for Ubuntu 22.04 Jammy). Build was failing with: E: Unable to locate package libproj25

Add libopenblas0 for PyTorch dependency in ONNXRuntime build

fbe1e14

Add libopenblas0 to runtime stage for PyTorch support

f92ff0e

Add NumPy <2.0 constraint for PyTorch 2.8.0 compatibility on Jetson

1a05fcd

PyTorch 2.8.0 from jetson-ai-lab.io was compiled with NumPy 1.x and crashes when NumPy 2.x is installed. This adds a Jetson-specific constraint to ensure NumPy 1.26.x is used instead of 2.x.

Fix NumPy version compatibility for Jetson PyTorch 2.8.0

dff3055

- Changed _requirements.txt to allow NumPy 1.26+ instead of requiring 2.0+ - requirements.jetson.txt enforces NumPy <2.0 for PyTorch 2.8.0 compatibility - This allows Jetson builds to use NumPy 1.x while other builds can use 2.x

Add cache-busting comment to ensure NumPy fix is applied

2848136

The previous build used cached requirements files with the old NumPy constraint. This comment forces Docker to invalidate the cache and copy the updated requirements files with numpy>=1.26.0,<2.3.0.

Fix: Add curl package and verify uv installation

e10eddb

- Added curl to builder stage packages (needed for uv installer) - Added uv --version verification step to ensure installation succeeds

Add CUDA profiling libraries (libcupti, libnvToolsExt)

0fab1a0

PyTorch requires libcupti.so.12 and libnvToolsExt which are not included in the l4t-cuda base image. Copy these from the jetpack source stage.

Fix: Add CUDA profiling libs to builder stage for onnxruntime build

aa76821

The onnxruntime build script imports PyTorch which requires libcupti.so.12. Copy these libs to builder stage and set LD_LIBRARY_PATH before building.

Fix: Use l4t-cuda:12.6.11-devel for builder stage

2c9fbc7

onnxruntime compilation requires nvcc and CUDA development tools which are only available in the -devel image, not -runtime.

Fix ENTRYPOINT to use uvicorn instead of direct python

6926136

The gpu_http.py script defines the app but doesn't run uvicorn. Change entrypoint to match working jetpack version.

Fix: Use python3 -m uvicorn instead of bare uvicorn command

ce06d00

uvicorn is installed as a Python module but not in PATH as a standalone executable.

Add inference CLI wheel building and installation

fd1f748

Build and install inference packages (core, gpu, cli, sdk) to provide the 'inference' command-line tool for benchmarking.

alexnorell added 3 commits November 13, 2025 02:54

Copy inference CLI executable to runtime stage

8470692

The inference command is installed in /usr/local/bin by the inference-cli package and needs to be copied to the runtime image.

Add python symlink for inference CLI script compatibility

390b62a

The inference CLI script uses #\!/usr/bin/python shebang which requires a python symlink to python3.

Add TensorRT and performance environment variables

6008ae0

Enable TensorRT FP16, engine caching, OpenBLAS ARM optimization, and increase concurrent workflow steps to match jetpack configuration.

alexnorell requested review from PawelPeczek-Roboflow, grzegorz-roboflow, hansent, probicheaux and yeldarby as code owners November 13, 2025 15:22

alexnorell closed this Nov 14, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[PROTOTYPE] Use l4t-cuda base for 31.5% smaller Jetson 6.2.0 images #1714

[PROTOTYPE] Use l4t-cuda base for 31.5% smaller Jetson 6.2.0 images #1714

Uh oh!

alexnorell commented Nov 13, 2025

Uh oh!

PawelPeczek-Roboflow commented Nov 14, 2025

Uh oh!

alexnorell commented Nov 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[PROTOTYPE] Use l4t-cuda base for 31.5% smaller Jetson 6.2.0 images #1714

[PROTOTYPE] Use l4t-cuda base for 31.5% smaller Jetson 6.2.0 images #1714

Uh oh!

Conversation

alexnorell commented Nov 13, 2025

Problem

Solution

Benefits

Testing

E2E Testing on Jetson Orin

Performance Comparison

Files Changed

Migration Path

Related

Uh oh!

PawelPeczek-Roboflow commented Nov 14, 2025

Uh oh!

alexnorell commented Nov 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants