alexnorell commented Nov 14, 2025

Description

Optimizes the Jetson 6.2.0 Docker image by switching from the full l4t-jetpack base (~14 GB) to the minimal l4t-cuda:12.6.11-runtime base (~8 GB), achieving a ~40% size reduction while upgrading the CUDA version and maintaining full functionality.

Key Improvements

Image Optimization:

  • 41.7% smaller: 14.2 GB → 8.28 GB (5.92 GB savings)
  • l4t-jetpack → l4t-cuda: Eliminates unnecessary JetPack SDK components (VPI, multimedia APIs, GStreamer)
  • CUDA 12.6.11: Upgraded from 12.2 (matches JetPack 6.2 official version)
  • 2-stage build: JetPack builder for compilation tools + minimal CUDA runtime for deployment

Software Stack:

  • onnxruntime-gpu 1.20.0 (compiled with CUDA 12.6 + TensorRT support; see the build sketch after this list)
  • PyTorch 2.8.0 from jetson-ai-lab.io
  • NumPy 1.26.4 (Jetson PyTorch compatibility)
  • CMake 3.31.10 (parameterized build arg)
  • GDAL 3.11.5 (compiled from source)
  • cuDNN 9.3 + TensorRT with FP16 acceleration
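
The onnxruntime-gpu build noted above follows the usual Jetson recipe. A hedged sketch of the builder-stage invocation (the Dockerfile's exact flags may differ):

  ./build.sh --config Release --update --build --build_wheel --skip_tests \
      --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu \
      --use_tensorrt --tensorrt_home /usr/lib/aarch64-linux-gnu \
      --parallel 12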

Performance:

  • TensorRT execution provider enabled by default
  • FP16 precision for faster inference
  • Engine caching for instant subsequent runs
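
These defaults can be expressed as environment variables in the runtime stage. A hedged sketch, assuming inference's ONNXRUNTIME_EXECUTION_PROVIDERS setting and onnxruntime's ORT_TENSORRT_* options, with the cache path matching the /tmp volume described under deployment considerations (the Dockerfile's actual mechanism may differ):

  ENV ONNXRUNTIME_EXECUTION_PROVIDERS="[TensorrtExecutionProvider,CUDAExecutionProvider,CPUExecutionProvider]"
  ENV ORT_TENSORRT_FP16_ENABLE=1
  ENV ORT_TENSORRT_ENGINE_CACHE_ENABLE=1
  ENV ORT_TENSORRT_CACHE_PATH=/tmp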

Benchmark Results

RF-DETR Base on Jetson AGX Orin with TensorRT:

  • 62.2 FPS @ 16.0ms average latency
  • 0% error rate (1000/1000 successful inferences)
  • ±1.1ms standard deviation (very consistent)
  • Percentiles: P50=16.3ms, P75=16.6ms, P90=18.3ms, P99=18.6ms

Test config: rfdetr-base (29M params), COCO dataset, batch_size=1, 560x560 input, TensorRT FP16

Command:

inference benchmark python-package-speed -m rfdetr-base -d coco -bi 1000
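
The benchmark runs inside the container; assuming the inference CLI is on PATH there (which the Python symlink added in this PR is meant to ensure), it can be invoked against a running container like this (container name illustrative):

  docker exec -it inference-server \
      inference benchmark python-package-speed -m rfdetr-base -d coco -bi 1000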

Technical Details

Why l4t-cuda instead of l4t-jetpack:

  • l4t-jetpack (14.2 GB): Full JetPack SDK including VPI, multimedia codecs, GStreamer, samples, and development tools
  • l4t-cuda (8.28 GB final): Just CUDA runtime + extracted essentials (cuDNN, TensorRT libs) from JetPack
  • Result: Faster downloads, less storage, cleaner dependency management, newer CUDA

Multi-stage build:

  1. Builder uses l4t-jetpack:r36.4.0 for compilation (CUDA dev tools, nvcc)
  2. Runtime uses l4t-cuda:12.6.11-runtime with only necessary libs copied from builder
  3. Extracts cuDNN 9.3 and TensorRT from JetPack for PyTorch compatibility
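
A condensed sketch of that structure (stage names, library paths, and wheel location are illustrative, not the literal Dockerfile):

  # Stage 1: full JetPack toolchain for compiling wheels (onnxruntime-gpu, GDAL, ...)
  FROM nvcr.io/nvidia/l4t-jetpack:r36.4.0 AS builder
  # ... install CMake, build wheels into /wheels ...

  # Stage 2: minimal CUDA runtime for deployment
  FROM nvcr.io/nvidia/l4t-cuda:12.6.11-runtime
  # copy only the runtime libraries and prebuilt wheels out of the builder
  COPY --from=builder /usr/lib/aarch64-linux-gnu/libcudnn*.so*   /usr/lib/aarch64-linux-gnu/
  COPY --from=builder /usr/lib/aarch64-linux-gnu/libnvinfer*.so* /usr/lib/aarch64-linux-gnu/
  COPY --from=builder /wheels /wheels
  RUN uv pip install --system /wheels/*.whl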

Dependency Management:
Created 5 Jetson-specific requirements files to avoid numpy/torch version conflicts:

  • _requirements.jetson.txt - Core deps without numpy
  • requirements.jetson.6.2.0.txt - Platform deps with numpy<2.0.0
  • requirements.transformers.jetson.txt - Transformers without torch
  • requirements.sam.jetson.txt - SAM without torch
  • requirements.sdk.http.jetson.txt - SDK without numpy
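
A hedged sketch of the corresponding install step (flags and paths illustrative; the real Dockerfile may split this across layers):

  RUN uv pip install --system \
        -r requirements/_requirements.jetson.txt \
        -r requirements/requirements.jetson.6.2.0.txt \
        -r requirements/requirements.transformers.jetson.txt \
        -r requirements/requirements.sam.jetson.txt \
        -r requirements/requirements.sdk.http.jetson.txt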

Why numpy<2.0.0: The Jetson PyTorch 2.8.0 wheels are compiled against the numpy 1.x C-API (numpy 2.0 broke ABI compatibility roughly 17 months ago, and the Jetson wheels haven't been updated yet).
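
Because a transitive dependency can still pull numpy 2.x back in, the pin is re-applied explicitly after everything else is installed, e.g. (illustrative):

  RUN uv pip install --system "numpy>=1.26.4,<2.0.0"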

Type of change

  • Performance improvement (reduces image size, faster inference)
  • This change modifies the Jetson 6.2.0 Dockerfile

How has this change been tested?

Build: Successfully built on Jetson AGX Orin (~40 min full build)
Runtime: Container runs successfully, all imports working, GPU acceleration active
Benchmark: RF-DETR 62.2 FPS with TensorRT verified on Jetson AGX Orin

Deployment considerations

  • First run: 15+ min for TensorRT engine compilation (cached thereafter)
  • Use --volume ~/.inference/cache:/tmp:rw to persist TensorRT cache
  • MAXN mode recommended for best performance
  • numpy<2.0.0 required for Jetson PyTorch 2.8.0 compatibility
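
For example, a typical run command incorporating the cache volume (the port and remaining flags are the usual inference-server defaults; adjust for your device):

  sudo docker run -d --name inference-server \
      --runtime nvidia -p 9001:9001 \
      --volume ~/.inference/cache:/tmp:rw \
      roboflow/roboflow-inference-server-jetson-6.2.0:latest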

Docs

N/A

alexnorell and others added 2 commits November 17, 2025 13:53
…uction)

Replace full l4t-jetpack base image with lighter l4t-cuda:12.6.11-runtime
for Jetson 6.2.0 inference server deployment. This optimization reduces
image size from 14.2 GB to 8.28 GB (41.7% reduction) while maintaining
full functionality and improving CUDA version to 12.6.11.

Key improvements:
- New Dockerfile using l4t-cuda:12.6.11-runtime as base
- Multi-stage build: JetPack builder + minimal CUDA runtime
- Compiled onnxruntime-gpu with CUDA 12.6 and TensorRT support
- GDAL 3.11.5 compiled from source with Ninja build system
- PyTorch 2.8.0 with CUDA 12.6 support from jetson-ai-lab.io
- TensorRT FP16 acceleration enabled by default
- Python symlink for inference CLI compatibility

Performance:
- RF-DETR Base benchmark: 27.2 FPS @ 36.8ms avg latency
- TensorRT acceleration with FP16 precision
- Zero errors over 1000 inference cycles
- Low latency variance (±1.1ms std dev)

Technical details:
- Extracts cuDNN 9.3 and TensorRT libs from JetPack for compatibility
- Uses uv for fast Python package installation
- CMake 3.30.5 for building extensions
- 12-core parallel builds for onnxruntime compilation

Files changed:
- docker/dockerfiles/Dockerfile.onnx.jetson.6.2.0 (completely rewritten)
- requirements/*.txt (updated dependencies for Jetson 6.2.0)

Generated with Claude Code
Co-Authored-By: Claude <[email protected]>
alexnorell force-pushed the jetson-620-cuda-base-pr branch from 3c7a245 to b06c55d on November 17, 2025 21:53
- Set CMAKE_VERSION, TORCH_VERSION, and TORCHVISION_VERSION as build args
- Use latest CMake 4.1.2
- Simplify all comments throughout Dockerfile
- Create requirements.jetson.6.2.0.txt with Jetson-specific dependencies
- Keep numpy<2.0.0, torch>=2.8.0, torchvision>=0.23.0, flash-attn==2.8.2
- Don't modify shared requirements files to avoid breaking other builds
- Update Dockerfile to use requirements.jetson.6.2.0.txt instead of requirements.jetson.txt
…ments

- Remove requirements.transformers.txt and requirements.sam.txt from uv install
- These files specify torch<2.7.0 which conflicts with Jetson's torch>=2.8.0
- Torch 2.8.0 is already installed from jetson-ai-lab.io before this step
- Fixes build error: 'your requirements are unsatisfiable'
- Create requirements.transformers.jetson.txt without torch/torchvision
- Create requirements.sam.jetson.txt without torch/torchvision/flash-attn
- Update Dockerfile to use Jetson-specific requirements files
- Prevents dependency conflicts with pre-installed Jetson PyTorch 2.8.0
- Create _requirements.jetson.txt without numpy specification
- Update Dockerfile to use _requirements.jetson.txt
- Prevents conflict between numpy<2.0.0 (Jetson) and numpy>=2.0.0 (main)
- Create requirements.sdk.http.jetson.txt without numpy
- Update Dockerfile to use sdk.http.jetson.txt
- CMake 4.1.2 is incompatible with onnxruntime v1.20.0 dependencies
- Revert to CMake 3.30.5 which is known to work
- Use latest CMake 3.x version (3.31.10)
- CMake 4.x incompatible with onnxruntime v1.20.0
- Some dependency is pulling in numpy 2.x despite exclusions
- Explicitly install numpy<2.0.0 after all other packages
- Ensures onnxruntime compiled with numpy 1.x can run
- Install numpy>=2.0.0,<2.3.0 before PyTorch and onnxruntime build
- Remove numpy<2.0.0 constraint from Jetson requirements
- onnxruntime will now be compiled against numpy 2.x headers
- Allows using modern numpy 2.x in production
- Jetson PyTorch 2.8.0 wheels from jetson-ai-lab.io compiled with numpy 1.x
- Cannot use numpy 2.x until Jetson provides updated PyTorch wheels
- Force numpy<2.0.0 after all dependencies to ensure compatibility
This prototype uses l4t-cuda:12.6.11-runtime for 31.5% size reduction while
maintaining full functionality.

Key features:
- 2-stage build: JetPack builder + CUDA runtime
- GDAL 3.11.5, onnxruntime 1.20.0 compiled from source
- cuDNN, TensorRT, CUDA libs copied from JetPack
- TensorRT execution providers configured for ONNX models
- All inference packages built as wheels

Result: 9.73 GB vs 14.2 GB (4.47 GB savings)
alexnorell commented Nov 18, 2025

Reviewed the final image composition to identify optimization opportunities. I'm thinking this is as close as we're going to get without compiling everything from source.

Largest components (all required):

  • 3.31 GB: Python packages (/usr/local/lib/python3.10/dist-packages) - needed for all inference models
  • 3.01 GB: cuDNN libraries - required for PyTorch and TensorRT
  • 1.81 GB: CUDA libraries - required for GPU acceleration
  • 986 MB: TensorRT libraries - required for fast inference
  • 199 MB: Runtime dependencies (apt packages) - minimal set needed

Already optimized:

  • Using minimal l4t-cuda:12.6.11-runtime base (not full JetPack SDK)
  • No development packages in runtime stage
  • Apt cache cleaned (rm -rf /var/lib/apt/lists/*)
  • uv cache cleaned (rm -rf ~/.cache/uv)
  • Multi-stage build (builder artifacts not copied to runtime)
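
The cleanup follows the standard same-layer pattern so deleted caches never persist in an image layer, roughly (package list elided):

  RUN apt-get update \
      && apt-get install -y --no-install-recommends <runtime packages> \
      && rm -rf /var/lib/apt/lists/*

The uv cache is removed the same way, in the same RUN layer as the package install.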

- Default: 12 (for Jetson with 12 cores)
- GHA/Depot: 3 (to avoid OOM on CI runners)
- Allows flexible parallelism based on build environment
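
At image build time that override would look roughly like this (the build-arg name is illustrative; the Dockerfile path and tag are the ones used in this PR):

  docker build --build-arg BUILD_JOBS=3 \
      -f docker/dockerfiles/Dockerfile.onnx.jetson.6.2.0 \
      -t roboflow/roboflow-inference-server-jetson-6.2.0:latest .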

alexnorell commented:

✅ Depot Build Successful

The Jetson 6.2.0 Docker image built successfully on Depot infrastructure!

Build Run: https://github.com/roboflow/inference/actions/runs/19456686627

Image Tags Produced:

  • roboflow/roboflow-inference-server-jetson-6.2.0:latest
  • roboflow/roboflow-inference-server-jetson-6.2.0:0.61.0

Size: 8.28 GB (41.7% smaller than l4t-jetpack base)

The optimized image is validated and ready for deployment on Jetson 6.2.0 devices.

- Merge requirements.jetson.6.2.0.txt into _requirements.jetson.txt
- Eliminates redundant file since torch/torchvision already installed separately
- Now 4 Jetson requirements files instead of 5
alexnorell mentioned this pull request Nov 18, 2025

alexnorell commented:

Closing in favor of #1730

alexnorell closed this Nov 20, 2025