
This fork is an MIT-licensed version of YOLO, with some bug fixes and the addition of V9-N (nano) and V9-E (extended) variants on top of the original. This repository already achieves convergence speed and accuracy comparable to the stable GPLv3 implementation: https://github.com/MultimediaTechLab/YOLO.


YOLO: Official Implementation of YOLOv9, YOLOv7, YOLO-RD


Welcome to the official implementation of YOLOv7 [1] and YOLOv9 [2], as well as YOLO-RD [3]. This repository contains the complete codebase, pre-trained models, and detailed instructions for training and deploying YOLOv9.

TL;DR

  • This is the official YOLO model implementation with an MIT License.

Introduction

Installation

To get started with YOLOv9's developer mode, we recommend cloning this repository and installing the required dependencies:

git clone https://github.com/PINTO0309/YOLO.git
cd YOLO

curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
source .venv/bin/activate
export PYTHONWARNINGS="ignore"

Features

Task

For more customization details, please refer to HOWTO.

YOLO format dataset structure

data
└── wholebody34
    ├── train.pache # Cache file automatically generated when training starts
    ├── val.pache # Cache file automatically generated when training starts
    ├── images
    │   ├── train
    │   │   ├── 000000000036.jpg
    │   │   ├── 000000000077.jpg
    │   │   ├── 000000000110.jpg
    │   │   ├── 000000000113.jpg
    │   │   └── 000000000165.jpg
    │   └── val
    │       ├── 000000000241.jpg
    │       ├── 000000000294.jpg
    │       ├── 000000000308.jpg
    │       ├── 000000000322.jpg
    │       └── 000000000328.jpg
    └── labels
        ├── train
        │   ├── 000000000036.txt
        │   ├── 000000000077.txt
        │   ├── 000000000110.txt
        │   ├── 000000000113.txt
        │   └── 000000000165.txt
        └── val
            ├── 000000000241.txt
            ├── 000000000294.txt
            ├── 000000000308.txt
            ├── 000000000322.txt
            └── 000000000328.txt
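
Before training, it can help to verify that every image has a matching label file with the same stem. A minimal stand-alone check (a sketch, not part of this repository; the data/wholebody34 path matches the tree above):

# Check that every image under images/{train,val} has a label file with the same stem.
from pathlib import Path

root = Path("data/wholebody34")
for split in ("train", "val"):
    images = sorted((root / "images" / split).glob("*.jpg"))
    missing = [p.name for p in images
               if not (root / "labels" / split / f"{p.stem}.txt").exists()]
    print(f"{split}: {len(images)} images, {len(missing)} missing label files")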
  • 000000000036.txt

    Item      Note
    classId   Class ID (index into class_list)
    cx, cy    0.0-1.0 normalized center coordinates
    w, h      0.0-1.0 normalized width and height

    Each line has the form: classId cx cy w h

    30 0.729688 0.959667 0.141042 0.080667
    25 0.919385 0.974417 0.052521 0.051167
    25 0.525000 0.680847 0.049167 0.071806
    23 0.663813 0.657361 0.100125 0.105889
    21 0.612667 0.519583 0.068542 0.068056
    29 0.628292 0.896000 0.292500 0.082889
    30 0.546063 0.957611 0.210792 0.084778
    19 0.547917 0.417986 0.073125 0.037361
    26 0.488281 0.653583 0.123104 0.151444
    24 0.840208 0.778889 0.080417 0.092222
    24 0.435312 0.790972 0.074375 0.089167
    22 0.411469 0.557500 0.103313 0.112222
    22 0.773646 0.546944 0.087708 0.110556
    9 0.560417 0.366667 0.233333 0.266667
    7 0.560417 0.366667 0.233333 0.266667
    27 0.956385 0.970417 0.087229 0.055833
    16 0.541667 0.370833 0.154167 0.197222
    26 0.956385 0.970417 0.087229 0.055833
    4 0.681458 0.621667 0.637083 0.756667
    0 0.681458 0.621667 0.637083 0.756667
    18 0.527188 0.373333 0.042917 0.047500
    20 0.644792 0.370028 0.023125 0.036667
    1 0.681458 0.621667 0.637083 0.756667
    28 0.488281 0.653583 0.123104 0.151444
    17 0.489687 0.370972 0.032917 0.020556
    17 0.561875 0.350694 0.044583 0.019722
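
Each label line can be decoded back to pixel coordinates with a few lines of Python. A minimal parsing sketch (hypothetical helper, not part of this repository; the 640x480 image size is only an example):

# Convert one YOLO-format label line to pixel-space corner coordinates.
def parse_label_line(line: str, img_w: int, img_h: int):
    class_id, cx, cy, w, h = line.split()
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    x1, y1 = cx - w / 2, cy - h / 2
    x2, y2 = cx + w / 2, cy + h / 2
    return int(class_id), x1, y1, x2, y2

# First line of 000000000036.txt, assuming a 640x480 image:
print(parse_label_line("30 0.729688 0.959667 0.141042 0.080667", 640, 480))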
    

Dataset config

yolo/config/dataset/wholebody34.yaml

path: data/wholebody34
train: train
validation: val

class_num: 34
class_list: ['body', 'adult', 'child', 'male', 'female', 'body_with_wheelchair', 'body_with_crutches', 'head', 'front', 'right-front', 'right-side', 'right-back', 'back', 'left-back', 'left-side', 'left-front', 'face', 'eye', 'nose', 'mouth', 'ear', 'collarbone', 'shoulder', 'solar_plexus', 'elbow', 'wrist', 'hand', 'hand_left', 'hand_right', 'abdomen', 'hip_joint', 'knee', 'ankle', 'foot']

auto_download:
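
Before starting training, it is worth checking that class_num matches the number of entries in class_list, since the Inference section below relies on the trained head matching class_num. A quick consistency check (a sketch; assumes PyYAML is available in the environment):

# Verify that class_num matches the number of entries in class_list.
import yaml

with open("yolo/config/dataset/wholebody34.yaml") as f:
    cfg = yaml.safe_load(f)

names = cfg["class_list"]
assert cfg["class_num"] == len(names), (
    f"class_num={cfg['class_num']} but class_list has {len(names)} entries")
print(f"OK: {cfg['class_num']} classes, e.g. {names[:3]} ... {names[-1]}")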

Training

To train YOLO on your machine/dataset:

  1. Modify the configuration file yolo/config/dataset/**.yaml to point to your dataset.
  2. Run the training script:
uv run python yolo/lazy.py task=train dataset=** use_wandb=True
uv run python yolo/lazy.py task=train task.data.batch_size=8 model=v9-c weight=False # or more args

Transfer Learning

To perform transfer learning with YOLOv9:

# n, t, s, c, e
VARIANT=n
EPOCH=100
BATCHSIZE=8

uv run python yolo/lazy.py \
task=train \
name=v9-${VARIANT} \
task.epoch=${EPOCH} \
task.data.batch_size=${BATCHSIZE} \
model=v9-${VARIANT} \
dataset=wholebody34 \
device=cuda \
use_wandb=False \
use_tensorboard=True

# When specifying trained weights as initial weights
uv run python yolo/lazy.py \
task=train \
name=v9-${VARIANT} \
task.epoch=${EPOCH} \
task.data.batch_size=${BATCHSIZE} \
model=v9-${VARIANT} \
weight="runs/train/v9-n/lightning_logs/version_1/checkpoints/best_n_0002_0.0065.pt" \
dataset=wholebody34 \
device=cuda \
use_wandb=False \
use_tensorboard=True

# Automatically downloading the initial weights published by the official repository
# Default: weight=True
# Weight download path: weights/*.pt
uv run python yolo/lazy.py \
task=train \
name=v9-${VARIANT} \
task.epoch=${EPOCH} \
task.data.batch_size=${BATCHSIZE} \
model=v9-${VARIANT} \
weight=True \
dataset=wholebody34 \
device=cuda \
use_wandb=False \
use_tensorboard=True

# When starting training without initial weights
# Default: weight=True
uv run python yolo/lazy.py \
task=train \
name=v9-${VARIANT} \
task.epoch=${EPOCH} \
task.data.batch_size=${BATCHSIZE} \
model=v9-${VARIANT} \
weight=False \
dataset=wholebody34 \
device=cuda \
use_wandb=False \
use_tensorboard=True

# Resume learning from where you left off
# Please note that you must specify the Lightning checkpoint file (.ckpt)
# and not the .pt file that contains only the EMA weights.
# Unlike the official implementation, all parameters are restored from the .ckpt file,
# so training resumes exactly where it left off.
uv run python yolo/lazy.py \
task=train \
name=v9-${VARIANT} \
task.epoch=${EPOCH} \
task.data.batch_size=${BATCHSIZE} \
model=v9-${VARIANT} \
task.resume_ckpt="runs/train/v9-n/lightning_logs/version_3/checkpoints/epoch_5_step_3660.ckpt" \
dataset=wholebody34 \
device=cuda \
use_wandb=False \
use_tensorboard=True

# To run a shorter fine-tuning schedule, use the dedicated configuration
# at `yolo/config/task/trainft.yaml`
# All CLI overrides available for `task=train` (e.g., `task.data.batch_size`,
# `task.resume_ckpt`) also apply to `task=trainft`.
VARIANT=n
EPOCH=60
BATCHSIZE=8
uv run python yolo/lazy.py \
task=trainft \
name=v9-${VARIANT} \
task.epoch=${EPOCH} \
task.data.batch_size=${BATCHSIZE} \
model=v9-${VARIANT} \
weight="runs/train/v9-n/lightning_logs/version_1/checkpoints/best_n_0002_0.0065.pt" \
dataset=wholebody34 \
device=cuda \
use_wandb=False \
use_tensorboard=True

# # DDP (Distributed data parallel training), Multi-GPU training
# # Below is a sample for 8 GPUs
# # n, t, s, c, e
# VARIANT=n
# EPOCH=100
# # Number of GPUs running on one node
# NPROC=8
# # When NPROC=8, the string [0,1,2,3,4,5,6,7] is set to DEVICES.
# DEVICES="[$(seq -s, 0 $((NPROC-1)))]"
# # When there are 8 GPUs and 8 batches are assigned to each GPU
# # {Batch size per GPU} x {Number of GPUs} = {Total batch size}
# # 8 x 8 = 64
# BATCHSIZE=8
# TOTALBATCHSIZE=$((BATCHSIZE * NPROC))

# uv run torchrun \
# --nproc_per_node=${NPROC} \
# yolo/lazy.py \
# task=train \
# device=${DEVICES} \
# name=v9-${VARIANT} \
# task.epoch=${EPOCH} \
# task.data.batch_size=${TOTALBATCHSIZE} \
# task.data.cpu_num=$((TOTALBATCHSIZE / NPROC)) \
# model=v9-${VARIANT} \
# weight=False \
# dataset=wholebody34 \
# use_wandb=False \
# use_tensorboard=True

↓↓↓ Experimental implementation. Not recommended as accuracy is significantly reduced. ↓↓↓
# Online Knowledge Distillation (Teacher E β†’ Student {C,S,T,N})
# Default: task.kd.enable=False
# ./ARCHITECTURE_ENHANCED_YOLOv9.md#8-online-knowledge-distillation-teacher-e--student-cstn
# ./yolo/config/task/train.yaml
uv run python yolo/lazy.py \
task=train \
name=v9-${VARIANT} \
task.epoch=${EPOCH} \
task.data.batch_size=${BATCHSIZE} \
model=v9-${VARIANT} \
weight=False \
task.kd.enable=True \
task.kd.teacher_model=v9-e \
task.kd.teacher_weight=weights/v9-e.pt \
task.kd.apply_to=both \
dataset=wholebody34 \
device=cuda \
use_wandb=False \
use_tensorboard=True
↑↑↑ Experimental implementation. Not recommended as accuracy is significantly reduced. ↑↑↑

⚠️ important points ⚠️

1. About batch size

Pay particular attention to the maximum number of CPU threads and the amount of RAM on the machine you are training on. I'm talking about RAM, not VRAM. The number of worker processes launched during training is batch_size + 1, so you must keep batch_size below the maximum number of CPU threads minus 1. In addition, RAM consumption grows in proportion to the number of enabled augmentations, so check how much RAM your PC has installed; checking only VRAM is not enough. If you need to run heavy augmentation that would otherwise exceed your RAM capacity, we recommend setting batch_size to a relatively small value.

The figure below shows the CPU and RAM status of my work PC. When I run 16 batches with the maximum number of augmentations enabled, 17 worker processes are started, which consumes a large amount of RAM and causes training to silently abort after a few epochs without emitting any errors.

(Figure: CPU and RAM usage on my work PC during training)
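
As a rough pre-flight check before choosing batch_size, something like the following prints the limits discussed above (a sketch; psutil is an extra dependency, not something this repository requires):

# Rough pre-flight check: batch_size + 1 worker processes should stay below the
# number of CPU threads, and heavy augmentation also consumes system RAM.
import os
import psutil

batch_size = 16
workers = batch_size + 1
cpu_threads = os.cpu_count()
ram_gib = psutil.virtual_memory().total / 1024**3

print(f"CPU threads: {cpu_threads}, planned worker processes: {workers}")
if workers >= cpu_threads:
    print("batch_size is too large for this CPU; reduce it.")
print(f"Installed RAM: {ram_gib:.1f} GiB (watch RAM, not only VRAM, when augmentation is enabled)")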

2. If training does not start normally (silently aborts)

This is a countermeasure for situations where resuming is unstable and a CUDA initialization error occurs: https://discuss.pytorch.org/t/dataloader-num-workers-1-cuda-initialization-error-3/159989

In some environments, specifying mp.set_start_method causes the process to silently terminate before training begins. If training does not start normally in your environment, it may help to comment out the mp.set_start_method line shown below:

  • yolo/lazy.py
    if __name__ == "__main__":
        # Countermeasure for situations where resume is unstable and CUDA initialization error occurs
        # https://discuss.pytorch.org/t/dataloader-num-workers-1-cuda-initialization-error-3/159989
        # If the following `mp.set_start_method` is specified, there are some environments where
        # the process will silently terminate before learning begins.
        # Therefore, if you are in an environment where learning does not start normally,
        # it may be a good idea to comment out the following line: `mp.set_start_method`.
        # mp.set_start_method("spawn", force=True) <--- Here
        main()

Validation graph during training

To speed up training and significantly reduce VRAM consumption, validation during training is limited to a simple, minimal evaluation per epoch. Validation results from epochs other than the final one therefore do not properly evaluate the model's true performance; they only confirm that training is progressing normally, that accuracy is not deteriorating significantly, and that overfitting is not occurring. The model's true performance can only be confirmed by the rigorous validation performed at the final epoch. In other words, the per-epoch spot validation does not perfectly track the true improvement of the weights as training progresses, so performing early stopping based solely on the per-epoch validation status is ill-advised. Above all, you should not use an insufficient dataset that results in overfitting.

The final epoch performs fairly accurate validation, so it may take several minutes or more depending on the volume of your dataset.

  • NMS settings used for validation at each stage of training

    Setting          Intermediate Epoch   Final Epoch
    pre_topk         300                  20,000
    max_bbox         300                  20,000
    multi_label      False                True
    class_agnostic   False                False
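
In the table above, pre_topk and max_bbox bound the number of candidate and final boxes, respectively. To make the remaining options concrete, the sketch below shows what multi_label and class_agnostic mean when decoding predictions (illustrative only, not this repository's NMS code; the conf/iou defaults are placeholders):

# boxes: [N, 4] in xyxy, scores: [N, num_classes] class probabilities.
import torch
from torchvision.ops import batched_nms

def decode(boxes, scores, conf=0.1, iou=0.65,
           multi_label=False, class_agnostic=False, max_bbox=300):
    if multi_label:
        # every (box, class) pair above the confidence threshold is a candidate
        box_idx, cls = (scores > conf).nonzero(as_tuple=True)
        cand_boxes, cand_scores = boxes[box_idx], scores[box_idx, cls]
    else:
        # one candidate per box: its highest-scoring class only
        cand_scores, cls = scores.max(dim=1)
        keep = cand_scores > conf
        cand_boxes, cand_scores, cls = boxes[keep], cand_scores[keep], cls[keep]
    # class_agnostic=True suppresses overlapping boxes across classes as well
    idxs = torch.zeros_like(cls) if class_agnostic else cls
    keep = batched_nms(cand_boxes, cand_scores, idxs, iou)[:max_bbox]
    return cand_boxes[keep], cand_scores[keep], cls[keep]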

print_map_per_class

If you want to display the AP for each class at every epoch, set print_map_per_class: True in yolo/config/task/validation.yaml and start training. If print_map_per_class: False is set, per-class AP is calculated and printed only once, at the end of the final epoch. Because print_map_per_class adds considerable processing time, we recommend leaving it at False so that per-class AP is computed automatically only in the final epoch.

┏━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━┓
┃Epoch┃Avg. Precision  ┃     %┃Avg. Recall     ┃     %┃
┡━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━┩
│    2│AP @ .5:.95     │000.77╎AR maxDets   1  │003.08│
│    2│AP @     .5     │002.02╎AR maxDets  10  │006.91│
│    2│AP @    .75     │000.45╎AR maxDets 100  │008.74│
│    2│AP  (small)     │000.33╎AR     (small)  │001.93│
│    2│AP (medium)     │000.69╎AR    (medium)  │007.74│
│    2│AP  (large)     │001.34╎AR     (large)  │008.55│
└─────┴────────────────┴──────┴────────────────┴──────┘
┏━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ ID┃Name                     ┃     AP┃ ID┃Name                     ┃     AP┃
┡━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│  0│body                     │ 0.0343│ 20│ear                      │ 0.0023│
│  1│adult                    │ 0.0320│ 21│collarbone               │ 0.0003│
│  2│child                    │ 0.0000│ 22│shoulder                 │ 0.0033│
│  3│male                     │ 0.0268│ 23│solar_plexus             │ 0.0003│
│  4│female                   │ 0.0103│ 24│elbow                    │ 0.0001│
│  5│body_with_wheelchair     │ 0.0029│ 25│wrist                    │ 0.0001│
│  6│body_with_crutches       │ 0.0455│ 26│hand                     │ 0.0029│
│  7│head                     │ 0.0340│ 27│hand_left                │ 0.0022│
│  8│front                    │ 0.0102│ 28│hand_right               │ 0.0027│
│  9│right-front              │ 0.0155│ 29│abdomen                  │ 0.0005│
│ 10│right-side               │ 0.0059│ 30│hip_joint                │ 0.0006│
│ 11│right-back               │ 0.0023│ 31│knee                     │ 0.0010│
│ 12│back                     │ 0.0001│ 32│ankle                    │ 0.0012│
│ 13│left-back                │ 0.0015│ 33│foot                     │ 0.0063│
│ 14│left-side                │ 0.0025│   │                         │       │
│ 15│left-front               │ 0.0105│   │                         │       │
│ 16│face                     │ 0.0047│   │                         │       │
│ 17│eye                      │ 0.0000│   │                         │       │
│ 18│nose                     │ 0.0000│   │                         │       │
│ 19│mouth                    │ 0.0000│   │                         │       │
└───┴─────────────────────────┴───────┴───┴─────────────────────────┴───────┘

Weights after training

The weights after training are output to the following path.

File                                      Note
best_{variant}_{epoch:04}_{map:.4f}.pt    Optimized weight file containing only EMA weights. The weights with the highest mAP are saved automatically.
epoch_{epoch}_step_{step}.ckpt            Lightning checkpoint file containing the full training state, saved automatically.
last.pt                                   Optimized weight file containing only EMA weights. The weights of the last epoch are saved automatically.

e.g.

runs/train/v9-n/lightning_logs/version_0/checkpoints
├── best_n_0002_0.0065.pt
├── epoch_2_step_3462.ckpt
└── last.pt

Inference

To use a model for object detection, use:

# n, t, s, c, e
VARIANT=n
RENDER_LABELS=False

# If you do not specify `dataset={dataset_name}` correctly,
# the classification head weights will not be loaded properly
# and you will not see any inference results.
# The number of classes in the head part of the weights used for inference
# must match `class_num`.
# https://github.com/PINTO0309/YOLO/blob/wholebody/yolo/config/dataset/wholebody34.yaml
---
path: data/wholebody34
train: train
validation: val

class_num: 34 # <--- Here
class_list: ['body', ..., 'foot']
---

uv run python yolo/lazy.py \
task=inference \
name=v9-${VARIANT} \
model=v9-${VARIANT} \
weight="runs/train/v9-n/lightning_logs/version_1/checkpoints/best_n_0002_0.0065.pt" \
dataset=wholebody34 \
task.nms.min_confidence=0.1 \
task.fast_inference=onnx \
task.data.source=data/wholebody34/images/val \
task.data.max_samples=100 \
task.render_labels=${RENDER_LABELS} \
+quite=True

Validation

To validate model performance, or generate a json file in COCO format:

# n, t, s, c, e
VARIANT=n
# Specify the same `batch_size` as the validation batch size used during training.
# Otherwise, the mAP value after validation will be significantly degraded.
# data:
#  batch_size: 32
# https://github.com/PINTO0309/YOLO/blob/wholebody/yolo/config/task/validation.yaml
BATCHSIZE=32
# The higher the model's performance, the more accurate the evaluation will be
# if the MAXDET value (the upper limit of the number of detections) is set to
# a larger value. The default value is 1,000. yolo/config/task/validation.yaml
# However, setting a value that exceeds the maximum number of labels contained
# in one image will have no effect. For example, in my dataset, an image contains
# a maximum of 3,875 labels, so setting it to 4,000 is appropriate.
MAXDET=20000

uv run python yolo/lazy.py \
task=validation \
name=v9-${VARIANT} \
task.data.batch_size=${BATCHSIZE} \
task.nms.pre_topk=${MAXDET} \
task.nms.max_bbox=${MAXDET} \
task.nms.multi_label=True \
task.nms.class_agnostic=False \
model=v9-${VARIANT} \
weight="runs/train/v9-n/lightning_logs/version_1/checkpoints/best_n_0002_0.0065.pt" \
dataset=wholebody34 \
device=cuda \
use_wandb=False

Export

Use the Hydra-driven CLI to run the export task and produce a compact ONNX graph. The exporter emits a single [batches, 4 + num_classes, boxes] tensor, keeps detection heads minimal, and derives an informative filename (e.g. best_e_0060_0.6585_1x3x480x640.onnx). Example:

uv run python yolo/lazy.py \
task=export \
name=v9-demo \
model=v9-e \
dataset=wholebody34 \
weight="runs/trainft/v9-e/lightning_logs/version_ft0/checkpoints/best_e_0060_0.6585.pt" \
task.dynamic_batch=False \
task.dynamic_size=False \
task.image_size=480x640 \
task.batch_size=1 \
task.opset=13 \
task.half=false \
task.apply_sigmoid=True \
task.include_metadata=True
  • output: [batches, [cx,cy,w,h,class_scores], boxes]
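
The exported tensor can be decoded with nothing more than NumPy. A minimal sketch that turns the raw output into boxes, scores, and class IDs prior to NMS (assumes task.apply_sigmoid=True so class scores are already probabilities; the confidence threshold is a placeholder):

# output: [batches, 4 + num_classes, boxes] as produced by the exporter above.
import numpy as np

def decode_export_output(output: np.ndarray, conf: float = 0.25):
    pred = output[0]                       # [4 + num_classes, boxes]
    cx, cy, w, h = pred[0], pred[1], pred[2], pred[3]
    class_scores = pred[4:]                # [num_classes, boxes]
    class_ids = class_scores.argmax(axis=0)
    scores = class_scores.max(axis=0)
    keep = scores > conf
    x1, y1 = cx[keep] - w[keep] / 2, cy[keep] - h[keep] / 2
    x2, y2 = cx[keep] + w[keep] / 2, cy[keep] + h[keep] / 2
    return np.stack([x1, y1, x2, y2], axis=1), scores[keep], class_ids[keep]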

Key overrides (all optional):

  • task.batch_size: dummy input batch size (default 1).
  • task.dynamic_batch: true marks batch as symbolic N and names the file accordingly.
  • task.dynamic_size: true marks Height and Width as symbolic H, W and names the file accordingly.
  • task.image_size: input resolution. Accepts 'HxW'.
  • task.opset: ONNX opset version (default 13).
  • task.simplify: run onnxsim for graph simplification.
  • task.half: export weights/activations in FP16.
  • task.apply_sigmoid: emit post-sigmoid class probabilities instead of raw logits.
  • task.include_metadata: embed class names in ONNX metadata.
  • task.output_path: explicit destination; omit to auto-name beside the weight file.
  • task.name: experiment/run folder label (standard Hydra behaviour).

Generate and merge post-processing with NMS

tools/post_process_gen_tools


Convert ONNX with NMS to LiteRT/TensorFlow.js

If you want to use WebGPU, you can use an ONNX model without NMS or a TensorFlow.js model without NMS. If you don't want to go through ONNX, you can also export a LiteRT model directly from PyTorch using ai_edge_torch.

  • ONNX to TF/LiteRT
    # Transformation with `Grouped Convolution` disabled
    uv run onnx2tf -i yolov9_n_wholebody25_post_0100_1x3x480x640.onnx -dgc
  • TF to TFJS
    uv run tensorflowjs_converter \
    --input_format tf_saved_model \
    --output_format tfjs_graph_model \
    saved_model \
    tfjs_model

Simple performance benchmark using ONNX/TensorRT

# Install CUDA==12.9
#  https://developer.nvidia.com/cuda-toolkit-archive
# Install TensorRT==10.13.3.9-1+cuda12.9
#  https://docs.nvidia.com/deeplearning/tensorrt/latest/installing-tensorrt/installing.html
uv run sit4onnx -if best_e_0205_0.4140_1x3x640x640.onnx -oep cpu

INFO: file: best_e_0205_0.4140_1x3x640x640.onnx
INFO: providers: ['CPUExecutionProvider']
INFO: input_name.1: images shape: [1, 3, 640, 640] dtype: float32
INFO: test_loop_count: 10
INFO: total elapsed time:  3673.502206802368 ms
INFO: avg elapsed time per pred:  367.3502206802368 ms
INFO: output_name.1: output shape: [1, 38, 8400] dtype: float32
uv run sit4onnx -if best_e_0205_0.4140_1x3x640x640.onnx -oep cuda

INFO: file: best_e_0205_0.4140_1x3x640x640.onnx
INFO: providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']
INFO: input_name.1: images shape: [1, 3, 640, 640] dtype: float32
INFO: test_loop_count: 10
INFO: total elapsed time:  350.10218620300293 ms
INFO: avg elapsed time per pred:  35.01021862030029 ms
INFO: output_name.1: output shape: [1, 38, 8400] dtype: float32
# It will take a while to generate the TensorrtExecutionProvider_TRTKernel_*.engine cache.
uv run sit4onnx -if best_e_0205_0.4140_1x3x640x640.onnx -oep tensorrt

INFO: file: best_e_0205_0.4140_1x3x640x640.onnx
INFO: providers: ['TensorrtExecutionProvider', 'CPUExecutionProvider']
INFO: input_name.1: images shape: [1, 3, 640, 640] dtype: float32
INFO: test_loop_count: 10
INFO: total elapsed time:  104.28452491760254 ms
INFO: avg elapsed time per pred:  10.428452491760254 ms
INFO: output_name.1: output shape: [1, 38, 8400] dtype: float32
# With NMS + TensorRT
# For models with dynamic tensors as input, specify the size of the tensor
# to be tested using the --fixed_shapes / -fs option.
uv run sit4onnx -if yolov9_n_wholebody25_post_0100_1x3xHxW.onnx -oep tensorrt -fs 1 3 480 640

INFO: file: yolov9_n_wholebody25_post_0100_1x3x480x640.onnx
INFO: providers: ['TensorrtExecutionProvider', 'CPUExecutionProvider']
INFO: input_name.1: input_bgr shape: [1, 3, 480, 640] dtype: float32
INFO: test_loop_count: 10
INFO: total elapsed time:  20.3857421875 ms
INFO: avg elapsed time per pred:  2.03857421875 ms
INFO: output_name.1: batchno_classid_score_x1y1x2y2 shape: [0, 7] dtype: float32
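
The same kind of measurement can be reproduced directly with onnxruntime, without sit4onnx (a sketch; the model filename and provider list are placeholders, and a GPU-enabled onnxruntime build is assumed for the CUDA provider):

# Minimal latency measurement with onnxruntime, analogous to the sit4onnx runs above.
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "best_e_0205_0.4140_1x3x640x640.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
inp = sess.get_inputs()[0]
# Substitute concrete sizes for any symbolic (dynamic) dimensions.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.zeros(shape, dtype=np.float32)

sess.run(None, {inp.name: dummy})  # warm-up run
loops = 10
start = time.perf_counter()
for _ in range(loops):
    sess.run(None, {inp.name: dummy})
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"avg elapsed time per pred: {elapsed_ms / loops:.2f} ms")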

Contributing

Contributions to the YOLO project are welcome! See CONTRIBUTING for guidelines on how to contribute.

Star History

Star History Chart

Citations

@inproceedings{wang2022yolov7,
      title={{YOLOv7}: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors},
      author={Wang, Chien-Yao and Bochkovskiy, Alexey and Liao, Hong-Yuan Mark},
      year={2023},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
}
@inproceedings{wang2024yolov9,
      title={{YOLOv9}: Learning What You Want to Learn Using Programmable Gradient Information},
      author={Wang, Chien-Yao and Yeh, I-Hau and Liao, Hong-Yuan Mark},
      year={2024},
      booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
}
@inproceedings{tsui2024yolord,
      author={Tsui, Hao-Tang and Wang, Chien-Yao and Liao, Hong-Yuan Mark},
      title={{YOLO-RD}: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary},
      booktitle={Proceedings of the International Conference on Learning Representations (ICLR)},
      year={2025},
}

Footnotes

  1. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors ↩

  2. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information ↩

  3. YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary ↩
