TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios
Daniel Rossi, Guido Borghi, Roberto Vezzani
University of Modena and Reggio Emilia, Italy
- Introduction
- Installation
- Usage
- Inference Scripts
- Inference on Edge Devices
- Additional Information
- Citation
- License
TakuNet is a convolutional architecture designed to be extremely efficient when deployed on embedded systems. Extensive experiments on AIDER and AIDERV2 demonstrate that TakuNet achieves near-state-of-the-art accuracy while being far more efficient than its competitors in terms of number of parameters, memory footprint, and FLOPs.
A deeper inspection of model performance on several embedded devices, such as Raspberry Pis and the NVIDIA Jetson Orin Nano, shows that TakuNet achieves a very large speedup in frames per second over competitors, especially on recent embedded architectures. Since TakuNet is trained in float16 precision, optimizing it through TensorRT on NVIDIA hardware accelerators does not approximate the model weights.
TakuNet uses Docker containers to simplify code distribution and execution across different devices and hardware architectures. If Docker is already installed on your machine, you can skip the Docker setup step.
This work was developed on an Ubuntu 24.04.1 LTS system with NVIDIA drivers 560.35.03, equipped with an Intel i5 8600K, 16GB DDR4 2666MHz, and an NVIDIA RTX 3090 24GB. Training was performed on a different machine with an Intel i7 12700F and an NVIDIA RTX 4070 Ti Super. Experiments on the Raspberry Pi(s) were conducted through the docker container running on Raspbian Bookworm, while NVIDIA JetPack 6.1 was installed on the Jetson Orin Nano Devkit.
- Install Docker (preferably on a Linux-based machine)

```bash
wget https://get.docker.com/ -O get_docker.sh
chmod +x get_docker.sh
bash get_docker.sh
```

- Once Docker has been installed, install nvidia-docker for GPU support
```bash
# Update and install the NVIDIA container toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Configure the NVIDIA container toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

- Add the required permissions to your user in order to manage containers with docker
```bash
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
```
- Clone the repository
```bash
git clone https://github.com/DanielRossi1/TakuNet.git
```

- Build the docker container
```bash
cd TakuNet/docker
./build.sh
```

- Once the container has finished building, run it. The run script supports directory mounting through arguments; directories are mounted under `/home/user/...`

```bash
# Just run the container
./run

# Run the container and mount a directory (e.g. the one containing the dataset).
# Here, AIDER will be available in /home/user/AIDER
./run -d /home/your-username/path-to-data/AIDER
```
For standalone installations without Docker, ensure you have the following Python packages installed:
```bash
pip install torch torchvision opencv-python matplotlib numpy pillow
```

The execution interface is simple: a bash script launches the `main.py` script, automatically loading the arguments and configuration specified in TakuNet's configuration file, `configs/TakuNet.yml`.
```bash
cd src
./launch.sh TakuNet
```
Base Settings
- num_epochs (int): Total number of training epochs
- batch_size (int): Batch size used for training
- seed (int): Random seed set for training and testing
- experiment_name (str): Name of the folder that will be created for training, or sourced for testing. It is created in `src/runs/`. Multiple runs with the same experiment name will overwrite logs.
- main_runs_folder (path): The output path for train and test runs
- pin_memory (bool): Torchvision dataloader pin memory
- mode (Train/Test/Export): You can choose to train, test, or export the model in ONNX format
Logging
- tensorboard (bool): Whether to use TensorBoard for logging
- wandb (bool): Whether to use Weights and Biases (wandb) for logging
- gradcam (bool): Enable Grad-CAM for gradient flow inspection
Dataset and Data loading
- num_workers (int): Number of threads used by the dataloader
- persistent_workers (bool): torchvision dataloader persistent workers
- dataset (AIDER/AIDERV2): Specifies the dataset, and in particular the dataloader, to be used
- data_path (path): The path where the dataset is stored in your docker container (e.g. `/home/user/Data/AIDER`)
- num_classes (int): Number of output classes of the model. AIDER has 5 classes while AIDERV2 has 4 classes of different images.
- img_height (int): Images are resized by default, this sets the height.
- img_width (int): Images are resized by default, this sets the width.
- augment (bool): Enables or disables data augmentation
- k_fold (int): Number of folds for k-fold cross-validation. Works only on AIDER since AIDERV2 has its own stand-alone validation set.
- split (proportional/exact): Defines how to split the AIDER dataset. `proportional` follows the same proportions used in the EmergencyNet paper, while `exact` creates a test set of equal size to the one used in the latter.
- no_validation (bool): If set to false, does not create a validation set for AIDER
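As a rough illustration of the `proportional` split mode described above, the sketch below holds out the same fraction of every class. The helper is hypothetical, written only to show the idea; it is not the repository's actual splitting code.

```python
import random

def proportional_split(items_per_class, test_fraction, seed=0):
    """Hold out the same fraction of every class (a sketch of a 'proportional' split).

    items_per_class: dict mapping class name -> list of sample identifiers.
    Returns (train, test) lists of (class, sample) pairs.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls, items in items_per_class.items():
        shuffled = items[:]
        rng.shuffle(shuffled)           # shuffle a copy, leaving the input intact
        k = round(len(shuffled) * test_fraction)
        test += [(cls, s) for s in shuffled[:k]]
        train += [(cls, s) for s in shuffled[k:]]
    return train, test
```

Because the fraction is applied per class, the class proportions in the test set match those of the full dataset.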
Pytorch Lightning Precision
- lightning_precision (16-mixed/32-true): 16-bit floating point mixed precision or 32-bit floating point precision
Model settings
- network (str): 'TakuNet' is the only available model
- input_channels (int): Number of channels of the input images, default is 3 for RGB
- dense (bool): Enables or disables dense connections in TakuNet
- ckpts_path (str): Path of the checkpoints to be used in inference (filename included)
Optimization parameters
- optimizer (str): Optimizer to use in training (available: `adam`, `adamw`, `sgd`, `rmsprop`)
- scheduler (str): Learning rate scheduler used in training (available: `cosine`, `cyclic`, `step`, `lambda`). These are set in `src/networks/LightningNet.py`
- scheduler_per_epoch (bool): Update the learning rate at the end of each epoch
- learning_rate (float): Initial learning rate
- learning_rate_decay (float): Decay used by schedulers
- learning_rate_decay_steps (float): Decay steps used by schedulers
- min_learning_rate (float): Minimum learning rate value
- warmup_epochs (int): Learning rate warmup epochs
- warmup_steps (int): Learning rate warmup steps
- weight_decay (float): Weight decay used by the optimizer
- weight_decay_end (float): Weight decay follows the same scheduler as the learning rate; this sets its minimum value
- update_freq (int): Update frequency for training steps
- label_smoothing (float [0,1]): Sets label smoothing for cross-entropy loss
- model_ema (bool): Whether to use model exponential moving average
- alpha (float): Alpha value for RMSprop
- momentum (float): Momentum value for RMSprop and SGD
- class_weights (list of float): Class weights to be used in cross-entropy loss
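To make the interaction of `learning_rate`, `min_learning_rate`, and `warmup_epochs` concrete, here is a minimal sketch of a cosine schedule with linear warmup. It is illustrative only; the actual schedulers are configured in `src/networks/LightningNet.py`.

```python
import math

def lr_at_epoch(epoch, num_epochs, warmup_epochs, base_lr, min_lr):
    """Linear warmup to base_lr, then cosine decay down to min_lr."""
    if epoch < warmup_epochs:
        # Warmup: ramp linearly up to base_lr over the first warmup_epochs
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay over the remaining epochs
    t = (epoch - warmup_epochs) / max(1, num_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```

With `scheduler_per_epoch: true`, a schedule like this would be evaluated once per epoch rather than per step.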
Export
- onnx_opset_version (int): Sets the ONNX opset version for the exported model
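Pulling the options above together, a configuration in `configs/TakuNet.yml` might look roughly like the following. The values shown are illustrative placeholders, not the repository's shipped configuration, and the exact file layout may differ.

```yaml
# Illustrative sketch - parameter names follow the list above,
# but the real configs/TakuNet.yml may be organized differently.
num_epochs: 100
batch_size: 32
seed: 42
experiment_name: takunet_aiderv2
mode: Train

dataset: AIDERV2
data_path: /home/user/Data/AIDER
num_classes: 4
img_height: 224
img_width: 224
augment: true

lightning_precision: 16-mixed

network: TakuNet
input_channels: 3
dense: true

optimizer: adamw
scheduler: cosine
learning_rate: 0.001
min_learning_rate: 0.00001
warmup_epochs: 5
weight_decay: 0.05
label_smoothing: 0.1

onnx_opset_version: 17
```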
TakuNet provides two different inference scripts for evaluating the model on test images:
The `singleimage_inference.py` script allows you to process individual images and visualize the results.
Usage:
```bash
python src/singleimage_inference.py --image path/to/image.jpg --checkpoint src/ckpts/TakuNet_AIDERV2.ckpt --visualize
```

Arguments:

- `--image` (required): Path to the input image
- `--checkpoint` (required): Path to the model checkpoint
- `--width` (optional, default=224): Image width for processing
- `--height` (optional, default=224): Image height for processing
- `--visualize` (optional flag): Display visualization of results
Example with included checkpoints:
```bash
# Process a single image from the Test directory using the AIDERV2 checkpoint
python src/singleimage_inference.py --image src/Test/Flood/image_265.png --checkpoint src/ckpts/TakuNet_AIDERV2.ckpt --visualize

# Process a single image using the AIDER checkpoint
python src/singleimage_inference.py --image src/Test/Fire/image_123.png --checkpoint src/ckpts/TakuNet_AIDER.ckpt --visualize
```

The script will output the predicted disaster class and confidence scores for each category (Earthquake, Fire, Flood, Normal).
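Per-class confidence scores like these are typically obtained by applying a softmax to the model's raw logits. A small self-contained sketch (the class names match the categories listed above, but the helper itself is hypothetical):

```python
import math

def confidences(logits, classes=("Earthquake", "Fire", "Flood", "Normal")):
    """Convert raw logits into per-class softmax confidence scores."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {c: e / total for c, e in zip(classes, exps)}
```

The predicted class is then simply the entry with the highest score.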
The `batch_image_inference.py` script processes multiple images from test directories, creates visualizations, and compiles them into a video.
Usage:
```bash
python src/batch_image_inference.py --checkpoint src/ckpts/TakuNet_AIDERV2.ckpt --test_dir src/Test --output_dir predictions_results
```

Arguments:

- `--test_dir` (optional, default=`src/Test`): Directory containing test images organized in class folders
- `--checkpoint` (required): Path to the model checkpoint file
- `--output_dir` (optional, default=`predictions_results`): Directory to save results
- `--num_images` (optional, default=20): Number of random images to select
- `--width` (optional, default=224): Image width for processing
- `--height` (optional, default=224): Image height for processing
Example with included checkpoints:
```bash
# Process 20 random images using the AIDERV2 checkpoint
python src/batch_image_inference.py --checkpoint src/ckpts/TakuNet_AIDERV2.ckpt --test_dir src/Test --output_dir predictions_results

# Process 50 random images using the AIDER checkpoint
python src/batch_image_inference.py --checkpoint src/ckpts/TakuNet_AIDER.ckpt --test_dir src/Test --output_dir predictions_results --num_images 50
```

Test Data Structure:
The repository includes a test dataset organized in the src/Test directory with the following structure:
```
src/Test/
├── Earthquake/
│   └── [earthquake images]
├── Fire/
│   └── [fire disaster images]
├── Flood/
│   └── [flood disaster images]
└── Normal/
    └── [non-disaster images]
```
What the script does:
- Randomly selects images from the test directory across all disaster class folders
- Processes each image through the TakuNet model
- Creates visualizations showing:
- The original image with true and predicted labels
- A bar chart of confidence scores for each class
- Saves individual visualizations to the output directory
- Compiles all visualizations into an MP4 video for easy viewing
- Calculates and displays accuracy statistics
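The accuracy statistic reported at the end is just the percentage of correctly classified images; as a trivial sketch:

```python
def accuracy(records):
    """Percentage of correct predictions.

    records: iterable of (true_label, predicted_label) pairs.
    """
    records = list(records)
    correct = sum(t == p for t, p in records)
    return 100.0 * correct / len(records)
```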
Understanding the Results:
After running the batch inference script, you'll find the following in your output directory (default: predictions_results):
- Individual Result Images: Files named `result_XXX.png` contain:
  - Left side: Original image with true class and predicted class labels
  - Right side: Bar chart showing confidence scores for each class
- Video Summary: `results_video.mp4` compiles all visualization images into a single video file for easy viewing and sharing
- Console Output: Displays a summary of the processing including:
  - Number of images found and processed
  - Accuracy metrics (percentage of correctly classified images)
Available Checkpoints: The repository includes two pre-trained model checkpoints:

- `src/ckpts/TakuNet_AIDER.ckpt`: Model trained on the AIDER dataset
- `src/ckpts/TakuNet_AIDERV2.ckpt`: Model trained on the AIDERV2 dataset (recommended)
Embedded device inference scripts are located in the `embedded` folder and require a proper configuration for each specific target device. The main configuration file is `embedded/configs/TakuNet.yml`.
```bash
cd src
python3 embedded/main.py --cfg-path embedded/configs/TakuNet.yml
```
Inference configuration parameters ⚙️
The inference script has to adapt to the target execution device, so you need to properly set a few parameters before launching the main script.

- onnx_model_path: Where the exported ONNX file is located
- engine_model_path: where to store the TensorRT engine
- use_tensorrt: Whether to enable TensorRT; to be used only on Jetson devices (set to false on Raspberry Pi)
- fp16_mode: true if your ONNX model is half-precision, false if it was exported with float-32 precision
- dataset_size: Tests are conducted on randomly generated images (torchvision FakeData), since we only want to measure inference speed. We set the number of images to 2600 and drop the first 100 to compensate for warm-up time
- img_size: Specifies the shape of the input images (AIDER and AIDERV2 have different image shapes)
- num_classes: Must match the number of classes used during model training
- batch_size: size of the batch of images to be processed in parallel (default 1)
- old_jetpack: This option allows the model to be optimized using TensorRT on older Jetson devices. Since this process is not straightforward, it may still encounter issues or errors. However, we encourage you to try it and report any issues you encounter.
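The warm-up handling described for `dataset_size` can be sketched as follows: time every inference, discard the first 100 measurements, and report frames per second over the remainder. This is a generic sketch under those assumptions, not the actual benchmarking code in `embedded/`:

```python
import time

def measure_fps(infer, num_images=2600, warmup=100):
    """Run `infer` once per image, drop warm-up iterations, and return FPS."""
    durations = []
    for _ in range(num_images):
        start = time.perf_counter()
        infer()  # one forward pass on a single (fake) image
        durations.append(time.perf_counter() - start)
    measured = durations[warmup:]  # discard warm-up time
    return len(measured) / sum(measured)
```

Dropping the initial iterations matters on accelerators, where the first passes include kernel compilation and cache warm-up and would otherwise skew the average.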
TensorRT Export
To properly export a model with TensorRT optimization, set use_tensorrt: true in embedded/configs/TakuNet.yml. The optimization must take place on the target hardware device and requires the ONNX checkpoints to already be exported.
You may face some issues when trying to compress the model through TensorRT on older Jetson devices such as the NVIDIA Jetson Nano (Maxwell) or NVIDIA Jetson TX1. In such cases, we suggest lowering the ONNX opset version and setting old_jetpack: true during inference.
Performance on Edge Devices
Embedded devices require a stable input voltage to operate effectively. Improper power supplies, including unsuitable cables, may result in degraded and unstable performance and, in some cases, could even cause permanent damage to the device.

To maximize performance, it is recommended to stop any application or service that may interfere with the device's operation, as these can introduce unnecessary overhead or cause resource contention, impacting efficiency and responsiveness.
For optimal performance with TakuNet, we recommend performing a fresh OS installation. Furthermore, active thermal cooling should be installed (if not already present) to avoid thermal throttling.
Troubleshooting Common Issues
- Missing Python Libraries: If you encounter errors about missing libraries, install them using pip:

  ```bash
  pip install torch torchvision opencv-python matplotlib numpy pillow
  ```
- Path Resolution Problems: When running the batch inference script, make sure your directory paths are correct. If you're running from the root directory of the project, use `src/Test` for the test directory. If you're already in the `src` directory, use just `Test`.

- CUDA/GPU Issues: If you encounter CUDA-related errors, check that your GPU drivers are up-to-date and that PyTorch is installed with CUDA support. You can check this with:

  ```python
  import torch
  print(torch.cuda.is_available())
  ```
- Visualization Not Working: If visualizations aren't displaying properly, ensure matplotlib is correctly installed and that you're using a non-interactive backend when running in environments without a display:

  ```python
  import matplotlib
  matplotlib.use('Agg')  # Use a non-interactive backend
  ```
If you find this code useful for your research, please consider citing:
```bibtex
@InProceedings{Rossi_2025_WACV,
    author    = {Rossi, Daniel and Borghi, Guido and Vezzani, Roberto},
    title     = {TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {February},
    year      = {2025},
    pages     = {376-385}
}
```

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
- Attribution (BY): You must give appropriate credit to the original author(s), provide a link to the license, and indicate if changes were made.
- NonCommercial (NC): This work may not be used for commercial purposes.
- ShareAlike (SA): If you remix, transform, or build upon this work, you must distribute your contributions under the same license as the original.
For the full legal text of the license, please refer to https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.
If you are interested in using this work for commercial purposes, please contact us.

