2 changes: 1 addition & 1 deletion README.md
@@ -244,7 +244,7 @@ https://user-images.githubusercontent.com/26127866/147658296-23be4621-d37a-4fd6-

[![Gen2 WebRTC](https://user-images.githubusercontent.com/5244214/121884542-58a1bf00-cd13-11eb-851d-dc45d541e385.gif)](https://youtu.be/8aeqGgO8LjY)

## YOLO V3 V4 V5 X and P On-Camera ([here](https://github.com/luxonis/depthai-experiments/tree/master/gen2-yolo))
## YOLO V3 V4 V5 X P V8 V9 and 11 On-Camera ([here](https://github.com/luxonis/depthai-experiments/tree/master/gen2-yolo))

![yolo-logo](https://user-images.githubusercontent.com/56075061/144863247-fa819d1d-28d6-498a-89a8-c3f94d9e9357.gif)

3 changes: 2 additions & 1 deletion gen2-yolo/README.md
@@ -7,11 +7,12 @@ This repository contains code for various Yolo experiments:
| Directory | Decoding | Version | Description |
| :-------------: | :------: | :--------------------------: | ------------------------------------------------------------ |
| `main.py` | device | From https://tools.luxonis.com | Run your custom trained YOLO model that was converted using the tools.luxonis.com. Uses [DepthAI-SDK](https://docs.luxonis.com/projects/sdk/en/latest/) |
| `device-decoding` | device | V3, V3-tiny, V4, V4-tiny, V5 | General object detection using any of the versions for which we support on-device decoding. Uses [DepthAI-API]https://docs.luxonis.com/projects/api/en/latest/) |
| `device-decoding` | device | V3, V3-tiny, V4, V4-tiny, V5 | General object detection using any of the versions for which we support on-device decoding. Uses [DepthAI-API](https://docs.luxonis.com/projects/api/en/latest/) |
| `car-detection` | device | V3-tiny, V4-tiny | Car detection using YoloV3-tiny and YoloV4-tiny with on-device decoding ([DepthAI-SDK](https://docs.luxonis.com/projects/sdk/en/latest/)). |
| `host-decoding` | host | V5 | Object detection using YoloV5 and on-host decoding. |
| `yolox` | host | X | Object detection without anchors using YOLOX-tiny with on-host decoding. |
| `yolop` | host | P | Vehicle detection, road segmentation, and lane segmentation using YOLOP on OAK with on-host decoding. |
| `yolo-segmentation` | host | V5, V8, V9, 11 | Object segmentation using YOLOv5, YOLOv8, YOLOv9, YOLO11 on OAK with on-host decoding. |

## On-Device decoding

84 changes: 84 additions & 0 deletions gen2-yolo/yolo-segmentation/README.md
@@ -0,0 +1,84 @@
# YOLO segmentation with decoding on host

This example shows how to perform instance segmentation on OAK devices using YOLO models ([YOLOv5](https://github.com/ultralytics/yolov5), [YOLOv8](https://docs.ultralytics.com/models/yolov8), [YOLOv9](https://github.com/WongKinYiu/yolov9) and [YOLO11](https://docs.ultralytics.com/models/yolo11)). The models' output is decoded on the host side. The ONNX models were exported from weights pretrained on COCO and then converted to blob format to be compatible with OAK devices.

![On-host decoding YOLO segmentation in OAK](docs/oak_segmentation_example.gif)

## Pre-requisites

### 1. Install requirements

```bash
python3 -m pip install -r requirements.txt
```

### 2. Convert/export models to ONNX format

You can either train your custom model or try directly with a model trained on the COCO dataset. The latter is the case handled in this experiment.

#### **YOLOv5**

**1. Installation:**
```bash
git clone https://github.com/ultralytics/yolov5.git
cd yolov5/
pip install -r requirements.txt
```

**2. Export:**
```bash
python3 export.py --weights yolov5{n,s}-seg.pt --include onnx --imgsz 640 --opset 16 --simplify
```

#### **YOLOv8** and **YOLO11**

**1. Installation:**
```bash
pip install ultralytics
```

**2. Export:**
```bash
yolo export model=yolov8{n,s}-seg.pt format=onnx imgsz=640 half=True dynamic=False simplify=True batch=1
yolo export model=yolo11{n,s}-seg.pt format=onnx imgsz=640 half=True dynamic=False simplify=True batch=1
```

#### **YOLOv9**

**1. Installation:**
```bash
git clone https://github.com/WongKinYiu/yolov9.git
cd yolov9/
pip install -r requirements.txt
```

**2. Download weights:**
```bash
wget https://github.com/WongKinYiu/yolov9/releases/download/v0.1/gelan-c-seg.pt
```

**3. Export:**
```bash
python3 export.py --weights gelan-c-seg.pt --include onnx --imgsz 640 --batch-size 1 --simplify
```

### 3. Convert ONNX models to blob

The conversion from ONNX to blob is done with the Luxonis [Blob Converter Tool](http://blobconverter.luxonis.com). Note that the mean values in the ``Model optimizer parameters`` under ``Advanced options`` must be changed from the default ``[127.5, 127.5, 127.5]`` to ``[0, 0, 0]``.
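
If you prefer to script the conversion, the [blobconverter](https://pypi.org/project/blobconverter/) Python package can be used instead of the web tool. The snippet below is only a sketch: the model filename, shave count, OpenVINO version and scale values are assumptions and may need adjusting for your export.

```python
# Sketch: convert an exported ONNX model to a blob with the blobconverter package.
# The model path, shave count, version and scale values below are assumptions.
import blobconverter

blob_path = blobconverter.from_onnx(
    model="yolov8n-seg.onnx",          # path to your exported ONNX model
    data_type="FP16",
    shaves=6,
    version="2022.1",                  # keep in line with the OpenVINO version set in main.py
    optimizer_params=[
        "--mean_values=[0,0,0]",       # the required change described above
        "--scale_values=[255,255,255]",
    ],
)
print(f"Blob saved to: {blob_path}")
```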


## Usage

#### Inference with YOLOv5, YOLOv8, YOLOv9 or YOLO11

```bash
python3 main.py --blob <path_to_blob_file> --conf <confidence_threshold> --iou <iou_threshold> --version <yolo_version>
```

Options:
* `--blob`: Path to the YOLO blob file used for inference. ``str``
* `--conf`: Confidence threshold. Default: 0.3. ``float``
* `--iou`: NMS IoU threshold. Default: 0.5. ``float``
* `--version`: YOLO version of the model (5, 8, 9 or 11). ``int``
* `--input-shape`: Input shape of the YOLO model. Default: [640, 640]. ``int [int ...]``
* `--fps`: Camera FPS. Default: 30. ``int``
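
For example, assuming the exported ``yolov8n-seg.pt`` was converted to a blob named ``yolov8n-seg.blob`` (an illustrative filename), a typical run would be ``python3 main.py --blob yolov8n-seg.blob --version 8``, keeping the confidence and IoU thresholds at their defaults.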
263 changes: 263 additions & 0 deletions gen2-yolo/yolo-segmentation/main.py
@@ -0,0 +1,263 @@
import sys
import cv2
import depthai as dai
import numpy as np
import argparse
import time
import random


def sigmoid(z):
return 1 / (1 + np.exp(-z))

def generate_random_color():
return [random.randint(0, 255) for _ in range(3)]


coco_classes = ['person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
'scissors', 'teddy bear', 'hair drier', 'toothbrush']

class_colors = {cls: generate_random_color() for cls in coco_classes}

pixel_conf_threshold = 0.3
available_yolo_versions = [5, 8, 9, 11]


if __name__ == '__main__':

parser = argparse.ArgumentParser(description="Object segmentation on OAK devices using YOLO models.", formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument("--blob", help="Path to YOLO blob file for inference.", required=True, type=str)
parser.add_argument("--conf", help="Set the confidence threshold.", default=0.3, type=float)
parser.add_argument("--iou", help="Set the NMS IoU threshold.", default=0.5, type=float)
parser.add_argument("--version", help=f"Set the YOLO version to consider.", required=True, type=int, choices=available_yolo_versions)
parser.add_argument("--input-shape", help="Set the input shape of YOLO model.", nargs="+", default=[640,640], type=int)
parser.add_argument("--fps", help="Set the FPS.", default=30, type=int)
args = parser.parse_args()

nn_blob_path = args.blob
confidence_threshold = args.conf
iou_threshold = args.iou
yolo_version = args.version
yolo_input_shape = args.input_shape
fps = args.fps

# If a single value was given, duplicate it so the shape becomes (width, height)
yolo_input_shape *= 2 if len(yolo_input_shape) == 1 else 1

# Start defining a pipeline
pipeline = dai.Pipeline()
pipeline.setOpenVINOVersion(version = dai.OpenVINO.VERSION_2022_1)

# Define a neural network that will make predictions based on the source frames
nn_node = pipeline.create(dai.node.NeuralNetwork)
nn_node.setNumPoolFrames(4)
nn_node.input.setBlocking(False)
nn_node.setNumInferenceThreads(2)
nn_node.setBlobPath(nn_blob_path)

# Define camera source
rgb_cam_node = pipeline.create(dai.node.ColorCamera)
rgb_cam_node.setBoardSocket(dai.CameraBoardSocket.CAM_A)
rgb_cam_node.setFps(fps)
rgb_cam_node.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
rgb_cam_node.setPreviewSize(yolo_input_shape)
rgb_cam_node.setInterleaved(False)
rgb_cam_node.preview.link(nn_node.input)

# Create outputs
xout_rgb = pipeline.create(dai.node.XLinkOut)
xout_rgb.setStreamName("rgb")
xout_rgb.input.setBlocking(False)
nn_node.passthrough.link(xout_rgb.input)

xout_nn = pipeline.create(dai.node.XLinkOut)
xout_nn.setStreamName("nn")
xout_nn.input.setBlocking(False)
nn_node.out.link(xout_nn.input)


# Pipeline defined, now the device is assigned and pipeline is started
with dai.Device() as device:
device.startPipeline(pipeline)

# Output queues will be used to get the rgb frames and nn data from the outputs defined above
q_rgb = device.getOutputQueue(name="rgb", maxSize=8, blocking=False)
q_nn = device.getOutputQueue(name="nn", maxSize=8, blocking=False)

start_time = time.monotonic()
counter = 0
fps = 0
layer_info_printed = False
while True:

rgb_out = q_rgb.get()
nn_out = q_nn.get()

frame = rgb_out.getCvFrame()
frame_height, frame_width, c = frame.shape

if nn_out is not None:
layers = nn_out.getAllLayers()

if not layer_info_printed:
print("+++ Output layer info +++\n")
for layer_nr, layer in enumerate(layers):
print(f"Layer {layer_nr}")
print(f"Name: {layer.name}")
print(f"Order: {layer.order}")
print(f"dataType: {layer.dataType}")
#dims = layer.dims[::-1] # reverse dimensions
print(f"dims: {layer.dims}\n")
layer_info_printed = True
print("+++++++++++++++++++++++++\n")

# Get output0 data to parse detected bounding boxes and get mask weights
output0 = nn_out.getLayerFp16(layers[0].name)
output0 = np.array(output0)
dims0 = layers[0].dims # (1, 116, 8400) for YOLOv8,v9,11 | (1, 25200, 117) for YOLOv5
output0 = output0.reshape(dims0)
output0 = output0.squeeze(0) # (116, 8400) for YOLOv8,v9,11 | (25200, 117) for YOLOv5
if yolo_version == 5:
output0 = output0.transpose() # (117, 25200) for YOLOv5

# Get output1 data to parse segmentation mask prototypes
output1 = nn_out.getLayerFp16(layers[1].name)
output1 = np.array(output1)
dims1 = layers[1].dims # (1, 32, 160, 160)
output1 = output1.reshape(dims1)
output1 = output1.squeeze(0) # (32, 160, 160)
mask_protos = output1.reshape(output1.shape[0], output1.shape[1]*output1.shape[2]) # (32, 25600)

# Get main info from output0
if yolo_version == 5:
num_classes = output0.shape[0] - 5 - 32 # number of classes = Total number of 2nd dimension - 5 bbox. info. - 32 mask weights
bounding_boxes = output0[:4, :] # bounding boxes coordinates format: (xc, yc, w, h)
box_confidences = output0[4, :] # bounding box confidence format: (conf)
class_confidences = output0[5:(num_classes+5), :] # class confidences format: (class_0, class_1, ...)
mask_weights = output0[(num_classes+5):, :] # mask weights format: (mask_weight_0, mask_weight_1, ... , mask_weight_31)
else:
num_classes = output0.shape[0] - 4 - 32 # number of classes = Total number of 2nd dimension - 4 bbox. coord. - 32 mask weights
bounding_boxes = output0[:4, :] # bounding boxes coordinates format: (xc, yc, w, h)
class_confidences = output0[4:(num_classes+4), :] # class confidences format: (class_0, class_1, ...)
mask_weights = output0[(num_classes+4):, :] # mask weights format: (mask_weight_0, mask_weight_1, ... , mask_weight_31)

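# For every candidate box, keep only the best-scoring class and its id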
class_scores = np.max(class_confidences, axis=0)
class_ids = np.argmax(class_confidences, axis=0)

# Initial filtering based on class scores
if yolo_version == 5:
filtered_indices = box_confidences > 0.0
filtered_box_scores = box_confidences[filtered_indices]
else:
filtered_indices = class_scores > 0.0
filtered_boxes = bounding_boxes[:, filtered_indices]
filtered_class_scores = class_scores[filtered_indices]
filtered_class_ids = class_ids[filtered_indices]
filtered_mask_weights = mask_weights[:, filtered_indices]

# Format bounding box coordinates
x_center = filtered_boxes[0, :]
y_center = filtered_boxes[1, :]
box_width = filtered_boxes[2, :]
box_height = filtered_boxes[3, :]

x1 = x_center - box_width / 2.
y1 = y_center - box_height / 2.
x2 = x_center + box_width / 2.
y2 = y_center + box_height / 2.

# Apply NMS
bboxes = np.stack([x1, y1, x2, y2], axis=1)
if yolo_version == 5:
indices = cv2.dnn.NMSBoxes(bboxes.tolist(), filtered_box_scores.tolist(), confidence_threshold, iou_threshold)
else:
indices = cv2.dnn.NMSBoxes(bboxes.tolist(), filtered_class_scores.tolist(), confidence_threshold, iou_threshold)

final_boxes = [[int(v) for v in bboxes[i]] for i in indices]
final_scores = [filtered_class_scores[i] for i in indices]
final_class_ids = [filtered_class_ids[i] for i in indices]
filtered_mask_weights_t = filtered_mask_weights.transpose()
filtered_mask_weights = np.asarray([filtered_mask_weights_t[i] for i in indices]) # (N, 32)

if filtered_mask_weights.shape[0] != 0:
final_masks = filtered_mask_weights @ mask_protos # (N, 32) @ (32, 25600) -> per-detection mask logits

for i in range(len(final_boxes)):
# Get bounding box data
x1_i, y1_i, x2_i, y2_i = final_boxes[i]
score = final_scores[i]
class_id = final_class_ids[i]
class_name = coco_classes[class_id]

# Clamp coordinates
x1_i = np.clip(x1_i, 0, frame_width - 1)
y1_i = np.clip(y1_i, 0, frame_height - 1)
x2_i = np.clip(x2_i, 0, frame_width - 1)
y2_i = np.clip(y2_i, 0, frame_height - 1)

# Draw bounding box
cv2.rectangle(frame, (x1_i, y1_i), (x2_i, y2_i), color=class_colors[class_name], thickness=1)

# Get mask data
mask_ph, mask_pw = output1.shape[1:]
mask = final_masks[i].reshape(mask_ph, mask_pw)
mask = sigmoid(mask)
mask = (mask > pixel_conf_threshold).astype('uint8') * 255
mask_x1 = round(x1_i / frame_width * mask_pw)
mask_y1 = round(y1_i / frame_height * mask_ph)
mask_x2 = round(x2_i / frame_width * mask_pw)
mask_y2 = round(y2_i / frame_height * mask_ph)
mask = mask[mask_y1:mask_y2, mask_x1:mask_x2]
mask_img = mask.astype(np.uint8)

# Skip degenerate boxes whose mask crop would be empty (cv2.resize would fail)
if mask_img.size == 0 or (x2_i - x1_i) < 1 or (y2_i - y1_i) < 1:
    continue
mask_img = cv2.resize(mask_img, (x2_i - x1_i, y2_i - y1_i), interpolation=cv2.INTER_LINEAR)
mask = np.array(mask_img)

# Get polygon data
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
if not contours:
    continue
# Use the first (outer) contour, shifted back into frame coordinates
polygon = np.array([[x1_i + c[0][0], y1_i + c[0][1]] for c in contours[0]], np.int32)

# Fill polygon
overlay = np.zeros_like(frame, dtype=np.uint8)
#overlay[y1_i:y2_i, x1_i:x2_i][mask_img == 255] = (255, 255, 0)
cv2.fillPoly(overlay, [polygon], color=class_colors[class_name])
cv2.addWeighted(frame, 1.0, overlay, 0.5, 0, frame)

# Draw polygon
cv2.polylines(frame, [polygon], isClosed=True, color=(0, 0, 0), thickness=2)
#cv2.drawContours(frame, [polygon], -1, (0,0,0), 2)

# Draw detection label (class + confidence)
label = f"{class_name}: {score:.2f}"
cv2.putText(frame, label, (x1_i, y1_i - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1)

# Draw fps on frame
cv2.putText(frame, "NN fps: {:.1f}".format(fps), (2, frame_height - 4), cv2.FONT_HERSHEY_DUPLEX, 0.6, (0, 0, 255), 1)

# Show frame
cv2.imshow('res', frame)

# Compute fps
counter += 1
current_time = time.monotonic()
if (current_time - start_time) > 1:
fps = counter / (current_time - start_time)
counter = 0
start_time = current_time

if cv2.waitKey(1) == ord('q'):
cv2.destroyAllWindows()
break
6 changes: 6 additions & 0 deletions gen2-yolo/yolo-segmentation/requirements.txt
@@ -0,0 +1,6 @@
depthai==2.28.0.0
opencv-python>=4.10.0
numpy>=1.24.4
# argparse, time and random are part of the Python standard library and do not need to be installed