# On-host decoding of YOLO segmentation in OAK #544
Merged

7 commits:
* `153b0dc` On-host decoding of YOLO segmentation in OAK
* `a19789c` Added example GIF
* `eb495a3` Updated README & Using single main.py for all YOLOs
* `49be777` Update README
* `7f15d4d` Update README
* `49ae7ea` Update READMEs
* `ec03c36` Update README & Added new arguments for fps and input shape
**README.md** (new file, +84 lines)

# YOLO segmentation with decoding on host

This example shows how to perform instance segmentation on OAK devices using YOLO models ([YOLOv5](https://github.com/ultralytics/yolov5), [YOLOv8](https://docs.ultralytics.com/models/yolov8), [YOLOv9](https://github.com/WongKinYiu/yolov9) and [YOLO11](https://docs.ultralytics.com/models/yolo11)). Decoding of the models' output is done on the host side. The ONNX models were exported from weights pretrained on COCO and then converted to blob format to be compatible with OAK devices.

![]()

## Prerequisites

### 1. Install requirements

```bash
python3 -m pip install -r requirements.txt
```

### 2. Convert/export models to ONNX format

You can either train your own custom model or directly use a model trained on the COCO dataset. The latter is the case handled in this experiment.

#### **YOLOv5**

**1. Installation:**
```bash
git clone https://github.com/ultralytics/yolov5.git
cd yolov5/
pip install -r requirements.txt
```

**2. Export:**
```bash
python3 export.py --weights yolov5{n,s}-seg.pt --include onnx --imgsz 640 --opset 16 --simplify
```

#### **YOLOv8** and **YOLO11**

**1. Installation:**
```bash
pip install ultralytics
```

**2. Export:**
```bash
yolo export model=yolov8{n,s}-seg.pt format=onnx imgsz=640 half=True dynamic=False simplify=True batch=1
yolo export model=yolo11{n,s}-seg.pt format=onnx imgsz=640 half=True dynamic=False simplify=True batch=1
```
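
If you prefer scripting the export over the `yolo` CLI, the `ultralytics` Python API exposes the same options. A minimal sketch mirroring the first command above (the weight file is fetched automatically on first use):

```python
from ultralytics import YOLO

# Load the pretrained segmentation weights (downloaded on first use)
model = YOLO("yolov8n-seg.pt")

# Export to ONNX with the same settings as the CLI command above
model.export(format="onnx", imgsz=640, half=True, dynamic=False, simplify=True, batch=1)
```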

#### **YOLOv9**

**1. Installation:**
```bash
git clone https://github.com/WongKinYiu/yolov9.git
cd yolov9/
pip install -r requirements.txt
```

**2. Download weights:**
```bash
wget https://github.com/WongKinYiu/yolov9/releases/download/v0.1/gelan-c-seg.pt
```

**3. Export:**
```bash
python3 export.py --weights gelan-c-seg.pt --include onnx --imgsz 640 --batch-size 1 --simplify
```

### 3. Convert ONNX models to blob

The conversion from ONNX to blob is done with the Luxonis [Blob Converter Tool](http://blobconverter.luxonis.com). Note that the mean values of the ``Model optimizer parameters`` in the ``Advanced options`` must be changed from the default ``[127.5, 127.5, 127.5]`` to ``[0, 0, 0]``.
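
The same conversion can also be scripted with the `blobconverter` PyPI package, which calls the online converter. A minimal sketch, assuming an exported `yolov8n-seg.onnx` (the model path and `shaves` count are placeholders; the scale values mirror the web tool's defaults):

```python
import blobconverter

# Convert the exported ONNX model to a .blob for OAK devices.
# Mean [0, 0, 0] and scale [255, 255, 255] normalize the uint8 camera
# frames to [0, 1], matching the YOLO preprocessing.
blob_path = blobconverter.from_onnx(
    model="yolov8n-seg.onnx",  # placeholder path to your exported model
    data_type="FP16",
    shaves=6,
    optimizer_params=[
        "--mean_values=[0,0,0]",
        "--scale_values=[255,255,255]",
    ],
)
print(blob_path)
```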

## Usage

#### Inference with YOLOv5, YOLOv8, YOLOv9 or YOLO11

```bash
python3 main.py --blob <path_to_blob_file> --conf <confidence_threshold> --iou <iou_threshold> --version <yolo_version>
```

For example: `python3 main.py --blob yolov8n-seg.blob --conf 0.3 --iou 0.5 --version 8`

Options:
* `--blob`: Path to the YOLO blob file for inference. ``str``
* `--conf`: Confidence threshold. Default: 0.3. ``float``
* `--iou`: NMS IoU threshold. Default: 0.5. ``float``
* `--version`: YOLO version of the model (5, 8, 9 or 11). ``int``
* `--input-shape`: Input shape of the YOLO model. Default: [640, 640]. ``int [int ...]``
* `--fps`: Camera FPS. Default: 30. ``int``

*(Binary asset: the example GIF referenced above; not displayed in the diff.)*
**main.py** (new file, +263 lines)

```python
import sys
import cv2
import depthai as dai
import numpy as np
import argparse
import time
import random


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def generate_random_color():
    return [random.randint(0, 255) for _ in range(3)]


coco_classes = ['person', 'bicycle', 'car', 'motorcycle', 'airplane',
                'bus', 'train', 'truck', 'boat', 'traffic light',
                'fire hydrant', 'stop sign', 'parking meter', 'bench',
                'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
                'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
                'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
                'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
                'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
                'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
                'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
                'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
                'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
                'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
                'scissors', 'teddy bear', 'hair drier', 'toothbrush']

class_colors = {cls: generate_random_color() for cls in coco_classes}

pixel_conf_threshold = 0.3
available_yolo_versions = [5, 8, 9, 11]


if __name__ == '__main__':

    parser = argparse.ArgumentParser(description="Object segmentation on OAK devices using YOLO models.",
                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument("--blob", help="Path to YOLO blob file for inference.", required=True, type=str)
    parser.add_argument("--conf", help="Set the confidence threshold.", default=0.3, type=float)
    parser.add_argument("--iou", help="Set the NMS IoU threshold.", default=0.5, type=float)
    parser.add_argument("--version", help="Set the YOLO version to consider.", required=True, type=int, choices=available_yolo_versions)
    parser.add_argument("--input-shape", help="Set the input shape of YOLO model.", nargs="+", default=[640, 640], type=int)
    parser.add_argument("--fps", help="Set the FPS.", default=30, type=int)
    args = parser.parse_args()

    nn_blob_path = args.blob
    confidence_threshold = args.conf
    iou_threshold = args.iou
    yolo_version = args.version
    yolo_input_shape = args.input_shape
    fps = args.fps

    # Expand input shape if a single value was given, e.g. [640] -> [640, 640]
    yolo_input_shape *= 2 if len(yolo_input_shape) == 1 else 1

    # Start defining a pipeline
    pipeline = dai.Pipeline()
    pipeline.setOpenVINOVersion(version=dai.OpenVINO.VERSION_2022_1)

    # Define a neural network that will make predictions based on the source frames
    nn_node = pipeline.create(dai.node.NeuralNetwork)
    nn_node.setNumPoolFrames(4)
    nn_node.input.setBlocking(False)
    nn_node.setNumInferenceThreads(2)
    nn_node.setBlobPath(nn_blob_path)

    # Define camera source
    rgb_cam_node = pipeline.create(dai.node.ColorCamera)
    rgb_cam_node.setBoardSocket(dai.CameraBoardSocket.CAM_A)
    rgb_cam_node.setFps(fps)
    rgb_cam_node.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
    rgb_cam_node.setPreviewSize(yolo_input_shape)
    rgb_cam_node.setInterleaved(False)
    rgb_cam_node.preview.link(nn_node.input)
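
    # Two XLink streams go back to the host: the passthrough of the frames the
    # network actually ran on (kept in sync with its results) and the raw NN output.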
    # Create outputs
    xout_rgb = pipeline.create(dai.node.XLinkOut)
    xout_rgb.setStreamName("rgb")
    xout_rgb.input.setBlocking(False)
    nn_node.passthrough.link(xout_rgb.input)

    xout_nn = pipeline.create(dai.node.XLinkOut)
    xout_nn.setStreamName("nn")
    xout_nn.input.setBlocking(False)
    nn_node.out.link(xout_nn.input)

    # Pipeline defined, now the device is assigned and the pipeline is started
    with dai.Device() as device:
        device.startPipeline(pipeline)

        # Output queues will be used to get the rgb frames and nn data from the outputs defined above
        q_rgb = device.getOutputQueue(name="rgb", maxSize=8, blocking=False)
        q_nn = device.getOutputQueue(name="nn", maxSize=8, blocking=False)

        start_time = time.monotonic()
        counter = 0
        fps = 0  # measured NN fps (the camera is already configured, so the name is reused)
        layer_info_printed = False
        while True:

            rgb_out = q_rgb.get()
            nn_out = q_nn.get()

            frame = rgb_out.getCvFrame()
            frame_height, frame_width, c = frame.shape

            if nn_out is not None:
                layers = nn_out.getAllLayers()

                if not layer_info_printed:
                    print("+++ Output layer info +++\n")
                    for layer_nr, layer in enumerate(layers):
                        print(f"Layer {layer_nr}")
                        print(f"Name: {layer.name}")
                        print(f"Order: {layer.order}")
                        print(f"dataType: {layer.dataType}")
                        print(f"dims: {layer.dims}\n")
                    layer_info_printed = True
                    print("+++++++++++++++++++++++++\n")

                # Get output0 data to parse detected bounding boxes and get mask weights
                output0 = nn_out.getLayerFp16(layers[0].name)
                output0 = np.array(output0)
                dims0 = layers[0].dims  # (1, 116, 8400) for YOLOv8/v9/11 | (1, 25200, 117) for YOLOv5
                output0 = output0.reshape(dims0)
                output0 = output0.squeeze(0)  # (116, 8400) for YOLOv8/v9/11 | (25200, 117) for YOLOv5
                if yolo_version == 5:
                    output0 = output0.transpose()  # (117, 25200) for YOLOv5

                # Get output1 data to parse segmentation mask prototypes
                output1 = nn_out.getLayerFp16(layers[1].name)
                output1 = np.array(output1)
                dims1 = layers[1].dims  # (1, 32, 160, 160)
                output1 = output1.reshape(dims1)
                output1 = output1.squeeze(0)  # (32, 160, 160)
                mask_protos = output1.reshape(output1.shape[0], output1.shape[1] * output1.shape[2])  # (32, 25600)
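
                # Layout of one column of output0, i.e. one candidate detection
                # (80 COCO classes assumed, as for the pretrained models above):
                #   YOLOv5      : [xc, yc, w, h, obj_conf, class_0..class_79, m_0..m_31] -> 117 values
                #   YOLOv8/v9/11: [xc, yc, w, h, class_0..class_79, m_0..m_31]           -> 116 values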
                # Get main info from output0
                if yolo_version == 5:
                    num_classes = output0.shape[0] - 5 - 32  # rows minus 5 bbox values minus 32 mask weights
                    bounding_boxes = output0[:4, :]                      # bounding box coordinates: (xc, yc, w, h)
                    box_confidences = output0[4, :]                      # objectness confidence: (conf)
                    class_confidences = output0[5:(num_classes + 5), :]  # class confidences: (class_0, class_1, ...)
                    mask_weights = output0[(num_classes + 5):, :]        # mask weights: (mask_weight_0, ..., mask_weight_31)
                else:
                    num_classes = output0.shape[0] - 4 - 32  # rows minus 4 bbox coordinates minus 32 mask weights
                    bounding_boxes = output0[:4, :]                      # bounding box coordinates: (xc, yc, w, h)
                    class_confidences = output0[4:(num_classes + 4), :]  # class confidences: (class_0, class_1, ...)
                    mask_weights = output0[(num_classes + 4):, :]        # mask weights: (mask_weight_0, ..., mask_weight_31)

                class_scores = np.max(class_confidences, axis=0)
                class_ids = np.argmax(class_confidences, axis=0)

                # Initial filtering based on confidence scores
                if yolo_version == 5:
                    filtered_indices = box_confidences > 0.0
                    filtered_box_scores = box_confidences[filtered_indices]
                else:
                    filtered_indices = class_scores > 0.0
                filtered_boxes = bounding_boxes[:, filtered_indices]
                filtered_class_scores = class_scores[filtered_indices]
                filtered_class_ids = class_ids[filtered_indices]
                filtered_mask_weights = mask_weights[:, filtered_indices]

                # Convert (xc, yc, w, h) to corner coordinates
                x_center = filtered_boxes[0, :]
                y_center = filtered_boxes[1, :]
                box_width = filtered_boxes[2, :]
                box_height = filtered_boxes[3, :]

                x1 = x_center - box_width / 2.
                y1 = y_center - box_height / 2.
                x2 = x_center + box_width / 2.
                y2 = y_center + box_height / 2.

                # Apply NMS (cv2.dnn.NMSBoxes expects boxes in (x, y, w, h) format)
                bboxes = np.stack([x1, y1, x2, y2], axis=1)
                nms_boxes = np.stack([x1, y1, box_width, box_height], axis=1)
                if yolo_version == 5:
                    indices = cv2.dnn.NMSBoxes(nms_boxes.tolist(), filtered_box_scores.tolist(), confidence_threshold, iou_threshold)
                else:
                    indices = cv2.dnn.NMSBoxes(nms_boxes.tolist(), filtered_class_scores.tolist(), confidence_threshold, iou_threshold)

                final_boxes = [[int(v) for v in bboxes[i]] for i in indices]
                final_scores = [filtered_class_scores[i] for i in indices]
                final_class_ids = [filtered_class_ids[i] for i in indices]
                filtered_mask_weights_t = filtered_mask_weights.transpose()
                filtered_mask_weights = np.asarray([filtered_mask_weights_t[i] for i in indices])  # (N, 32)

                if filtered_mask_weights.shape[0] != 0:
                    final_masks = filtered_mask_weights @ mask_protos  # (N, 32) @ (32, 25600) -> (N, 25600)

                for i in range(len(final_boxes)):
                    # Get bounding box data
                    x1_i, y1_i, x2_i, y2_i = final_boxes[i]
                    score = final_scores[i]
                    class_id = final_class_ids[i]
                    class_name = coco_classes[class_id]

                    # Clamp coordinates to the frame
                    x1_i = np.clip(x1_i, 0, frame_width - 1)
                    y1_i = np.clip(y1_i, 0, frame_height - 1)
                    x2_i = np.clip(x2_i, 0, frame_width - 1)
                    y2_i = np.clip(y2_i, 0, frame_height - 1)

                    # Draw bounding box
                    cv2.rectangle(frame, (x1_i, y1_i), (x2_i, y2_i), color=class_colors[class_name], thickness=1)
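
                    # The prototype grid is 160x160 while the frame is e.g. 640x640,
                    # so box corners are rescaled into prototype coordinates before
                    # cropping, and the crop is then resized back to the box size.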
                    # Get mask data
                    mask_ph, mask_pw = output1.shape[1:]
                    mask = final_masks[i].reshape(mask_ph, mask_pw)
                    mask = sigmoid(mask)
                    mask = (mask > pixel_conf_threshold).astype('uint8') * 255
                    mask_x1 = round(x1_i / frame_width * mask_pw)
                    mask_y1 = round(y1_i / frame_height * mask_ph)
                    mask_x2 = round(x2_i / frame_width * mask_pw)
                    mask_y2 = round(y2_i / frame_height * mask_ph)
                    mask = mask[mask_y1:mask_y2, mask_x1:mask_x2]
                    mask_img = mask.astype(np.uint8)

                    if mask_img.size:  # skip resizing degenerate (empty) crops
                        mask_img = cv2.resize(mask_img, (x2_i - x1_i, y2_i - y1_i), interpolation=cv2.INTER_LINEAR)
                        mask = np.array(mask_img)

                    # Get polygon data
                    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
                    if not contours:  # nothing above the pixel threshold inside this box
                        continue
                    polygon = np.array([[x1_i + c[0][0], y1_i + c[0][1]] for c in contours[0]], np.int32)

                    # Fill polygon
                    overlay = np.zeros_like(frame, dtype=np.uint8)
                    cv2.fillPoly(overlay, [polygon], color=class_colors[class_name])
                    cv2.addWeighted(frame, 1.0, overlay, 0.5, 0, frame)

                    # Draw polygon outline
                    cv2.polylines(frame, [polygon], isClosed=True, color=(0, 0, 0), thickness=2)

                    # Draw detection label (class + confidence)
                    label = f"{class_name}: {score:.2f}"
                    cv2.putText(frame, label, (x1_i, y1_i - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1)

            # Draw fps on frame
            cv2.putText(frame, "NN fps: {:.1f}".format(fps), (2, frame_height - 4), cv2.FONT_HERSHEY_DUPLEX, 0.6, (0, 0, 255), 1)

            # Show frame
            cv2.imshow('res', frame)

            # Compute fps
            counter += 1
            current_time = time.monotonic()
            if (current_time - start_time) > 1:
                fps = counter / (current_time - start_time)
                counter = 0
                start_time = current_time

            if cv2.waitKey(1) == ord('q'):
                cv2.destroyAllWindows()
                break
```
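
The heart of this decoding is a single matrix product between per-detection mask weights and the shared prototype masks. The standalone sketch below replays that step on random data in place of real network outputs (all names and shapes here are illustrative, matching the comments in the script, not part of the script itself):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Shapes as in main.py: N detections kept after NMS, 32 prototypes of 160x160
N, num_protos, proto_h, proto_w = 3, 32, 160, 160
mask_weights = np.random.randn(N, num_protos)                 # (N, 32)
mask_protos = np.random.randn(num_protos, proto_h * proto_w)  # (32, 25600)

# Each mask is a weighted sum of the 32 prototypes, squashed by a sigmoid
# and thresholded per pixel (0.3 is the script's pixel_conf_threshold)
masks = sigmoid(mask_weights @ mask_protos)                   # (N, 25600)
binary_masks = (masks > 0.3).reshape(N, proto_h, proto_w)     # (N, 160, 160) boolean

print(binary_masks.shape, binary_masks.sum(axis=(1, 2)))      # rough mask areas
```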

**requirements.txt** (new file)

```
depthai==2.28.0.0
opencv-python>=4.10.0
numpy>=1.24.4
# argparse, time and random are part of the Python standard library and need no pip entry
```