diff --git a/README.md b/README.md index 34634aeaf..aa8284e2b 100644 --- a/README.md +++ b/README.md @@ -157,6 +157,12 @@ If you would like to try on your computer: |------------:|:------------:|:------------:|:------------:|:------------:| | [rvc](/audio_processing/rvc/) | [Retrieval-based-Voice-Conversion-WebUI](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI) | Pytorch | 1.2.12 and later | [JP](https://medium.com/axinc/rvc-ai%E3%82%92%E4%BD%BF%E7%94%A8%E3%81%97%E3%81%9F%E3%83%9C%E3%82%A4%E3%82%B9%E3%83%81%E3%82%A7%E3%83%B3%E3%82%B8%E3%83%A3%E3%83%BC-64a813c7a0c4) | +## Autonomous Driving + +| Model | Reference | Exported From | Supported Ailia Version | Blog | +|------------:|:------------:|:------------:|:------------:|:------------:| +|[uniad](/autonomous_driving/uniad/) | [UniAD: Unified Driving](https://github.com/OpenDriveLab/UniAD) | Pytorch | 1.6.1 and later | | + ## Background removal | | Model | Reference | Exported From | Supported Ailia Version | Blog | diff --git a/autonomous_driving/uniad/LICENSE b/autonomous_driving/uniad/LICENSE new file mode 100644 index 000000000..261eeb9e9 --- /dev/null +++ b/autonomous_driving/uniad/LICENSE @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. 
For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. 
For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. 
If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. 
You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. 
Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. 
+ + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/autonomous_driving/uniad/README.md b/autonomous_driving/uniad/README.md new file mode 100644 index 000000000..3ccb4d22c --- /dev/null +++ b/autonomous_driving/uniad/README.md @@ -0,0 +1,161 @@ +# UniAD + +Unified autonomous driving inference and visualization pipeline on NuScenes. Uses ONNX Runtime or ailia to run BEV encoder, tracking, motion, occupancy, and planning heads; streams per-frame results to HDF5, then visualizes to images and video. + +## Requirements + +### Dependencies + +Install the required Python packages: + +```bash +pip3 install h5py +pip3 install casadi==3.6.7 +pip3 install nuscenes-devkit==1.1.11 +pip3 install pyquaternion==0.9.9 +``` + +### Dataset Setup + +Download the nuScenes V1.0 full dataset, CAN bus extension from [HERE](https://www.nuscenes.org/download), then place the dataset files under `data/nuscenes/`. 
+
+Download nuScenes, CAN_bus and Map extensions:
+
+```shell
+cd uniad
+mkdir data
+cd data
+mkdir nuscenes && cd nuscenes
+# Download nuScenes V1.0 full dataset data directly to (or soft link to) uniad/data/nuscenes/
+# Download CAN_bus directly to (or soft link to) uniad/data/nuscenes/
+# Download the Map expansion directly to (or soft link to) uniad/data/nuscenes/maps/
+```
+
+Prepare UniAD data info (Optional):
+
+```shell
+cd uniad/data
+mkdir infos && cd infos
+wget https://github.com/OpenDriveLab/UniAD/releases/download/v1.0/nuscenes_infos_temporal_train.pkl  # train_infos
+wget https://github.com/OpenDriveLab/UniAD/releases/download/v1.0/nuscenes_infos_temporal_val.pkl  # val_infos
+```
+
+Directory structure:
+
+```
+uniad
+├── data/
+│   ├── nuscenes/
+│   │   ├── can_bus/
+│   │   ├── maps/
+│   │   ├── samples/
+│   │   ├── sweeps/
+│   │   ├── v1.0-test/
+│   │   ├── v1.0-trainval/
+│   ├── infos/
+│   │   ├── nuscenes_infos_temporal_train.pkl
+│   │   ├── nuscenes_infos_temporal_val.pkl
+```
+
+## Usage
+
+The onnx and prototxt files are downloaded automatically on the first run; an internet connection is required for the initial download.
+
+To run with the default settings,
+```bash
+python3 uniad.py
+```
+
+This will process scenes `scene-0102` and `scene-0103` from `v1.0-trainval`, then visualize `scene-0103`.
+
+### Workflow
+
+The typical workflow consists of two steps:
+
+1. Run model inference on the specified scenes and save the results to an HDF5 file
+2. 
Load HDF5 results and generate visualization images/videos + +### Step 1: Run Inference + +Run model inference and save results to HDF5: + +```bash +python3 uniad.py --results_file all_frames.h5 \ + --data_root data/nuscenes/ \ + --version v1.0-trainval \ + --scenes scene-0102 --scenes scene-0103 +``` + +Parameters: + +- `--data_root`: Path to NuScenes dataset root (default: `data/nuscenes/`) +- `--version`: Dataset version: `v1.0-trainval`, `v1.0-test`, or `v1.0-mini` (default: `v1.0-trainval`) +- `--scenes`: Scene name(s) to process, repeatable (default: `scene-0102`, `scene-0103`) +- `--results_file`: Output HDF5 file path (default: `all_frames.h5`) + +Process multiple scenes: + +```bash +python3 uniad.py --results_file my_results.h5 \ + --scenes scene-0655 --scenes scene-0796 --scenes scene-0916 +``` + +### Step 2: Visualize Results + +Load HDF5 file and generate visualizations: + +```bash +python3 uniad.py --visualize all_frames.h5 \ + --data_root data/nuscenes/ \ + --version v1.0-trainval \ + --vis_scenes scene-0103 +``` + +This skips model inference and loads results directly from HDF5. Output includes: +- Visualization images (JPG) in `vis_output/` directory +- Video file as `output.avi` + +Parameters: + +- `--visualize`: Path to HDF5 file (required) +- `--vis_scenes`: Scene(s) to visualize, repeatable (default: all scenes in HDF5) +- `--data_root`: Path to NuScenes dataset root (default: `data/nuscenes/`) +- `--version`: Dataset version (default: `v1.0-trainval`) + +Note: Frame rate is set to 4 fps and downsample factor is 2 (hardcoded). 
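The saved results can also be inspected directly with `h5py`. A minimal sketch: it writes an illustrative stand-in file (`example_frames.h5`, with placeholder attribute values and dataset shapes, not output from a real run) using the frame-group layout described under "HDF5 Output Format", then reads it back:

```python
import h5py
import numpy as np

# Write a tiny stand-in file with the frame_XXXXXX group layout
# (token/scene_token/command attributes plus per-frame datasets).
# All values and shapes here are illustrative placeholders.
with h5py.File("example_frames.h5", "w") as f:
    g = f.create_group("frame_000000")
    g.attrs["token"] = "sample-token"
    g.attrs["scene_token"] = "scene-token"
    g.attrs["command"] = 2
    g.create_dataset("boxes_3d", data=np.zeros((5, 9), dtype=np.float32))
    g.create_dataset("scores_3d", data=np.ones((5,), dtype=np.float32))

# Walk the frames in order and print a short summary per frame.
with h5py.File("example_frames.h5", "r") as f:
    for name in sorted(f.keys()):
        frame = f[name]
        print(name, frame.attrs["scene_token"], frame["boxes_3d"].shape)
```

A file produced by `uniad.py` can be walked the same way by substituting `all_frames.h5`.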
+ +Visualize specific scene: + +```bash +python3 uniad.py --visualize all_frames.h5 --vis_scenes scene-0102 +``` + +Visualize all scenes: + +```bash +python3 uniad.py --visualize all_frames.h5 +``` + +### Available Scenes + +v1.0-trainval (850 scenes): +- Training: `scene-0001` to `scene-0700` (700 scenes) +- Validation: `scene-0701` to `scene-0850` (150 scenes) +- Recommended for testing: `scene-0102`, `scene-0103`, `scene-0655`, `scene-0796`, `scene-0916` + +v1.0-test (150 scenes): +- `scene-0001` to `scene-0150` + +v1.0-mini (10 scenes, for quick testing): +- `scene-0061`, `scene-0103`, `scene-0553`, `scene-0655`, `scene-0757`, `scene-0796`, `scene-0916`, `scene-1077`, `scene-1094`, `scene-1100` + +## HDF5 Output Format + +The HDF5 file contains per-frame results organized as sequential groups: + +- Groups: `frame_000000`, `frame_000001`, ... +- Attributes: `token`, `scene_token`, `command` +- Datasets: `boxes_3d`, `scores_3d`, `labels_3d`, `track_ids`, `traj`, `traj_scores`, `sdc_boxes_3d`, `sdc_scores_3d`, `planning_traj` + +## Reference + +- [UniAD: Unified Driving](https://github.com/OpenDriveLab/UniAD) diff --git a/autonomous_driving/uniad/collision_optimization.py b/autonomous_driving/uniad/collision_optimization.py new file mode 100644 index 000000000..6a7926405 --- /dev/null +++ b/autonomous_driving/uniad/collision_optimization.py @@ -0,0 +1,118 @@ +# ---------------------------------------------------------------------------------# +# UniAD: Planning-oriented Autonomous Driving (https://arxiv.org/abs/2212.10156) # +# Source code: https://github.com/OpenDriveLab/UniAD # +# Copyright (c) OpenDriveLab. All rights reserved. 
#
+# ---------------------------------------------------------------------------------#
+
+from typing import Sequence, Tuple
+
+import numpy as np
+import numpy.typing as npt
+from casadi import DM, Opti, OptiSol, exp, sumsqr, vertcat
+
+Pose = Tuple[float, float, float]  # (x, y, yaw)
+
+
+class CollisionNonlinearOptimizer:
+    """
+    Optimize the planned trajectory against the predicted occupancy.
+    Solved with direct multiple-shooting.
+    Modified from https://github.com/motional/nuplan-devkit
+    :param trajectory_len: trajectory length
+    :param dt: timestep (sec)
+    """
+
+    def __init__(
+        self, trajectory_len: int, dt: float, sigma, alpha_collision, obj_pixel_pos
+    ):
+        """
+        :param trajectory_len: the length of the trajectory to be optimized.
+        :param dt: the time interval between trajectory points.
+        :param sigma: width of the Gaussian collision penalty.
+        :param alpha_collision: weight of the collision penalty.
+        :param obj_pixel_pos: per-timestep list of occupied (x, y) positions.
+        """
+        self.dt = dt
+        self.trajectory_len = trajectory_len
+        self.current_index = 0
+        self.sigma = sigma
+        self.alpha_collision = alpha_collision
+        self.obj_pixel_pos = obj_pixel_pos
+        # Use an array of dts to stay compatible with situations where dt varies across time steps.
+        self._dts: npt.NDArray[np.float32] = np.asarray([[dt] * trajectory_len])
+        self._init_optimization()
+
+    def _init_optimization(self) -> None:
+        """
+        Initialize related variables and constraints for optimization.
+        """
+        self.nx = 2  # state dim
+
+        self._optimizer = Opti()  # Optimization problem
+        self._create_decision_variables()
+        self._create_parameters()
+        self._set_objective()
+
+        # Set default solver options (quiet)
+        self._optimizer.solver(
+            "ipopt", {"ipopt.print_level": 0, "print_time": 0, "ipopt.sb": "yes"}
+        )
+
+    def set_reference_trajectory(self, reference_trajectory: Sequence[Pose]) -> None:
+        """
+        Set the reference trajectory that the smoother is trying to loosely track.
+        :param reference_trajectory: N x 2 reference, where the second dim is (x, y).
+        """
+        self._optimizer.set_value(self.ref_traj, DM(reference_trajectory).T)
+        self._set_initial_guess(reference_trajectory)
+
+    def solve(self) -> OptiSol:
+        """
+        Solve the optimization problem. Assumes the reference trajectory was already set.
+        :return: Casadi optimization solution
+        """
+        return self._optimizer.solve()
+
+    def _create_decision_variables(self) -> None:
+        """
+        Define the decision variables for the trajectory optimization.
+        """
+        # State trajectory (x, y)
+        self.state = self._optimizer.variable(self.nx, self.trajectory_len)
+        self.position_x = self.state[0, :]
+        self.position_y = self.state[1, :]
+
+    def _create_parameters(self) -> None:
+        """
+        Define the reference trajectory parameter for the trajectory optimization.
+        """
+        self.ref_traj = self._optimizer.parameter(2, self.trajectory_len)  # (x, y)
+
+    def _set_objective(self) -> None:
+        """Set the objective function.
Use care when modifying these weights.""" + # Follow reference, minimize control rates and absolute inputs + alpha_xy = 1.0 + cost_stage = alpha_xy * sumsqr( + self.ref_traj[:2, :] - vertcat(self.position_x, self.position_y) + ) + + alpha_collision = self.alpha_collision + + cost_collision = 0 + normalizer = 1 / (2.507 * self.sigma) + # TODO: vectorize this + for t in range(len(self.obj_pixel_pos)): + x, y = self.position_x[t], self.position_y[t] + for i in range(len(self.obj_pixel_pos[t])): + col_x, col_y = self.obj_pixel_pos[t][i] + cost_collision += ( + alpha_collision + * normalizer + * exp(-((x - col_x) ** 2 + (y - col_y) ** 2) / 2 / self.sigma**2) + ) + self._optimizer.minimize(cost_stage + cost_collision) + + def _set_initial_guess(self, reference_trajectory: Sequence[Pose]) -> None: + """Set a warm-start for the solver based on the reference trajectory.""" + # Initialize state guess based on reference + self._optimizer.set_initial( + self.state[:2, :], DM(reference_trajectory).T + ) # (x, y, yaw) diff --git a/autonomous_driving/uniad/lidar_box3d.py b/autonomous_driving/uniad/lidar_box3d.py new file mode 100644 index 000000000..4c3a37e0c --- /dev/null +++ b/autonomous_driving/uniad/lidar_box3d.py @@ -0,0 +1,153 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from abc import abstractmethod + +import numpy as np + + +class BaseInstance3DBoxes(object): + """Base class for 3D Boxes. + + Note: + The box is bottom centered, i.e. the relative position of origin in + the box is (0.5, 0.5, 0). + + Args: + tensor (torch.Tensor | np.ndarray | list): a N x box_dim matrix. + box_dim (int): Number of the dimension of a box. + Each row is (x, y, z, x_size, y_size, z_size, yaw). + Defaults to 7. + with_yaw (bool): Whether the box is with yaw rotation. + If False, the value of yaw will be set to 0 as minmax boxes. + Defaults to True. + origin (tuple[float], optional): Relative position of the box origin. + Defaults to (0.5, 0.5, 0). 
Boxes will be converted to the
+            (0.5, 0.5, 0) mode.
+
+    Attributes:
+        tensor (np.ndarray): Float matrix of N x box_dim.
+        box_dim (int): Integer indicating the dimension of a box.
+            Each row is (x, y, z, x_size, y_size, z_size, yaw, ...).
+        with_yaw (bool): Whether the boxes have a yaw angle. If False, yaw is
+            fixed to 0 (minmax boxes).
+    """
+
+    def __init__(self, tensor, box_dim=7, with_yaw=True, origin=(0.5, 0.5, 0)):
+        tensor = np.asarray(tensor, dtype=np.float32)
+        if tensor.size == 0:
+            tensor = tensor.reshape((0, box_dim)).astype(np.float32)
+        # assert tensor.ndim == 2 and tensor.shape[-1] == box_dim, tensor.shape
+
+        if tensor.shape[-1] == 6:
+            # If the dimension of boxes is 6, we expand box_dim by padding
+            # 0 as a fake yaw and set with_yaw to False.
+            assert box_dim == 6
+            fake_rot = np.zeros((tensor.shape[0], 1), dtype=tensor.dtype)
+            tensor = np.concatenate((tensor, fake_rot), axis=-1)
+            self.box_dim = box_dim + 1
+            self.with_yaw = False
+        else:
+            self.box_dim = box_dim
+            self.with_yaw = with_yaw
+        self.tensor = tensor.copy()
+
+        if origin != (0.5, 0.5, 0):
+            dst = np.array((0.5, 0.5, 0))
+            src = np.array(origin)
+            self.tensor[:, :3] += self.tensor[:, 3:6] * (dst - src)
+
+    @property
+    def volume(self):
+        """np.ndarray: A vector with the volume of each box."""
+        return self.tensor[:, 3] * self.tensor[:, 4] * self.tensor[:, 5]
+
+    @property
+    def dims(self):
+        """np.ndarray: Size dimensions of each box in shape (N, 3)."""
+        return self.tensor[:, 3:6]
+
+    @property
+    def yaw(self):
+        """np.ndarray: A vector with the yaw of each box in shape (N, )."""
+        return self.tensor[:, 6]
+
+    @property
+    def bottom_center(self):
+        """np.ndarray: A tensor with the bottom center of each box in shape (N, 3)."""
+        return self.tensor[:, :3]
+
+    @abstractmethod
+    def rotate(self, angle, points=None):
+        """Rotate boxes, and optionally points, with the given angle or
+        rotation matrix.
+
+        Args:
+            angle (float | np.ndarray): Rotation angle or rotation matrix.
+            points (np.ndarray, optional): Points to rotate. Defaults to None.
+        """
+        pass
+
+    def translate(self, trans_vector):
+        """Translate boxes with the given translation vector.
+
+        Args:
+            trans_vector (np.ndarray): Translation vector of size (1, 3).
+        """
+        if not isinstance(trans_vector, np.ndarray):
+            trans_vector = np.array(trans_vector)
+        self.tensor[:, :3] += trans_vector
+
+    def clone(self):
+        """Clone the Boxes.
+
+        Returns:
+            :obj:`BaseInstance3DBoxes`: Box object with the same properties
+                as self.
+        """
+        original_type = type(self)
+        return original_type(
+            self.tensor.copy(), box_dim=self.box_dim, with_yaw=self.with_yaw
+        )
+
+
+class LiDARInstance3DBoxes(BaseInstance3DBoxes):
+    """3D boxes of instances in LIDAR coordinates.
+
+    Coordinates in LiDAR:
+
+    .. code-block:: none
+
+                        up z    x front (yaw=0)
+                           ^   ^
+                           |  /
+                           | /
+        (yaw=0.5*pi) left y <------ 0
+
+    The relative coordinate of the bottom center in a LiDAR box is
+    (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation
+    axis is 2. The yaw is 0 at the positive direction of the x axis, and
+    increases from the positive direction of x to the positive direction of y.
+
+    A refactor is ongoing to make the three coordinate systems
+    easier to understand and convert between each other.
+
+    Attributes:
+        tensor (np.ndarray): Float matrix of N x box_dim.
+        box_dim (int): Integer indicating the dimension of a box.
+            Each row is (x, y, z, x_size, y_size, z_size, yaw, ...).
+        with_yaw (bool): Whether the boxes have a yaw angle. If False, yaw is
+            fixed to 0 (minmax boxes).
+
+ """ + + YAW_AXIS = 2 + + @property + def gravity_center(self): + """np.array: A tensor with center of each box in shape (N, 3).""" + bottom_center = self.bottom_center + gravity_center = np.zeros_like(bottom_center) + gravity_center[:, :2] = bottom_center[:, :2] + gravity_center[:, 2] = bottom_center[:, 2] + self.tensor[:, 5] * 0.5 + return gravity_center diff --git a/autonomous_driving/uniad/nuscenes_dataset.py b/autonomous_driving/uniad/nuscenes_dataset.py new file mode 100644 index 000000000..3cf580dfb --- /dev/null +++ b/autonomous_driving/uniad/nuscenes_dataset.py @@ -0,0 +1,719 @@ +import os +import pickle + +import cv2 +import numpy as np +import tqdm +from nuscenes import NuScenes +from nuscenes.can_bus.can_bus_api import NuScenesCanBus +from nuscenes.eval.common.utils import Quaternion, quaternion_yaw +from nuscenes.prediction import PredictHelper +from nuscenes.utils import splits +from torch.utils.data import Dataset + +# isort: off +from lidar_box3d import LiDARInstance3DBoxes + + +def obtain_sensor2top( + nusc, sensor_token, l2e_t, l2e_r_mat, e2g_t, e2g_r_mat, sensor_type="lidar" +): + """Obtain the info with RT matric from general sensor to Top LiDAR. + + Args: + nusc (class): Dataset class in the nuScenes dataset. + sensor_token (str): Sample data token corresponding to the + specific sensor type. + l2e_t (np.ndarray): Translation from lidar to ego in shape (1, 3). + l2e_r_mat (np.ndarray): Rotation matrix from lidar to ego + in shape (3, 3). + e2g_t (np.ndarray): Translation from ego to global in shape (1, 3). + e2g_r_mat (np.ndarray): Rotation matrix from ego to global + in shape (3, 3). + sensor_type (str): Sensor to calibrate. Default: 'lidar'. + + Returns: + sweep (dict): Sweep information after transformation. 
+ """ + sd_rec = nusc.get("sample_data", sensor_token) + cs_record = nusc.get("calibrated_sensor", sd_rec["calibrated_sensor_token"]) + pose_record = nusc.get("ego_pose", sd_rec["ego_pose_token"]) + data_path = str(nusc.get_sample_data_path(sd_rec["token"])) + if nusc.dataroot in data_path: + data_path = data_path.split(f"{nusc.dataroot}")[-1] # relative path + sweep = { + "data_path": data_path, + "type": sensor_type, + "sample_data_token": sd_rec["token"], + "sensor2ego_translation": cs_record["translation"], + "sensor2ego_rotation": cs_record["rotation"], + "ego2global_translation": pose_record["translation"], + "ego2global_rotation": pose_record["rotation"], + "timestamp": sd_rec["timestamp"], + } + + l2e_r_s = sweep["sensor2ego_rotation"] + l2e_t_s = sweep["sensor2ego_translation"] + e2g_r_s = sweep["ego2global_rotation"] + e2g_t_s = sweep["ego2global_translation"] + + # obtain the RT from sensor to Top LiDAR + # sweep->ego->global->ego'->lidar + l2e_r_s_mat = Quaternion(l2e_r_s).rotation_matrix + e2g_r_s_mat = Quaternion(e2g_r_s).rotation_matrix + R = (l2e_r_s_mat.T @ e2g_r_s_mat.T) @ ( + np.linalg.inv(e2g_r_mat).T @ np.linalg.inv(l2e_r_mat).T + ) + T = (l2e_t_s @ e2g_r_s_mat.T + e2g_t_s) @ ( + np.linalg.inv(e2g_r_mat).T @ np.linalg.inv(l2e_r_mat).T + ) + T -= ( + e2g_t @ (np.linalg.inv(e2g_r_mat).T @ np.linalg.inv(l2e_r_mat).T) + + l2e_t @ np.linalg.inv(l2e_r_mat).T + ) + sweep["sensor2lidar_rotation"] = R.T # points @ R.T + T + sweep["sensor2lidar_translation"] = T + return sweep + + +def get_can_bus_info(nusc, nusc_can_bus, sample): + scene_name = nusc.get("scene", sample["scene_token"])["name"] + sample_timestamp = sample["timestamp"] + try: + pose_list = nusc_can_bus.get_messages(scene_name, "pose") + except: + return np.zeros(18) # server scenes do not have can bus information. 
+ can_bus = [] + # during each scene, the first timestamp of can_bus may be large than the first sample's timestamp + last_pose = pose_list[0] + for i, pose in enumerate(pose_list): + if pose["utime"] > sample_timestamp: + break + last_pose = pose + _ = last_pose.pop("utime") # useless + pos = last_pose.pop("pos") + rotation = last_pose.pop("orientation") + can_bus.extend(pos) + can_bus.extend(rotation) + for key in last_pose.keys(): + can_bus.extend(pose[key]) # 16 elements + can_bus.extend([0.0, 0.0]) + return np.array(can_bus) + + +def get_future_traj_info(nusc, sample, predict_steps=16): + sample_token = sample["token"] + ann_tokens = np.array(sample["anns"]) + sd_rec = nusc.get("sample", sample_token) + fut_traj_all = [] + fut_traj_valid_mask_all = [] + _, boxes, _ = nusc.get_sample_data( + sd_rec["data"]["LIDAR_TOP"], selected_anntokens=ann_tokens + ) + predict_helper = PredictHelper(nusc) + for i, ann_token in enumerate(ann_tokens): + box = boxes[i] + instance_token = nusc.get("sample_annotation", ann_token)["instance_token"] + fut_traj_local = predict_helper.get_future_for_agent( + instance_token, + sample_token, + seconds=predict_steps // 2, + in_agent_frame=True, + ) + + fut_traj = np.zeros((predict_steps, 2)) + fut_traj_valid_mask = np.zeros((predict_steps, 2)) + if fut_traj_local.shape[0] > 0: + fut_traj_scence_centric = fut_traj_local + fut_traj[: fut_traj_scence_centric.shape[0], :] = fut_traj_scence_centric + fut_traj_valid_mask[: fut_traj_scence_centric.shape[0], :] = 1 + fut_traj_all.append(fut_traj) + fut_traj_valid_mask_all.append(fut_traj_valid_mask) + if len(ann_tokens) > 0: + fut_traj_all = np.stack(fut_traj_all, axis=0) + fut_traj_valid_mask_all = np.stack(fut_traj_valid_mask_all, axis=0) + else: + fut_traj_all = np.zeros((0, predict_steps, 2)) + fut_traj_valid_mask_all = np.zeros((0, predict_steps, 2)) + return fut_traj_all, fut_traj_valid_mask_all + + +class NuScenesTraj: + def __init__( + self, + nusc, + ): + self.nusc = nusc + 
self.prepare_sdc_vel_info() + self.planning_steps = 6 + self.with_velocity = True + + def get_vel_transform_mats(self, sample): + sd_rec = self.nusc.get("sample_data", sample["data"]["LIDAR_TOP"]) + cs_record = self.nusc.get( + "calibrated_sensor", sd_rec["calibrated_sensor_token"] + ) + pose_record = self.nusc.get("ego_pose", sd_rec["ego_pose_token"]) + + l2e_r = cs_record["rotation"] + l2e_t = cs_record["translation"] + e2g_r = pose_record["rotation"] + e2g_t = pose_record["translation"] + l2e_r_mat = Quaternion(l2e_r).rotation_matrix + e2g_r_mat = Quaternion(e2g_r).rotation_matrix + + return l2e_r_mat, e2g_r_mat + + def get_vel_and_time(self, sample): + lidar_token = sample["data"]["LIDAR_TOP"] + lidar_top = self.nusc.get("sample_data", lidar_token) + pose = self.nusc.get("ego_pose", lidar_top["ego_pose_token"]) + xyz = pose["translation"] + timestamp = sample["timestamp"] + return xyz, timestamp + + def prepare_sdc_vel_info(self): + # generate sdc velocity info for all samples + # Note that these velocity values are converted from + # global frame to lidar frame + # as aligned with bbox gts + + self.sdc_vel_info = {} + for scene in self.nusc.scene: + scene_token = scene["token"] + + # we cannot infer vel for the last sample, therefore we skip it + last_sample_token = scene["last_sample_token"] + sample_token = scene["first_sample_token"] + sample = self.nusc.get("sample", sample_token) + xyz, time = self.get_vel_and_time(sample) + while sample["token"] != last_sample_token: + next_sample_token = sample["next"] + next_sample = self.nusc.get("sample", next_sample_token) + next_xyz, next_time = self.get_vel_and_time(next_sample) + dc = np.array(next_xyz) - np.array(xyz) + dt = (next_time - time) / 1e6 + vel = dc / dt + + # global frame to lidar frame + l2e_r_mat, e2g_r_mat = self.get_vel_transform_mats(sample) + vel = vel @ np.linalg.inv(e2g_r_mat).T @ np.linalg.inv(l2e_r_mat).T + vel = vel[:2] + + self.sdc_vel_info[sample["token"]] = vel + xyz, time = next_xyz, 
next_time
+                sample = next_sample
+
+            # set last sample's vel equal to the second-last sample's
+            last_sample = self.nusc.get("sample", last_sample_token)
+            second_last_sample_token = last_sample["prev"]
+            self.sdc_vel_info[last_sample_token] = self.sdc_vel_info[
+                second_last_sample_token
+            ]
+
+    def generate_sdc_info(self, sdc_vel):
+        """Generate SDC (Self-Driving Car) bounding box info.
+
+        Args:
+            sdc_vel: SDC velocity [vx, vy]
+
+        Returns:
+            LiDARInstance3DBoxes: SDC bounding box
+        """
+
+        # sdc dim from https://forum.nuscenes.org/t/dimensions-of-the-ego-vehicle-used-to-gather-data/550
+        # TODO(box3d): we have changed yaw to mmdet3d 1.0.0rc6 format, wlh->lwh -pi->0.5pi
+        pseudo_sdc_bbox = np.array([0.0, 0.0, 0.0, 4.08, 1.73, 1.56, 0.5 * np.pi])
+        if self.with_velocity:
+            pseudo_sdc_bbox = np.concatenate([pseudo_sdc_bbox, sdc_vel], axis=-1)
+        gt_bboxes_3d = np.array([pseudo_sdc_bbox]).astype(np.float32)
+
+        # the nuscenes box center is [0.5, 0.5, 0.5], we change it to be
+        # the same as KITTI (0.5, 0.5, 0)
+        gt_bboxes_3d = LiDARInstance3DBoxes(
+            gt_bboxes_3d, box_dim=gt_bboxes_3d.shape[-1], origin=(0.5, 0.5, 0.5)
+        )
+
+        return gt_bboxes_3d
+
+    def get_l2g_transform(self, sample):
+        sd_rec = self.nusc.get("sample_data", sample["data"]["LIDAR_TOP"])
+        cs_record = self.nusc.get(
+            "calibrated_sensor", sd_rec["calibrated_sensor_token"]
+        )
+        pose_record = self.nusc.get("ego_pose", sd_rec["ego_pose_token"])
+
+        l2e_r = cs_record["rotation"]
+        l2e_t = np.array(cs_record["translation"])
+        e2g_r = pose_record["rotation"]
+        e2g_t = np.array(pose_record["translation"])
+        l2e_r_mat = Quaternion(l2e_r).rotation_matrix
+        e2g_r_mat = Quaternion(e2g_r).rotation_matrix
+
+        return l2e_r_mat, l2e_t, e2g_r_mat, e2g_t
+
+    def get_sdc_planning_label(self, sample_token):
+        sd_rec = self.nusc.get("sample", sample_token)
+        l2e_r_mat_init, l2e_t_init, e2g_r_mat_init, e2g_t_init = self.get_l2g_transform(
+            sd_rec
+        )
+
+        planning = []
+        for _ in range(self.planning_steps):
+            next_annotation_token
= sd_rec["next"] + if next_annotation_token == "": + break + sd_rec = self.nusc.get("sample", next_annotation_token) + l2e_r_mat_curr, l2e_t_curr, e2g_r_mat_curr, e2g_t_curr = ( + self.get_l2g_transform(sd_rec) + ) # (lidar to global at current frame) + + # bbox of sdc under current lidar frame + next_bbox3d = self.generate_sdc_info( + self.sdc_vel_info[next_annotation_token] + ) + + # to bbox under curr ego frame + next_bbox3d.rotate(l2e_r_mat_curr.T) + next_bbox3d.translate(l2e_t_curr) + + # to bbox under world frame + next_bbox3d.rotate(e2g_r_mat_curr.T) + next_bbox3d.translate(e2g_t_curr) + + # to bbox under initial ego frame, first inverse translate, then inverse rotate + next_bbox3d.translate(-e2g_t_init) + m1 = np.linalg.inv(e2g_r_mat_init) + next_bbox3d.rotate(m1.T) + + # to bbox under curr ego frame, first inverse translate, then inverse rotate + next_bbox3d.translate(-l2e_t_init) + m2 = np.linalg.inv(l2e_r_mat_init) + next_bbox3d.rotate(m2.T) + + planning.append(next_bbox3d) + + planning_all = np.zeros((1, self.planning_steps, 3)) + planning_mask_all = np.zeros((1, self.planning_steps, 2)) + n_valid_timestep = len(planning) + if n_valid_timestep > 0: + planning = [p.tensor.squeeze(0) for p in planning] + planning = np.stack(planning, axis=0) # (valid_t, 9) + planning = planning[:, [0, 1, 6]] # (x, y, yaw) + planning_all[:, :n_valid_timestep, :] = planning + planning_mask_all[:, :n_valid_timestep, :] = 1 + + mask = planning_mask_all[0].any(axis=1) + if mask.sum() == 0: + command = 2 #'FORWARD' + elif planning_all[0, mask][-1][0] >= 2: + command = 0 #'RIGHT' + elif planning_all[0, mask][-1][0] <= -2: + command = 1 #'LEFT' + else: + command = 2 #'FORWARD' + + return planning_all, planning_mask_all, command + + +class NuScenesDataset(Dataset): + NameMapping = { + "movable_object.barrier": "barrier", + "vehicle.bicycle": "bicycle", + "vehicle.bus.bendy": "bus", + "vehicle.bus.rigid": "bus", + "vehicle.car": "car", + "vehicle.construction": 
"construction_vehicle", + "vehicle.motorcycle": "motorcycle", + "human.pedestrian.adult": "pedestrian", + "human.pedestrian.child": "pedestrian", + "human.pedestrian.construction_worker": "pedestrian", + "human.pedestrian.police_officer": "pedestrian", + "movable_object.trafficcone": "traffic_cone", + "vehicle.trailer": "trailer", + "vehicle.truck": "truck", + } + + CLASSES = ( + "car", + "truck", + "construction_vehicle", + "bus", + "trailer", + "barrier", + "motorcycle", + "bicycle", + "pedestrian", + "traffic_cone", + ) + + def __init__( + self, + data_root=None, + version=None, + ann_file=None, + test_scenes=None, + ): + ### Custom3DDataset + + if data_root is None: + self.data_root = "./" + elif data_root.endswith("/"): + self.data_root = data_root + else: + self.data_root = data_root + "/" + + if ann_file is None: + self.version = version or "v1.0-trainval" + self.data_infos = self.nuscenes_data_prep( + self.version, self.data_root, test_scenes=test_scenes + ) + else: + self.data_infos = self.load_annotations(ann_file) + self.nusc = NuScenes( + version=self.version, dataroot=self.data_root, verbose=True + ) + + # filter scenes if test_scenes is provided + if test_scenes is not None: + val_scenes = set() + for s in self.nusc.scene: + if s["name"] not in test_scenes: + continue + val_scenes.add(s["token"]) + + self.data_infos = [ + s for s in self.data_infos if s["scene_token"] in val_scenes + ] + + self.use_camera = True + + ### NuScenesE2EDataset + self.traj_api = NuScenesTraj(self.nusc) + + def __len__(self): + return len(self.data_infos) + + def nuscenes_data_prep(self, version, data_root, test_scenes=None, max_sweeps=10): + self.nusc = NuScenes(version=version, dataroot=data_root, verbose=True) + nusc_can_bus = NuScenesCanBus(dataroot=data_root) + + val_scenes = set() + for s in self.nusc.scene: + if test_scenes and s["name"] not in test_scenes: + continue + val_scenes.add(s["token"]) + + samples = [x for x in self.nusc.sample if x["scene_token"] in 
val_scenes] + + val_nusc_infos = [] + frame_idx = 0 + for sample in tqdm.tqdm(samples, desc="Preparing data"): + lidar_token = sample["data"]["LIDAR_TOP"] + sd_rec = self.nusc.get("sample_data", sample["data"]["LIDAR_TOP"]) + cs_record = self.nusc.get( + "calibrated_sensor", sd_rec["calibrated_sensor_token"] + ) + pose_record = self.nusc.get("ego_pose", sd_rec["ego_pose_token"]) + lidar_path, boxes, _ = self.nusc.get_sample_data(lidar_token) + + can_bus = get_can_bus_info(self.nusc, nusc_can_bus, sample) + info = { + "token": sample["token"], + "can_bus": can_bus, + "sweeps": [], + "cams": dict(), + "scene_token": sample["scene_token"], # temporal related info + "lidar2ego_translation": cs_record["translation"], + "lidar2ego_rotation": cs_record["rotation"], + "ego2global_translation": pose_record["translation"], + "ego2global_rotation": pose_record["rotation"], + "timestamp": sample["timestamp"], + } + + if sample["next"] == "": + frame_idx = 0 + else: + frame_idx += 1 + + l2e_r = info["lidar2ego_rotation"] + l2e_t = info["lidar2ego_translation"] + e2g_r = info["ego2global_rotation"] + e2g_t = info["ego2global_translation"] + l2e_r_mat = Quaternion(l2e_r).rotation_matrix + e2g_r_mat = Quaternion(e2g_r).rotation_matrix + + # obtain 6 image's information per frame + camera_types = [ + "CAM_FRONT", + "CAM_FRONT_RIGHT", + "CAM_FRONT_LEFT", + "CAM_BACK", + "CAM_BACK_LEFT", + "CAM_BACK_RIGHT", + ] + for cam in camera_types: + cam_token = sample["data"][cam] + cam_path, _, cam_intrinsic = self.nusc.get_sample_data(cam_token) + cam_info = obtain_sensor2top( + self.nusc, cam_token, l2e_t, l2e_r_mat, e2g_t, e2g_r_mat, cam + ) + cam_info.update(cam_intrinsic=cam_intrinsic) + info["cams"].update({cam: cam_info}) + + # obtain sweeps for a single key-frame + sd_rec = self.nusc.get("sample_data", sample["data"]["LIDAR_TOP"]) + sweeps = [] + while len(sweeps) < max_sweeps: + if not sd_rec["prev"] == "": + sweep = obtain_sensor2top( + self.nusc, + sd_rec["prev"], + l2e_t, + 
l2e_r_mat, + e2g_t, + e2g_r_mat, + "lidar", + ) + sweeps.append(sweep) + sd_rec = self.nusc.get("sample_data", sd_rec["prev"]) + else: + break + info["sweeps"] = sweeps + # obtain annotation + annotations = [ + self.nusc.get("sample_annotation", token) for token in sample["anns"] + ] + locs = np.array([b.center for b in boxes]).reshape(-1, 3) + dims = np.array([b.wlh for b in boxes]).reshape(-1, 3) + rots = np.array([b.orientation.yaw_pitch_roll[0] for b in boxes]).reshape( + -1, 1 + ) + velocity = np.array( + [self.nusc.box_velocity(token)[:2] for token in sample["anns"]] + ) + valid_flag = np.array( + [ + (anno["num_lidar_pts"] + anno["num_radar_pts"]) > 0 + for anno in annotations + ], + dtype=bool, + ).reshape(-1) + instance_inds = [ + self.nusc.getind("instance", ann["instance_token"]) + for ann in annotations + ] + future_traj_all, future_traj_valid_mask_all = get_future_traj_info( + self.nusc, sample + ) + instance_tokens = [ + ann["instance_token"] for ann in annotations + ] # dtype(' 0.1 + traj_points = view_points(total_xy.T, camera_intrinsic, normalize=True)[ + :2, : + ] + traj_points = traj_points[:2, in_range_mask] + if box.is_sdc: + if render_sdc: + self.axes[i // 3, i % 3].scatter( + traj_points[0], traj_points[1], color=(1, 0.5, 0), s=150 + ) + else: + continue + else: + tr_id = tr_id_list[j] + if tr_id is None: + tr_id = 0 + c = color_mapping[tr_id % len(color_mapping)] + self.axes[i // 3, i % 3].scatter( + traj_points[0], traj_points[1], color=c, s=15 + ) + self.axes[i // 3, i % 3].set_xlim(0, imsize[0]) + self.axes[i // 3, i % 3].set_ylim(imsize[1], 0) + + def get_image_info(self, sample_data_token, nusc): + """Retrieve image information.""" + sd_record = nusc.get("sample_data", sample_data_token) + cs_record = nusc.get("calibrated_sensor", sd_record["calibrated_sensor_token"]) + sensor_record = nusc.get("sensor", cs_record["sensor_token"]) + pose_record = nusc.get("ego_pose", sd_record["ego_pose_token"]) + + data_path = 
nusc.get_sample_data_path(sample_data_token) + + if sensor_record["modality"] == "camera": + cam_intrinsic = np.array(cs_record["camera_intrinsic"]) + imsize = (sd_record["width"], sd_record["height"]) + else: + cam_intrinsic = None + imsize = None + return data_path, cs_record, pose_record, cam_intrinsic, imsize + + +class AgentPredictionData: + """ + Agent data class, includes bbox, traj, and occflow + """ + + def __init__( + self, + pred_score, + pred_label, + pred_center, + pred_dim, + pred_yaw, + pred_vel, + pred_traj, + pred_traj_score, + pred_track_id=None, + pred_occ_map=None, + is_sdc=False, + past_pred_traj=None, + command=None, + attn_mask=None, + ): + self.pred_score = pred_score + self.pred_label = pred_label + self.pred_center = pred_center + self.pred_dim = pred_dim + # self.pred_yaw = pred_yaw + self.pred_yaw = -pred_yaw - np.pi / 2 + self.pred_vel = pred_vel + self.pred_traj = pred_traj + self.pred_traj_score = pred_traj_score + self.pred_track_id = pred_track_id + self.pred_occ_map = pred_occ_map + if self.pred_traj is not None: + if isinstance(self.pred_traj_score, int): + self.pred_traj_max = self.pred_traj + else: + self.pred_traj_max = self.pred_traj[self.pred_traj_score.argmax()] + else: + self.pred_traj_max = None + self.nusc_box = Box( + center=pred_center, + size=pred_dim, + orientation=Quaternion(axis=[0, 0, 1], radians=self.pred_yaw), + label=pred_label, + score=pred_score, + ) + if is_sdc: + self.pred_center = [0, 0, -1.2 + 1.56 / 2] + self.is_sdc = is_sdc + self.past_pred_traj = past_pred_traj + self.command = command + self.attn_mask = attn_mask + + +class Visualizer: + def __init__( + self, bbox_results, nuscenes=None, dataroot="data/nuscenes", version="v1.0-mini" + ): + if nuscenes is None: + self.nusc = NuScenes(version=version, dataroot=dataroot, verbose=True) + else: + self.nusc = nuscenes + + self.with_planning = True + self.show_command = True + self.show_sdc_car = True + self.show_legend = True + self.with_pred_traj = True + 
self.with_pred_box = True + show_gt_boxes = False + + self.token_set = set() + self.predictions = self._parse_predictions(bbox_results) + self.bev_render = BEVRender(show_gt_boxes=show_gt_boxes) + self.cam_render = CameraRender(show_gt_boxes=show_gt_boxes) + + def _parse_predictions(self, bbox_results): + outputs = bbox_results + prediction_dict = dict() + for k in range(len(outputs)): + token = outputs[k]["token"] + self.token_set.add(token) + + # detection + bboxes: LiDARInstance3DBoxes = outputs[k]["boxes_3d"] + scores = outputs[k]["scores_3d"] + labels = outputs[k]["labels_3d"] + + track_scores = scores + track_labels = labels + track_boxes = bboxes.tensor + + track_centers = bboxes.gravity_center + track_dims = bboxes.dims + track_yaw = bboxes.yaw + + track_ids = outputs[k]["track_ids"] + + # speed + track_velocity = bboxes.tensor[:, -2:] + + # trajectories + trajs = outputs[k]["traj"] + traj_scores = outputs[k]["traj_scores"] + + predicted_agent_list = [] + for i in range(track_scores.shape[0]): + if track_scores[i] < 0.25: + continue + + if i < len(track_ids): + track_id = track_ids[i] + else: + track_id = 0 + + predicted_agent_list.append( + AgentPredictionData( + track_scores[i], + track_labels[i], + track_centers[i], + track_dims[i], + track_yaw[i], + track_velocity[i], + trajs[i], + traj_scores[i], + pred_track_id=track_id, + pred_occ_map=None, + past_pred_traj=None, + ) + ) + + # with planning + if self.with_planning: + # detection + bboxes: LiDARInstance3DBoxes = outputs[k]["sdc_boxes_3d"] + scores = outputs[k]["sdc_scores_3d"] + labels = 0 + + track_scores = scores + track_labels = labels + + track_centers = bboxes.gravity_center + track_dims = bboxes.dims + track_yaw = bboxes.yaw + track_velocity = bboxes.tensor[:, -2:] + + command = outputs[k]["command"] + planning_agent = AgentPredictionData( + track_scores[0], + track_labels, + track_centers[0], + track_dims[0], + track_yaw[0], + track_velocity[0], + outputs[k]["planning_traj"][0], + 1, + 
pred_track_id=-1, + pred_occ_map=None, + past_pred_traj=None, + is_sdc=True, + command=command, + ) + predicted_agent_list.append(planning_agent) + else: + planning_agent = None + prediction_dict[token] = dict( + predicted_agent_list=predicted_agent_list, + predicted_map_seg=None, + predicted_planning=planning_agent, + ) + + return prediction_dict + + def visualize_bev(self, sample_token): + self.bev_render.reset_canvas(dx=1, dy=1) + self.bev_render.set_plot_cfg() + + if self.with_pred_box: + self.bev_render.render_pred_box_data( + self.predictions[sample_token]["predicted_agent_list"] + ) + if self.with_pred_traj: + self.bev_render.render_pred_traj( + self.predictions[sample_token]["predicted_agent_list"] + ) + if self.with_planning: + self.bev_render.render_pred_box_data( + [self.predictions[sample_token]["predicted_planning"]] + ) + self.bev_render.render_planning_data( + self.predictions[sample_token]["predicted_planning"], + show_command=self.show_command, + ) + if self.show_sdc_car: + self.bev_render.render_sdc_car() + if self.show_legend: + self.bev_render.render_legend() + + img = self.bev_render.save_fig() + return img + + def visualize_cam(self, sample_token): + self.cam_render.reset_canvas(dx=2, dy=3, tight_layout=True) + self.cam_render.render_image_data(sample_token, self.nusc) + self.cam_render.render_pred_track_bbox( + self.predictions[sample_token]["predicted_agent_list"], + sample_token, + self.nusc, + ) + self.cam_render.render_pred_traj( + self.predictions[sample_token]["predicted_agent_list"], + sample_token, + self.nusc, + render_sdc=self.with_planning, + ) + + img = self.cam_render.save_fig() + return img diff --git a/autonomous_driving/uniad/requirements.txt b/autonomous_driving/uniad/requirements.txt new file mode 100644 index 000000000..8af343025 --- /dev/null +++ b/autonomous_driving/uniad/requirements.txt @@ -0,0 +1,4 @@ +h5py +casadi==3.6.7 +nuscenes-devkit==1.1.11 +pyquaternion==0.9.9 \ No newline at end of file diff --git 
a/autonomous_driving/uniad/resources/legend.png b/autonomous_driving/uniad/resources/legend.png new file mode 100644 index 000000000..c34e3b37b Binary files /dev/null and b/autonomous_driving/uniad/resources/legend.png differ diff --git a/autonomous_driving/uniad/resources/sdc_car.png b/autonomous_driving/uniad/resources/sdc_car.png new file mode 100644 index 000000000..df19cbc33 Binary files /dev/null and b/autonomous_driving/uniad/resources/sdc_car.png differ diff --git a/autonomous_driving/uniad/track_instance.py b/autonomous_driving/uniad/track_instance.py new file mode 100644 index 000000000..05134306e --- /dev/null +++ b/autonomous_driving/uniad/track_instance.py @@ -0,0 +1,108 @@ +import itertools +from typing import Any, Dict, List, Tuple, Union + +import numpy as np + + +class Instances: + def __init__(self, image_size: Tuple[int, int], **kwargs: Any): + """ + Args: + image_size (height, width): the spatial size of the image. + kwargs: fields to add to this `Instances`. + """ + self._image_size = image_size + self._fields: Dict[str, Any] = {} + for k, v in kwargs.items(): + self.set(k, v) + + @property + def image_size(self) -> Tuple[int, int]: + """ + Returns: + tuple: height, width + """ + return self._image_size + + def __setattr__(self, name: str, val: Any) -> None: + if name.startswith("_"): + super().__setattr__(name, val) + else: + self.set(name, val) + + def __getattr__(self, name: str) -> Any: + return self._fields[name] + + def set(self, name: str, value: Any) -> None: + self._fields[name] = value + + def has(self, name: str) -> bool: + return name in self._fields + + def remove(self, name: str) -> None: + del self._fields[name] + + def get(self, name: str) -> Any: + return self._fields[name] + + def __getitem__(self, item: Union[int, slice, np.ndarray]) -> "Instances": + """ + Args: + item: an index-like object and will be used to index all the fields. + Returns: + If `item` is a string, return the data in the corresponding field. 
+ Otherwise, returns an `Instances` where all fields are indexed by `item`. + """ + if type(item) == int: + if item >= len(self) or item < -len(self): + raise IndexError("Instances index out of range!") + else: + item = slice(item, None, len(self)) + + ret = Instances(self._image_size) + for k, v in self._fields.items(): + # if index by numpy boolean array + if k == "kalman_models" and isinstance(item, np.ndarray): + ret_list = [] + for i, if_true in enumerate(item): + if if_true: + ret_list.append(self.kalman_models[i]) + ret.set(k, ret_list) + else: + ret.set(k, v[item]) + return ret + + def __len__(self) -> int: + for v in self._fields.values(): + # use __len__ because len() has to be int and is not friendly to tracing + return v.__len__() + raise NotImplementedError("Empty Instances does not support __len__!") + + @staticmethod + def cat(instance_lists: List["Instances"]) -> "Instances": + """ + Args: + instance_lists (list[Instances]) + Returns: + Instances + """ + if len(instance_lists) == 1: + return instance_lists[0] + + image_size = instance_lists[0].image_size + ret = Instances(image_size) + for k in instance_lists[0]._fields.keys(): + values = [i.get(k) for i in instance_lists] + v0 = values[0] + if isinstance(v0, np.ndarray): + values = np.concatenate(values, axis=0) + elif isinstance(v0, list): + values = list(itertools.chain(*values)) + elif hasattr(type(v0), "cat"): + values = type(v0).cat(values) + else: + raise ValueError( + "Unsupported type {} for concatenation".format(type(v0)) + ) + ret.set(k, values) + return ret diff --git a/autonomous_driving/uniad/tracker.py b/autonomous_driving/uniad/tracker.py new file mode 100644 index 000000000..92fa59c22 --- /dev/null +++ b/autonomous_driving/uniad/tracker.py @@ -0,0 +1,32 @@ +from track_instance import Instances + + +class RuntimeTrackerBase(object): + def __init__(self, score_thresh=0.5, filter_score_thresh=0.4, miss_tolerance=5): + self.score_thresh = score_thresh + self.filter_score_thresh = 
filter_score_thresh
+        self.miss_tolerance = miss_tolerance
+        self.max_obj_id = 0
+
+    def clear(self):
+        self.max_obj_id = 0
+
+    def update(self, track_instances: Instances):
+        track_instances.disappear_time[track_instances.scores >= self.score_thresh] = 0
+        for i in range(len(track_instances)):
+            if (
+                track_instances.obj_idxes[i] == -1
+                and track_instances.scores[i] >= self.score_thresh
+            ):
+                # new track
+                track_instances.obj_idxes[i] = self.max_obj_id
+                self.max_obj_id += 1
+            elif (
+                track_instances.obj_idxes[i] >= 0
+                and track_instances.scores[i] < self.filter_score_thresh
+            ):
+                # sleep time ++
+                track_instances.disappear_time[i] += 1
+                if track_instances.disappear_time[i] >= self.miss_tolerance:
+                    # mark dead tracklets: set the obj_id to -1
+                    track_instances.obj_idxes[i] = -1
diff --git a/autonomous_driving/uniad/uniad.py b/autonomous_driving/uniad/uniad.py
new file mode 100644
index 000000000..11c004186
--- /dev/null
+++ b/autonomous_driving/uniad/uniad.py
@@ -0,0 +1,1527 @@
+import copy
+import glob
+import os
+import sys
+import warnings
+from logging import getLogger
+
+import ailia
+import cv2
+import h5py
+import numpy as np
+import tqdm
+
+# import original modules
+sys.path.append("../../util")
+# isort: off
+from arg_utils import get_base_parser, get_savepath, update_parser
+from math_utils import sigmoid
+from model_utils import check_and_download_file, check_and_download_models
+
+# isort: on
+
+# import local modules
+from collision_optimization import CollisionNonlinearOptimizer
+from lidar_box3d import LiDARInstance3DBoxes
+from nuscenes_dataset import NuScenesDataset
+from render import Visualizer
+from track_instance import Instances
+from tracker import RuntimeTrackerBase
+
+logger = getLogger(__name__)
+
+
+# ======================
+# Parameters
+# ======================
+
+WEIGHT_BEV_ENC_PATH = "bev_encoder.msfix.onnx"
+MODEL_BEV_ENC_PATH = "bev_encoder.msfix.onnx.prototxt"
+WEIGHT_TRACK_HEAD_PATH = "track_head.onnx"
+MODEL_TRACK_HEAD_PATH = "track_head.onnx.prototxt"
+WEIGHT_MEMORY_BANK_PATH = "memory_bank.onnx"
+MODEL_MEMORY_BANK_PATH = "memory_bank.onnx.prototxt"
+WEIGHT_MEMORY_BANK_UPD_PATH = "memory_bank_update.onnx"
+MODEL_MEMORY_BANK_UPD_PATH = "memory_bank_update.onnx.prototxt"
+WEIGHT_QUERY_INTERACTION_PATH = "query_interact.onnx"
+MODEL_QUERY_INTERACTION_PATH = "query_interact.onnx.prototxt"
+WEIGHT_SEG_HEAD_PATH = "seg_head.onnx"
+MODEL_SEG_HEAD_PATH = "seg_head.onnx.prototxt"
+WEIGHT_MOTION_HEAD_PATH = "motion_head.onnx"
+MODEL_MOTION_HEAD_PATH = "motion_head.onnx.prototxt"
+WEIGHT_OCC_HEAD_PATH = "occ_head.onnx"
+MODEL_OCC_HEAD_PATH = "occ_head.onnx.prototxt"
+WEIGHT_PLANNING_HEAD_PATH = "planning_head.onnx"
+MODEL_PLANNING_HEAD_PATH = "planning_head.onnx.prototxt"
+QUERY_EMBEDDING_FILE = "resources/query_embedding.npy"
+REFERENCE_POINTS_WEIGHT_FILE = "resources/reference_points_weight.npy"
+REFERENCE_POINTS_BIAS_FILE = "resources/reference_points_bias.npy"
+REMOTE_PATH = "https://storage.googleapis.com/ailia-models/uniad/"
+
+DEFAULT_SCENES = ["scene-0102", "scene-0103"]
+DEFAULT_VIS_SCENES = ["scene-0103"]
+SAVE_IMAGE_PATH = "output_video.avi"
+
+
+# ======================
+# Argument Parser Config
+# ======================
+
+parser = get_base_parser("UniAD", None, SAVE_IMAGE_PATH)
+parser.add_argument(
+    "--scenes",
+    type=str,
+    nargs="+",
+    action="append",
+    default=None,
+    help="Test scenes to process. Can be specified as: --scenes scene-0102 scene-0103 OR --scenes scene-0102 --scenes scene-0103. If not specified, all scenes will be processed.",
+)
+parser.add_argument(
+    "--vis_scenes",
+    type=str,
+    nargs="+",
+    action="append",
+    default=None,
+    help="Scenes to visualize. Can be specified as: --vis_scenes scene-0102 scene-0103 OR --vis_scenes scene-0102 --vis_scenes scene-0103. If not specified, all processed scenes will be visualized.",
+)
+parser.add_argument(
+    "--ann_file",
+    type=str,
+    default=None,
+    help="Path to annotation file.
If not specified, data will be prepared from scratch.", +) +parser.add_argument( + "--data_root", + type=str, + default="data/nuscenes/", + help="Path to NuScenes dataset root directory.", +) +parser.add_argument( + "--version", + type=str, + default=None, + help="NuScenes dataset version (e.g., v1.0-trainval, v1.0-mini).", +) +parser.add_argument( + "--results_file", + type=str, + default="all_frames.h5", + help="Path to HDF5 file used to stream per-frame results.", +) +parser.add_argument( + "--visualize", + type=str, + default=None, + help="Path to HDF5 file to visualize (skip inference).", +) +parser.add_argument("--onnx", action="store_true", help="execute onnxruntime version.") +args = update_parser(parser) + + +# ====================== +# Secondary Functions +# ====================== + + +def flatten_args(arg_list): + """Flatten nested lists from nargs='*' with action='append'.""" + if arg_list is None: + return None + flattened = [] + for item in arg_list: + if isinstance(item, list): + flattened.extend(item) + else: + flattened.append(item) + return flattened if flattened else None + + +def save_frame_to_hdf5(h5_group, result, frame_count): + """Save a single frame result to HDF5 group. 
+ + Args: + h5_group: HDF5 file handle + result: Result dictionary containing frame data + frame_count: Frame index number + """ + group = h5_group.create_group(f"frame_{frame_count:06d}") + group.attrs["token"] = result["token"] + group.attrs["scene_token"] = result["scene_token"] + group.attrs["command"] = result["command"] + group.create_dataset("boxes_3d", data=result["boxes_3d"].tensor) + group.create_dataset("scores_3d", data=result["scores_3d"]) + group.create_dataset("labels_3d", data=result["labels_3d"]) + group.create_dataset("track_ids", data=result["track_ids"]) + group.create_dataset("traj", data=result["traj"]) + group.create_dataset("traj_scores", data=result["traj_scores"]) + group.create_dataset("sdc_boxes_3d", data=result["sdc_boxes_3d"].tensor) + group.create_dataset("sdc_scores_3d", data=result["sdc_scores_3d"]) + group.create_dataset("planning_traj", data=result["planning_traj"]) + # Flush to disk immediately + h5_group.flush() + + +def load_frames_from_hdf5(hdf5_file): + """Load all frame results from HDF5 file. 
+ + Args: + hdf5_file: Path to HDF5 file + + Returns: + List of result dictionaries + """ + bbox_results = [] + with h5py.File(hdf5_file, "r") as h5f: + for frame_key in sorted(h5f.keys()): + group = h5f[frame_key] + bbox_results.append( + { + "token": group.attrs["token"], + "scene_token": group.attrs["scene_token"], + "command": group.attrs["command"], + "boxes_3d": LiDARInstance3DBoxes(group["boxes_3d"][:], box_dim=9), + "scores_3d": group["scores_3d"][:], + "labels_3d": group["labels_3d"][:], + "track_ids": group["track_ids"][:], + "traj": group["traj"][:], + "traj_scores": group["traj_scores"][:], + "sdc_boxes_3d": LiDARInstance3DBoxes( + group["sdc_boxes_3d"][:], box_dim=9 + ), + "sdc_scores_3d": group["sdc_scores_3d"][:], + "planning_traj": group["planning_traj"][:], + } + ) + + return bbox_results + + +def inverse_sigmoid(x, eps=1e-5): + """Numerically stable logit.""" + x = np.clip(x, eps, 1 - eps) + return np.log(x) - np.log(1 - x) + + +# Cache for query embedding (load once) +_query_embedding = None + + +def empty_tracks(): + global _query_embedding + + track_instances = Instances((1, 1)) + + if _query_embedding is None: + _query_embedding = np.load(QUERY_EMBEDDING_FILE) + + query = _query_embedding.copy() + num_queries, dim = query.shape + + track_instances.query = query + track_instances.ref_pts = reference_points(query) + + track_instances.obj_idxes = np.full((len(track_instances)), -1, dtype=np.int64) + track_instances.matched_gt_idxes = np.full( + (len(track_instances)), -1, dtype=np.int64 + ) + track_instances.disappear_time = np.zeros((len(track_instances)), dtype=np.int64) + + track_instances.iou = np.zeros((len(track_instances),), dtype=np.float32) + track_instances.scores = np.zeros((len(track_instances),), dtype=np.float32) + track_instances.track_scores = np.zeros((len(track_instances),), dtype=np.float32) + + # init boxes: xy, wl, z, h, sin, cos, vx, vy, vz + num_box_dims = 10 + track_instances.pred_boxes = np.zeros( + (len(track_instances), 
num_box_dims), dtype=np.float32 + ) + + num_classes = 10 # number of classes from the dataset config + track_instances.pred_logits = np.zeros( + (len(track_instances), num_classes), dtype=np.float32 + ) + + mem_bank_len = 4 # typical memory bank length + track_instances.mem_bank = np.zeros( + (len(track_instances), mem_bank_len, dim // 2), dtype=np.float32 + ) + track_instances.mem_padding_mask = np.ones( + (len(track_instances), mem_bank_len), dtype=bool + ) + track_instances.save_period = np.zeros((len(track_instances),), dtype=np.float32) + + return track_instances + + +def velo_update(ref_pts, velocity, l2g_r1, l2g_t1, l2g_r2, l2g_t2, time_delta): + """ + Args: + ref_pts (np.ndarray): (num_query, 3) in inverse sigmoid space. + velocity (np.ndarray): (num_query, 2) m/s in the LiDAR frame. + Outs: + np.ndarray: updated reference points in inverse sigmoid space. + """ + + num_query = ref_pts.shape[0] + velo_pad = np.concatenate( + [velocity, np.zeros((num_query, 1), dtype=velocity.dtype)], axis=-1 + ) + + reference_points = sigmoid(ref_pts) + pc_range = np.array([-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], dtype=np.float32) + reference_points[..., 0:1] = ( + reference_points[..., 0:1] * (pc_range[3] - pc_range[0]) + pc_range[0] + ) + reference_points[..., 1:2] = ( + reference_points[..., 1:2] * (pc_range[4] - pc_range[1]) + pc_range[1] + ) + reference_points[..., 2:3] = ( + reference_points[..., 2:3] * (pc_range[5] - pc_range[2]) + pc_range[2] + ) + reference_points = reference_points + velo_pad * time_delta + + ref_pts = reference_points @ l2g_r1 + l2g_t1 - l2g_t2 + g2l_r = np.linalg.inv(l2g_r2).astype(np.float32) + ref_pts = ref_pts @ g2l_r + + ref_pts[..., 0:1] = (ref_pts[..., 0:1] - pc_range[0]) / (pc_range[3] - pc_range[0]) + ref_pts[..., 1:2] = (ref_pts[..., 1:2] - pc_range[1]) / (pc_range[4] - pc_range[1]) + ref_pts[..., 2:3] = (ref_pts[..., 2:3] - pc_range[2]) / (pc_range[5] - pc_range[2]) + + ref_pts = inverse_sigmoid(ref_pts) + + return ref_pts + + +def 
denormalize_bbox(normalized_bboxes):
+    # rotation
+    rot_sine = normalized_bboxes[..., 6:7]
+
+    rot_cosine = normalized_bboxes[..., 7:8]
+    rot = np.arctan2(rot_sine, rot_cosine)
+
+    # center in the bev
+    cx = normalized_bboxes[..., 0:1]
+    cy = normalized_bboxes[..., 1:2]
+    cz = normalized_bboxes[..., 4:5]
+
+    # size
+    w = normalized_bboxes[..., 2:3]
+    l = normalized_bboxes[..., 3:4]
+    h = normalized_bboxes[..., 5:6]
+
+    w = np.exp(w)
+    l = np.exp(l)
+    h = np.exp(h)
+
+    # velocity
+    vx = normalized_bboxes[:, 8:9]
+    vy = normalized_bboxes[:, 9:10]
+    denormalized_bboxes = np.concatenate([cx, cy, cz, w, l, h, rot, vx, vy], axis=-1)
+
+    return denormalized_bboxes
+
+
+def decode_single(cls_scores, bbox_preds, track_scores, obj_idxes, with_mask=True):
+    """Decode bboxes.
+    Args:
+        cls_scores (np.ndarray): Outputs from the classification head, \
+            shape [num_query, cls_out_channels]. Note \
+            cls_out_channels should include background.
+        bbox_preds (np.ndarray): Outputs from the regression \
+            head with normalized coordinate format (cx, cy, w, l, cz, h, rot_sine, rot_cosine, vx, vy). \
+            Shape [num_query, 9].
+
+    Returns:
+        dict: Decoded boxes.
+ """ + max_num = 300 + max_num = min(cls_scores.shape[0], max_num) + + num_classes = 10 + cls_scores = sigmoid(cls_scores) + indexs = np.argmax(cls_scores, axis=-1) + labels = indexs % num_classes + + bbox_index = np.argpartition(track_scores, -max_num)[-max_num:] + bbox_index = bbox_index[np.argsort(track_scores[bbox_index])[::-1]] + + labels = labels[bbox_index] + bbox_preds = bbox_preds[bbox_index] + track_scores = track_scores[bbox_index] + obj_idxes = obj_idxes[bbox_index] + + scores = track_scores + + final_box_preds = denormalize_bbox(bbox_preds) + final_scores = track_scores + final_preds = labels + + post_center_range = np.array( + [-61.2000, -61.2000, -10.0000, 61.2000, 61.2000, 10.0000] + ) + mask = (final_box_preds[..., :3] >= post_center_range[:3]).all(1) + mask &= (final_box_preds[..., :3] <= post_center_range[3:]).all(1) + if not with_mask: + mask = np.ones_like(mask) > 0 + + boxes3d = final_box_preds[mask] + scores = final_scores[mask] + labels = final_preds[mask] + track_scores = track_scores[mask] + obj_idxes = obj_idxes[mask] + predictions_dict = { + "bboxes": boxes3d, + "scores": scores, + "labels": labels, + "track_scores": track_scores, + "obj_idxes": obj_idxes, + "bbox_index": bbox_index, + "mask": mask, + } + + return predictions_dict + + +def track_instances2results(track_instances, with_mask=True): + all_cls_scores = track_instances.pred_logits + all_bbox_preds = track_instances.pred_boxes + track_scores = track_instances.scores + obj_idxes = track_instances.obj_idxes + bboxes_dict = decode_single( + all_cls_scores, + all_bbox_preds, + track_scores, + obj_idxes, + with_mask, + ) + + bboxes = bboxes_dict["bboxes"] + bboxes = LiDARInstance3DBoxes(bboxes, box_dim=9) + labels = bboxes_dict["labels"] + scores = bboxes_dict["scores"] + bbox_index = bboxes_dict["bbox_index"] + track_scores = bboxes_dict["track_scores"] + obj_idxes = bboxes_dict["obj_idxes"] + result_dict = dict( + boxes_3d=bboxes, + scores_3d=scores, + labels_3d=labels, + 
track_scores=track_scores, + bbox_index=bbox_index, + track_ids=obj_idxes, + mask=bboxes_dict["mask"], + track_bbox_results=[ + [ + bboxes.clone(), + scores.copy(), + labels.copy(), + bbox_index.copy(), + bboxes_dict["mask"].copy(), + ] + ], + ) + return result_dict + + +def det_instances2results(instances, result_dict): + """ + Outs: + active_instances. keys: + - 'pred_logits': + - 'pred_boxes': normalized bboxes + - 'scores' + - 'obj_idxes' + out_dict. keys: + - boxes_3d (np.ndarray): 3D boxes. + - scores (np.ndarray): Prediction scores. + - labels_3d (np.ndarray): Box labels. + - attrs_3d (np.ndarray, optional): Box attributes. + - track_ids + - tracking_score + """ + # filter out sleeping queries + if instances.pred_logits.size == 0: + return [None] + + # decode + with_mask = True + all_cls_scores = instances.pred_logits + all_bbox_preds = instances.pred_boxes + track_scores = instances.scores + obj_idxes = instances.obj_idxes + bboxes_dict = decode_single( + all_cls_scores, + all_bbox_preds, + track_scores, + obj_idxes, + with_mask, + ) + + bboxes = bboxes_dict["bboxes"] + bboxes = LiDARInstance3DBoxes(bboxes, box_dim=9) + labels = bboxes_dict["labels"] + scores = bboxes_dict["scores"] + + track_scores = bboxes_dict["track_scores"] + obj_idxes = bboxes_dict["obj_idxes"] + result_dict.update( + dict( + boxes_3d_det=bboxes, + scores_3d_det=scores, + labels_3d_det=labels, + ) + ) + + return result_dict + + +def get_trajs(preds_dicts, bbox_results): + """ + Generates trajectories from the prediction results and bounding box results.
+ """ + num_samples = len(bbox_results) + num_layers = preds_dicts["all_traj_preds"].shape[0] + ret_list = [] + for i in range(num_samples): + preds = dict() + for j in range(num_layers): + subfix = "_" + str(j) if j < (num_layers - 1) else "" + traj = preds_dicts["all_traj_preds"][j, i] + traj_scores = preds_dicts["all_traj_scores"][j, i] + + preds["traj" + subfix] = traj + preds["traj_scores" + subfix] = traj_scores + ret_list.append(preds) + return ret_list + + +def to_video(folder_path, out_path, fps=4, downsample=1): + imgs_path = glob.glob(os.path.join(folder_path, "*.jpg")) + imgs_path = sorted(imgs_path) + img_array = [] + for img_path in imgs_path: + img = cv2.imread(img_path) + height, width, channel = img.shape + img = cv2.resize( + img, + (width // downsample, height // downsample), + interpolation=cv2.INTER_AREA, + ) + height, width, channel = img.shape + size = (width, height) + img_array.append(img) + out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"DIVX"), fps, size) + for i in range(len(img_array)): + out.write(img_array[i]) + out.release() + + +def visualize_results( + results_file, nusc=None, output_dir="vis_output", fps=4, downsample=2 +): + """Visualize detection and planning results. 
+ + Args: + results_file: Path to HDF5 file containing frame results + nusc: NuScenes dataset instance (optional, will be loaded if None) + output_dir: Directory to save output images + fps: Frames per second for video + downsample: Downsample factor for video resolution + """ + # Determine visualization scenes internally + defaults_on_none = args.ann_file is None and args.version is None + vis_scenes = flatten_args(args.vis_scenes) or ( + DEFAULT_VIS_SCENES if defaults_on_none else None + ) + + # Load frame results from HDF5 for visualization + logger.info("Loading frame results for visualization...") + bbox_results = load_frames_from_hdf5(results_file) + + if len(bbox_results) == 0: + logger.info("No results to visualize.") + sys.exit(1) + + # Load NuScenes dataset if not provided + if nusc is None: + from nuscenes import NuScenes + + logger.info("Loading NuScenes dataset...") + data_root = args.data_root + version = args.version or "v1.0-trainval" + nusc = NuScenes(version=version, dataroot=data_root, verbose=True) + + vis = Visualizer(bbox_results=bbox_results, nuscenes=nusc) + + if vis_scenes: + logger.info(f"Visualizing scenes: {vis_scenes}") + else: + logger.info("Visualizing all processed scenes") + + scene_token_to_name = dict() + for i in range(len(vis.nusc.scene)): + scene_token_to_name[vis.nusc.scene[i]["token"]] = vis.nusc.scene[i]["name"] + + # Clear and create output directory + for f in glob.glob(os.path.join(output_dir, "*")): + if os.path.isfile(f): + os.remove(f) + os.makedirs(output_dir, exist_ok=True) + + for i, sample in enumerate(bbox_results): + sample_token = sample["token"] + scene_token = sample["scene_token"] + + # Get scene name for filtering + scene_name = scene_token_to_name[scene_token] + # Skip if not in vis_scenes list + if vis_scenes and scene_name not in vis_scenes: + continue + + with warnings.catch_warnings(): + warnings.filterwarnings("ignore", category=RuntimeWarning) + bev_img = vis.visualize_bev(sample_token) + cam_img = 
vis.visualize_cam(sample_token) + out_img = cv2.hconcat([cam_img, bev_img]) + out_img = out_img[:, :, ::-1] # RGB to BGR + + savepath = f"{output_dir}/{str(i).zfill(3)}.jpg" + cv2.imwrite(savepath, out_img) + logger.info(f"saved at : {savepath}") + + savepath = get_savepath(args.savepath, output_dir, ext=".avi") + jpg_paths = glob.glob(os.path.join(output_dir, "*.jpg")) + if not jpg_paths: + logger.info("No frames written; skipping video output.") + return + + to_video(output_dir, savepath, fps=fps, downsample=downsample) + logger.info(f"saved at : {savepath}") + + +# ====================== +# Main functions +# ====================== + + +def predict(models, img, img_metas, prev_bev=None, track_instances=None): + can_bus = img_metas["can_bus"].astype(np.float32) + lidar2img = img_metas["lidar2img"].astype(np.float32) + img_shape = img_metas["img_shape"] + if prev_bev is None: + prev_bev = np.zeros((40000, 1, 256), dtype=np.float32) + + # feedforward + bev_encoder = models["bev_encoder"] + if not args.onnx: + output = bev_encoder.predict([img, can_bus, lidar2img, img_shape, prev_bev]) + else: + output = bev_encoder.run( + None, + { + "img": img, + "can_bus": can_bus, + "lidar2img": lidar2img, + "img_shape": img_shape, + "prev_bev": prev_bev, + }, + ) + bev_embed, bev_pos = output + + query = track_instances.query + ref_pts = track_instances.ref_pts + + # feedforward + track_head = models["track_head"] + if not args.onnx: + output = track_head.predict([bev_embed, query, ref_pts]) + else: + output = track_head.run( + None, + { + "bev_embed": bev_embed, + "query": query, + "ref_pts": ref_pts, + }, + ) + ( + output_classes, + output_coords, + last_ref_pts, + query_feats, + all_past_traj_preds, + ) = output + + out = { + "pred_logits": output_classes, + "pred_boxes": output_coords, + "ref_pts": last_ref_pts, + "bev_embed": bev_embed, + "query_embeddings": query_feats, + "all_past_traj_preds": all_past_traj_preds, + "bev_pos": bev_pos, + } + + return out + + +def 
memory_bank(models, track_instances): + key_padding_mask = track_instances.mem_padding_mask # [n_, memory_bank_len] + valid_idxes = key_padding_mask[:, -1] == 0 + embed = track_instances.output_embedding[valid_idxes] # (n, 256) + + if 0 < len(embed): + net = models["memory_bank"] + if not args.onnx: + output = net.predict( + [ + track_instances.mem_padding_mask, + track_instances.output_embedding, + track_instances.mem_bank, + ] + ) + else: + output = net.run( + None, + { + "mem_padding_mask": track_instances.mem_padding_mask, + "output_embedding": track_instances.output_embedding, + "mem_bank": track_instances.mem_bank, + }, + ) + track_instances.output_embedding = output[0] + + embed = track_instances.output_embedding[:, None] # ( N, 1, 256) + save_period = track_instances.save_period + saved_idxes = (save_period == 0) & (track_instances.scores > 0) + saved_embed = embed[saved_idxes] + + if 0 < len(saved_embed): + net = models["memory_bank_update"] + if not args.onnx: + output = net.predict( + [ + track_instances.output_embedding, + track_instances.scores, + track_instances.mem_padding_mask, + track_instances.save_period, + track_instances.mem_bank, + ] + ) + else: + output = net.run( + None, + { + "output_embedding": track_instances.output_embedding, + "scores": track_instances.scores, + "mem_padding_mask": track_instances.mem_padding_mask, + "save_period": track_instances.save_period, + "mem_bank": track_instances.mem_bank, + }, + ) + mem_padding_mask, mem_bank, save_period = output + track_instances.mem_padding_mask = mem_padding_mask + track_instances.mem_bank = mem_bank + track_instances.save_period = save_period + + return track_instances + + +def query_interact(models, track_instances): + active_track_instances = track_instances[track_instances.obj_idxes >= 0] + + if 0 < len(active_track_instances): + net = models["query_interact"] + if not args.onnx: + output = net.predict( + [active_track_instances.query, active_track_instances.output_embedding] + ) + 
else: + output = net.run( + None, + { + "query": active_track_instances.query, + "output_embedding": active_track_instances.output_embedding, + }, + ) + updated_query = output[0] + active_track_instances.query = updated_query + + merged_track_instances = Instances.cat([empty_tracks(), active_track_instances]) + + return merged_track_instances + + +def seg_head_forward(models, pts_feats): + net = models["seg_head"] + if not args.onnx: + output = net.predict([pts_feats]) + else: + output = net.run( + None, + {"bev_embed": pts_feats}, + ) + (memory, memory_mask, memory_pos, query, query_pos, hw_lvl, hw_lvl2) = output + + out = [ + memory, + memory_mask, + memory_pos, + query, + query_pos, + [hw_lvl, hw_lvl2], + ] + return out + + +def motion_head_forward(models, bev_embed, outs_track: dict, outs_seg: dict): + track_query = outs_track["track_query_embeddings"][None, None, ...] + track_boxes: LiDARInstance3DBoxes = outs_track["track_bbox_results"] + + track_query = np.concatenate( + [track_query, outs_track["sdc_embedding"][None, None, None, :]], axis=2 + ) + sdc_track_boxes = outs_track["sdc_track_bbox_results"] + + track_boxes[0][0].tensor = np.concatenate( + [track_boxes[0][0].tensor, sdc_track_boxes[0][0].tensor], axis=0 + ) + track_boxes[0][1] = np.concatenate( + [track_boxes[0][1], sdc_track_boxes[0][1]], axis=0 + ) + track_boxes[0][2] = np.concatenate( + [track_boxes[0][2], sdc_track_boxes[0][2]], axis=0 + ) + track_boxes[0][3] = np.concatenate( + [track_boxes[0][3], sdc_track_boxes[0][3]], axis=0 + ) + memory, memory_mask, memory_pos, lane_query, lane_query_pos, hw_lvl = outs_seg + + net = models["motion_head"] + if not args.onnx: + output = net.predict( + [ + bev_embed, + track_query, + lane_query, + lane_query_pos, + track_boxes[0][0].tensor, + track_boxes[0][2], + ] + ) + else: + output = net.run( + None, + { + "bev_embed": bev_embed, + "track_query": track_query, + "lane_query": lane_query, + "lane_query_pos": lane_query_pos, + "bboxes": 
track_boxes[0][0].tensor, + "bbox_labels": track_boxes[0][2], + }, + ) + ( + outputs_traj_scores, + outputs_trajs, + valid_traj_masks, + inter_states, + track_query, + track_query_pos, + ) = output + + outs_motion = { + "all_traj_scores": outputs_traj_scores, + "all_traj_preds": outputs_trajs, + "valid_traj_masks": valid_traj_masks, + "traj_query": inter_states, + "track_query": track_query, + "track_query_pos": track_query_pos, + } + + # get_trajs + traj_results = get_trajs(outs_motion, track_boxes) + bboxes, scores, labels, bbox_index, mask = track_boxes[0] + outs_motion["track_scores"] = scores[None, :] + labels[-1] = 0 + + def filter_vehicle_query(outs_motion, labels, vehicle_id_list): + if len(labels) < 1: # No other obj query except sdc query. + return None + + # select vehicle query according to vehicle_id_list + vehicle_mask = np.zeros_like(labels) + for veh_id in vehicle_id_list: + vehicle_mask |= labels == veh_id + outs_motion["traj_query"] = outs_motion["traj_query"][:, :, vehicle_mask > 0] + outs_motion["track_query"] = outs_motion["track_query"][:, vehicle_mask > 0] + outs_motion["track_query_pos"] = outs_motion["track_query_pos"][ + :, vehicle_mask > 0 + ] + outs_motion["track_scores"] = outs_motion["track_scores"][:, vehicle_mask > 0] + return outs_motion + + # Class IDs of the vehicle (movable object) categories in the NuScenes dataset. + # Only dynamic objects are selected as targets for motion prediction. + # 0: car, 1: truck, 2: construction_vehicle, + # 3: bus, 4: trailer, 6: motorcycle, 7: bicycle + # Excluded: 5: barrier, 8: pedestrian, 9: traffic_cone + vehicle_id_list = [ + 0, # car + 1, # truck + 2, # construction_vehicle + 3, # bus + 4, # trailer + 6, # motorcycle + 7, # bicycle + ] + outs_motion = filter_vehicle_query(outs_motion, labels, vehicle_id_list) + + # filter sdc query + outs_motion["sdc_traj_query"] = outs_motion["traj_query"][:, :, -1] + outs_motion["sdc_track_query"] = outs_motion["track_query"][:, -1] + outs_motion["sdc_track_query_pos"] =
outs_motion["track_query_pos"][:, -1] + outs_motion["traj_query"] = outs_motion["traj_query"][:, :, :-1] + outs_motion["track_query"] = outs_motion["track_query"][:, :-1] + outs_motion["track_query_pos"] = outs_motion["track_query_pos"][:, :-1] + outs_motion["track_scores"] = outs_motion["track_scores"][:, :-1] + + return traj_results, outs_motion + + +def occ_head_forward(models, bev_feat, outs_dict, no_query=False): + if no_query: + b, t, h, w = 1, 5, 200, 200 + out_dict = dict() + out_dict["seg_out"] = np.zeros( + (b, t, 1, h, w), dtype=np.int64 + ) # [1, 5, 1, 200, 200] + out_dict["ins_seg_out"] = np.zeros( + (b, t, h, w), dtype=np.int64 + ) # [1, 5, 200, 200] + return out_dict + + traj_query = outs_dict["traj_query"] + track_query = outs_dict["track_query"] + track_query_pos = outs_dict["track_query_pos"] + + net = models["occ_head"] + if not args.onnx: + output = net.predict([bev_feat, traj_query, track_query, track_query_pos]) + else: + output = net.run( + None, + { + "bev_feat": bev_feat, + "traj_query": traj_query, + "track_query": track_query, + "track_query_pos": track_query_pos, + }, + ) + _, pred_ins_logits = output + + out_dict = dict() + out_dict["pred_ins_logits"] = pred_ins_logits + + n_future = 4 + pred_ins_logits = pred_ins_logits[:, :, : 1 + n_future] # [b, q, t, h, w] + pred_ins_sigmoid = sigmoid(pred_ins_logits) # [b, q, t, h, w] + + # with_track_score + track_scores = outs_dict["track_scores"] # [b, q] + track_scores = track_scores[:, :, None, None, None] + pred_ins_sigmoid = pred_ins_sigmoid * track_scores # [b, q, t, h, w] + + out_dict["pred_ins_sigmoid"] = pred_ins_sigmoid + pred_seg_scores = np.max(pred_ins_sigmoid, axis=1) + test_seg_thresh = 0.1 + seg_out = (pred_seg_scores > test_seg_thresh).astype(np.int64) + seg_out = np.expand_dims(seg_out, axis=2) # [b, t, 1, h, w] + out_dict["seg_out"] = seg_out + + def update_instance_ids(instance_seg, old_ids, new_ids): + indices = np.arange(int(old_ids.max()) + 1) + for old_id, new_id in 
zip(old_ids, new_ids): + indices[int(old_id)] = int(new_id) + + return indices[instance_seg].astype(np.int64) + + def make_instance_seg_consecutive(instance_seg): + # Make the indices of instance_seg consecutive + unique_ids = np.unique(instance_seg) # include background + new_ids = np.arange(len(unique_ids)) + instance_seg = update_instance_ids(instance_seg, unique_ids, new_ids) + return instance_seg + + def predict_instance_segmentation_and_trajectories( + foreground_masks, + ins_sigmoid, + vehicles_id=1, + ): + if foreground_masks.ndim == 5 and foreground_masks.shape[2] == 1: + foreground_masks = np.squeeze(foreground_masks, axis=2) # [b, t, h, w] + foreground_masks = ( + foreground_masks == vehicles_id + ) # [b, t, h, w] Only these places have foreground id + + argmax_ins = np.argmax( + ins_sigmoid, axis=1 + ) # long, [b, t, h, w], ins_id starts from 0 + argmax_ins = argmax_ins + 1 # [b, t, h, w], ins_id starts from 1 + instance_seg = (argmax_ins * foreground_masks.astype(np.float32)).astype( + np.int64 + ) # bg is 0, fg starts with 1 + + # Make the indices of instance_seg consecutive + instance_seg = make_instance_seg_consecutive(instance_seg).astype(np.int64) + + return instance_seg + + # ins_pred + pred_consistent_instance_seg = predict_instance_segmentation_and_trajectories( + seg_out, pred_ins_sigmoid + ) # bg is 0, fg starts with 1, consecutive + out_dict["ins_seg_out"] = pred_consistent_instance_seg # [1, 5, 200, 200] + + return out_dict + + +def planning_head_forward(models, bev_embed, outs_motion, outs_occflow, command): + sdc_traj_query = outs_motion["sdc_traj_query"] + sdc_track_query = outs_motion["sdc_track_query"] + bev_pos = outs_motion["bev_pos"] + occ_mask = outs_occflow["seg_out"] + command = np.array([command], dtype=np.int64) + + net = models["planning_head"] + if not args.onnx: + output = net.predict( + [bev_embed, bev_pos, sdc_traj_query, sdc_track_query, command] + ) + else: + output = net.run( + None, + { + "bev_embed": bev_embed, + 
"bev_pos": bev_pos, + "sdc_traj_query": sdc_traj_query, + "sdc_track_query": sdc_track_query, + "command": command, + }, + ) + sdc_traj, sdc_traj_all = output + + def collision_optimization(sdc_traj_all, occ_mask): + """ + Optimize SDC trajectory with occupancy instance mask. + """ + pos_xy_t = [] + valid_occupancy_num = 0 + + if occ_mask.shape[2] == 1: + occ_mask = np.squeeze(occ_mask, axis=2) + occ_horizon = occ_mask.shape[1] + assert occ_horizon == 5 + + bev_h = bev_w = 200 + occ_filter_range = 5.0 + planning_steps = 6 + sigma = 1.0 + alpha_collision = 5.0 + for t in range(planning_steps): + cur_t = min(t + 1, occ_horizon - 1) + pos_xy = np.argwhere(occ_mask[0][cur_t]) + pos_xy = pos_xy[:, [1, 0]] + pos_xy[:, 0] = (pos_xy[:, 0] - bev_h // 2) * 0.5 + 0.25 + pos_xy[:, 1] = (pos_xy[:, 1] - bev_w // 2) * 0.5 + 0.25 + + # filter the occupancy in range + keep_index = ( + np.sum((sdc_traj_all[0, t, :2][None, :] - pos_xy[:, :2]) ** 2, axis=-1) + < occ_filter_range**2 + ) + pos_xy_t.append(pos_xy[keep_index]) + valid_occupancy_num += np.sum(keep_index > 0) + if valid_occupancy_num == 0: + return sdc_traj_all + + col_optimizer = CollisionNonlinearOptimizer( + planning_steps, 0.5, sigma, alpha_collision, pos_xy_t + ) + col_optimizer.set_reference_trajectory(sdc_traj_all[0]) + sol = col_optimizer.solve() + sdc_traj_optim = np.stack( + [sol.value(col_optimizer.position_x), sol.value(col_optimizer.position_y)], + axis=-1, + ) + return sdc_traj_optim[None].astype(sdc_traj_all.dtype) + + sdc_traj_all = collision_optimization(sdc_traj_all, occ_mask) + + result_planning = { + "sdc_traj": sdc_traj_all, + "sdc_traj_all": sdc_traj_all, + } + return result_planning + + +# Cache for reference points weights and bias (loaded once) +_reference_points_weight = None +_reference_points_bias = None + + +def reference_points(query): + global _reference_points_weight, _reference_points_bias + + if _reference_points_weight is None: + _reference_points_weight = 
np.load(REFERENCE_POINTS_WEIGHT_FILE) + _reference_points_bias = np.load(REFERENCE_POINTS_BIAS_FILE) + + num_queries, dim = query.shape + ref_pts = ( + query[..., : dim // 2] @ _reference_points_weight.T + _reference_points_bias + ) + return ref_pts + + +def forward( + models, + track_base: RuntimeTrackerBase, + img, + img_metas, + track_instances, + prev_bev=None, + l2g_r1=None, + l2g_t1=None, + l2g_r2=None, + l2g_t2=None, + time_delta=None, +): + active_inst = track_instances[track_instances.obj_idxes >= 0] + other_inst = track_instances[track_instances.obj_idxes < 0] + + if l2g_r2 is not None and len(active_inst) > 0 and l2g_r1 is not None: + ref_pts = active_inst.ref_pts + velo = active_inst.pred_boxes[:, -2:] + ref_pts = velo_update( + ref_pts, velo, l2g_r1, l2g_t1, l2g_r2, l2g_t2, time_delta=time_delta + ) + ref_pts = np.squeeze(ref_pts, axis=0) + active_inst.ref_pts = reference_points(active_inst.query) + active_inst.ref_pts[..., :2] = ref_pts[..., :2] + + track_instances = Instances.cat([other_inst, active_inst]) + + output = predict( + models, img, img_metas, prev_bev=prev_bev, track_instances=track_instances + ) + bev_embed = output["bev_embed"] + output_classes = output["pred_logits"] + output_coords = output["pred_boxes"] + last_ref_pts = output["ref_pts"] + query_feats = output["query_embeddings"] + all_past_traj_preds = output["all_past_traj_preds"] + bev_pos = output["bev_pos"] + + item = { + "pred_logits": output_classes, + "pred_boxes": output_coords, + "ref_pts": last_ref_pts, + "bev_embed": bev_embed, + "query_embeddings": query_feats, + "all_past_traj_preds": all_past_traj_preds, + "bev_pos": bev_pos, + } + + # Apply sigmoid and get max values along last axis + logits = output_classes[-1, 0, :] + sigmoid_logits = sigmoid(logits) + track_scores = np.max(sigmoid_logits, axis=-1) + + # each track will be assigned a unique global id by the track base.
+ + track_instances.scores = track_scores + # track_instances.track_scores = track_scores # [300] + track_instances.pred_logits = output_classes[-1, 0] # [300, num_cls] + track_instances.pred_boxes = output_coords[-1, 0] # [300, box_dim] + track_instances.output_embedding = query_feats[-1][0] # [300, feat_dim] + track_instances.ref_pts = last_ref_pts[0] + # hard-coded: the 901st query (index 900) is assumed to be the sdc query + track_instances.obj_idxes[900] = -2 + + """ update track base """ + track_base.update(track_instances) + + filter_score_thresh = 0.35 + active_index = (track_instances.obj_idxes >= 0) & ( + track_instances.scores >= filter_score_thresh + ) # filter out sleeping objects + # select_active_track_query + result_dict = track_instances2results(track_instances[active_index], with_mask=True) + result_dict["track_query_embeddings"] = track_instances.output_embedding[ + active_index + ][result_dict["bbox_index"]][result_dict["mask"]] + result_dict["track_query_matched_idxes"] = track_instances.matched_gt_idxes[ + active_index + ][result_dict["bbox_index"]][result_dict["mask"]] + item.update(result_dict) + # select_sdc_track_query + sdc_instance = track_instances[track_instances.obj_idxes == -2] + result_dict = track_instances2results(sdc_instance, with_mask=False) + item.update( + dict( + sdc_boxes_3d=result_dict["boxes_3d"], + sdc_scores_3d=result_dict["scores_3d"], + sdc_track_scores=result_dict["track_scores"], + sdc_track_bbox_results=result_dict["track_bbox_results"], + sdc_embedding=sdc_instance.output_embedding[0], + ) + ) + + """ update with memory_bank """ + track_instances = memory_bank(models, track_instances) + + """ Update track instances using matcher """ + item["track_instances_fordet"] = track_instances + item["track_instances"] = track_instances = query_interact(models, track_instances) + + return item + + +def process_all_frames(models, data_loader, track_base, h5f): + """Process all frames from data_loader and stream results to HDF5.
+ + Args: + models: Dict of initialized model sessions + data_loader: Iterable yielding dataset items + track_base: Tracker base managing IDs and states + h5f: Open HDF5 file handle to write per-frame results + """ + + track_instances = None + timestamp = None + scene_token = None + prev_bev = None + prev_frame_info = { + "prev_bev": None, + "scene_token": None, + "prev_pos": 0, + "prev_angle": 0, + } + + for i, data in tqdm.tqdm( + enumerate(data_loader), total=len(data_loader), desc="Processing frames" + ): + data = data[0] + img = data["img"] + img_metas = data["img_metas"] + l2g_t = data["l2g_t"] + l2g_r_mat = data["l2g_r_mat"] + _timestamp = data["timestamp"] + + img = np.stack(img).transpose(0, 3, 1, 2) # N, C, H, W + img = np.expand_dims(img, axis=0) # B, N, C, H, W + + if img_metas["scene_token"] != prev_frame_info["scene_token"]: + # the first sample of each scene is truncated + prev_frame_info["prev_bev"] = None + # update idx + prev_frame_info["scene_token"] = img_metas["scene_token"] + + # Get the delta of ego position and angle between two timestamps. + tmp_pos = copy.deepcopy(img_metas["can_bus"][:3]) + tmp_angle = copy.deepcopy(img_metas["can_bus"][-1]) + # first frame + if prev_frame_info["scene_token"] is None: + img_metas["can_bus"][:3] = 0 + img_metas["can_bus"][-1] = 0 + # following frames + else: + img_metas["can_bus"][:3] -= prev_frame_info["prev_pos"] + img_metas["can_bus"][-1] -= prev_frame_info["prev_angle"] + prev_frame_info["prev_pos"] = tmp_pos + prev_frame_info["prev_angle"] = tmp_angle + + ### simple_test_track logic ### + + if track_instances is None or img_metas["scene_token"] != scene_token: + scene_token = img_metas["scene_token"] + prev_bev = None + track_instances = empty_tracks() + time_delta, l2g_r1, l2g_t1, l2g_r2, l2g_t2 = None, None, None, None, None + else: + time_delta = _timestamp - timestamp + l2g_r1 = l2g_r2 + l2g_t1 = l2g_t2 + l2g_r2 = l2g_r_mat[None, ...] + l2g_t2 = l2g_t[None, ...]
+ + frame_res = forward( + models, + track_base, + img, + img_metas, + track_instances, + prev_bev, + l2g_r1, + l2g_t1, + l2g_r2, + l2g_t2, + time_delta, + ) + + """ get time_delta and l2g r/t infos """ + """ update frame info for next frame""" + timestamp = _timestamp + l2g_r2 = l2g_r_mat[None, ...] + l2g_t2 = l2g_t[None, ...] + + bev_embed = prev_bev = frame_res["bev_embed"] + track_instances = frame_res["track_instances"] + track_instances_fordet = frame_res["track_instances_fordet"] + + result_dict = dict( + bev_embed=bev_embed, + bev_pos=frame_res["bev_pos"], + track_query_embeddings=frame_res["track_query_embeddings"], + track_bbox_results=frame_res["track_bbox_results"], + boxes_3d=frame_res["boxes_3d"], + scores_3d=frame_res["scores_3d"], + labels_3d=frame_res["labels_3d"], + track_scores=frame_res["track_scores"], + track_ids=frame_res["track_ids"], + sdc_boxes_3d=frame_res["sdc_boxes_3d"], + sdc_scores_3d=frame_res["sdc_scores_3d"], + sdc_track_scores=frame_res["sdc_track_scores"], + sdc_track_bbox_results=frame_res["sdc_track_bbox_results"], + sdc_embedding=frame_res["sdc_embedding"], + ) + + result_track = det_instances2results(track_instances_fordet, result_dict) + + ### End of simple_test_track ### + + # seg_head + result_seg = seg_head_forward(models, bev_embed) + # motion_head + result_motion, outs_motion = motion_head_forward( + models, bev_embed, outs_track=result_track, outs_seg=result_seg + ) + outs_motion["bev_pos"] = result_track["bev_pos"] + + result = dict() + result["token"] = img_metas["sample_idx"] + result["scene_token"] = img_metas["scene_token"] + + occ_no_query = outs_motion["track_query"].shape[1] == 0 + outs_occ = occ_head_forward( + models, + bev_embed, + outs_motion, + no_query=occ_no_query, + ) + result["occ"] = outs_occ + + command = data["command"] + planning_gt = dict(command=command) + result_planning = planning_head_forward( + models, bev_embed, outs_motion, outs_occ, command + ) + result["planning"] = dict( +
planning_gt=planning_gt, + result_planning=result_planning, + ) + result.update(result_track) + result.update( + result_motion[0] + ) # 'traj_0', 'traj_scores_0', 'traj_1', 'traj_scores_1', 'traj', 'traj_scores' + + ### End of forward_test ### + + result["planning_traj"] = result["planning"]["result_planning"]["sdc_traj"] + result["command"] = result["planning"]["planning_gt"]["command"] + + # Stream frame results to HDF5 file + save_frame_to_hdf5(h5f, result, i) + + # Clear result from memory + del result + + +def recognize_from_image(models): + defaults_on_none = args.ann_file is None and args.version is None + + ann_file = args.ann_file + data_root = args.data_root + version = args.version or "v1.0-trainval" + test_scenes = flatten_args(args.scenes) or ( + DEFAULT_SCENES if defaults_on_none else None + ) + + if ann_file: + logger.info(f"Using annotation file: {ann_file}") + else: + logger.info(f"Data root: {data_root}") + logger.info(f"Dataset version: {version}") + logger.info(f"Processing test scenes: {test_scenes or 'ALL'}") + + dataset = NuScenesDataset( + data_root=data_root, + version=version, + ann_file=ann_file, + test_scenes=test_scenes, + ) + + if len(dataset) == 0 and not defaults_on_none: + logger.info( + "No data found in the dataset. Please check if the scene name is correct." 
+ ) + sys.exit(1) + + # Simple DataLoader replacement + class SimpleDataLoader: + def __init__(self, dataset): + self.dataset = dataset + + def __iter__(self): + for i in range(len(self.dataset)): + yield [self.dataset[i]] + + def __len__(self): + return len(self.dataset) + + data_loader = SimpleDataLoader(dataset) + + # Determine results file path + if args.visualize: + # Visualize mode: skip inference, use provided HDF5 file + results_file = args.visualize + else: + # Inference mode: run process_all_frames and save to results_file + results_file = args.results_file + with h5py.File(results_file, "w") as h5f: + logger.info(f"All frames streamed to: {results_file}") + + score_thresh = 0.4 + filter_score_thresh = 0.35 + miss_tolerance = 5 + track_base = RuntimeTrackerBase( + score_thresh=score_thresh, + filter_score_thresh=filter_score_thresh, + miss_tolerance=miss_tolerance, + ) + process_all_frames(models, data_loader, track_base, h5f) + + # Visualize results + visualize_results( + results_file=results_file, nusc=dataset.nusc, output_dir="vis_output" + ) + + logger.info("Script finished successfully.") + + +def main(): + # Skip model initialization if only visualizing + if args.visualize: + visualize_results(args.visualize) + return + + # model files check and download + check_and_download_models(WEIGHT_BEV_ENC_PATH, MODEL_BEV_ENC_PATH, REMOTE_PATH) + check_and_download_models( + WEIGHT_TRACK_HEAD_PATH, MODEL_TRACK_HEAD_PATH, REMOTE_PATH + ) + check_and_download_models( + WEIGHT_MEMORY_BANK_PATH, MODEL_MEMORY_BANK_PATH, REMOTE_PATH + ) + check_and_download_models( + WEIGHT_MEMORY_BANK_UPD_PATH, MODEL_MEMORY_BANK_UPD_PATH, REMOTE_PATH + ) + check_and_download_models( + WEIGHT_QUERY_INTERACTION_PATH, MODEL_QUERY_INTERACTION_PATH, REMOTE_PATH + ) + check_and_download_models(WEIGHT_SEG_HEAD_PATH, MODEL_SEG_HEAD_PATH, REMOTE_PATH) + check_and_download_models( + WEIGHT_MOTION_HEAD_PATH, MODEL_MOTION_HEAD_PATH, REMOTE_PATH + ) + 
check_and_download_models(WEIGHT_OCC_HEAD_PATH, MODEL_OCC_HEAD_PATH, REMOTE_PATH) + check_and_download_models( + WEIGHT_PLANNING_HEAD_PATH, MODEL_PLANNING_HEAD_PATH, REMOTE_PATH + ) + check_and_download_file(QUERY_EMBEDDING_FILE, REMOTE_PATH) + check_and_download_file(REFERENCE_POINTS_WEIGHT_FILE, REMOTE_PATH) + check_and_download_file(REFERENCE_POINTS_BIAS_FILE, REMOTE_PATH) + + env_id = args.env_id + + # initialize + if not args.onnx: + bev_encoder = ailia.Net(MODEL_BEV_ENC_PATH, WEIGHT_BEV_ENC_PATH, env_id=env_id) + track_head = ailia.Net( + MODEL_TRACK_HEAD_PATH, WEIGHT_TRACK_HEAD_PATH, env_id=env_id + ) + memory_bank = ailia.Net( + MODEL_MEMORY_BANK_PATH, WEIGHT_MEMORY_BANK_PATH, env_id=env_id + ) + memory_bank_update = ailia.Net( + MODEL_MEMORY_BANK_UPD_PATH, WEIGHT_MEMORY_BANK_UPD_PATH, env_id=env_id + ) + query_interact = ailia.Net( + MODEL_QUERY_INTERACTION_PATH, WEIGHT_QUERY_INTERACTION_PATH, env_id=env_id + ) + seg_head = ailia.Net(MODEL_SEG_HEAD_PATH, WEIGHT_SEG_HEAD_PATH, env_id=env_id) + motion_head = ailia.Net( + MODEL_MOTION_HEAD_PATH, WEIGHT_MOTION_HEAD_PATH, env_id=env_id + ) + occ_head = ailia.Net(MODEL_OCC_HEAD_PATH, WEIGHT_OCC_HEAD_PATH, env_id=env_id) + planning_head = ailia.Net( + MODEL_PLANNING_HEAD_PATH, WEIGHT_PLANNING_HEAD_PATH, env_id=env_id + ) + else: + import onnxruntime + + providers = ["CUDAExecutionProvider", "CPUExecutionProvider"] + bev_encoder = onnxruntime.InferenceSession( + WEIGHT_BEV_ENC_PATH, providers=providers + ) + track_head = onnxruntime.InferenceSession( + WEIGHT_TRACK_HEAD_PATH, providers=providers + ) + memory_bank = onnxruntime.InferenceSession( + WEIGHT_MEMORY_BANK_PATH, providers=providers + ) + memory_bank_update = onnxruntime.InferenceSession( + WEIGHT_MEMORY_BANK_UPD_PATH, providers=providers + ) + query_interact = onnxruntime.InferenceSession( + WEIGHT_QUERY_INTERACTION_PATH, providers=providers + ) + seg_head = onnxruntime.InferenceSession( + WEIGHT_SEG_HEAD_PATH, providers=providers + ) + motion_head = 
onnxruntime.InferenceSession( + WEIGHT_MOTION_HEAD_PATH, providers=providers + ) + occ_head = onnxruntime.InferenceSession( + WEIGHT_OCC_HEAD_PATH, providers=providers + ) + planning_head = onnxruntime.InferenceSession( + WEIGHT_PLANNING_HEAD_PATH, providers=providers + ) + + models = dict( + bev_encoder=bev_encoder, + track_head=track_head, + memory_bank=memory_bank, + memory_bank_update=memory_bank_update, + query_interact=query_interact, + seg_head=seg_head, + motion_head=motion_head, + occ_head=occ_head, + planning_head=planning_head, + ) + recognize_from_image(models) + + +if __name__ == "__main__": + main() diff --git a/autonomous_driving/uniad/vis_output/.gitignore b/autonomous_driving/uniad/vis_output/.gitignore new file mode 100644 index 000000000..4bc404475 --- /dev/null +++ b/autonomous_driving/uniad/vis_output/.gitignore @@ -0,0 +1 @@ +*.jpg \ No newline at end of file diff --git a/scripts/download_all_models.sh b/scripts/download_all_models.sh index 8cb2c88b6..28189a34e 100755 --- a/scripts/download_all_models.sh +++ b/scripts/download_all_models.sh @@ -47,6 +47,7 @@ cd ../../audio_processing/pytorch_wavenet/; python3 pytorch_wavenet.py ${OPTION} cd ../../audio_processing/audiosep/; python3 audiosep.py ${OPTION} cd ../../audio_processing/cosyvoice2/; python3 cosyvoice2.py ${OPTION} cd ../../audio_processing/sensevoice/; python3 sensevoice.py ${OPTION} +#cd ../../autonomous_driving/uniad/; python3 uniad.py ${OPTION} cd ../../background_ramoval/deep-image-matting; python3 deep-image-matting.py ${OPTION} cd ../../background_ramoval/u2net; python3 u2net.py ${OPTION} cd ../../background_ramoval/u2net; python3 u2net.py -a small ${OPTION}
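Review note: the top-300 box selection in `decode_single` is a standard two-step top-k: `np.argpartition` finds the k largest scores in O(n), then a second `argsort` orders only those k entries. A self-contained sketch of the pattern (the helper name `topk_indices` is illustrative, not part of the patch):

```python
import numpy as np

# Two-step top-k as used in decode_single's bbox_index computation:
# argpartition returns the k largest scores in arbitrary order, and a
# follow-up argsort over just those k entries sorts them descending.
def topk_indices(scores: np.ndarray, k: int) -> np.ndarray:
    k = min(k, scores.shape[0])
    idx = np.argpartition(scores, -k)[-k:]     # unordered top-k indices
    return idx[np.argsort(scores[idx])[::-1]]  # sorted by descending score

scores = np.array([0.1, 0.9, 0.4, 0.7])
print(topk_indices(scores, 2))  # [1 3]
```

The partition step keeps the selection linear in the number of queries; only the 300 retained tracks pay the sort cost.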
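Review note: `denormalize_bbox` assumes the 10-dim normalized layout `(cx, cy, w, l, cz, h, rot_sine, rot_cosine, vx, vy)` with log-encoded sizes and yaw split into sine/cosine. A minimal round-trip check of that encoding, using a local mirror of the patch's decoding logic:

```python
import numpy as np

# Local mirror of the patch's denormalize_bbox, for a round-trip sanity check.
def denormalize_bbox(nb):
    rot = np.arctan2(nb[..., 6:7], nb[..., 7:8])          # yaw from (sin, cos)
    cx, cy, cz = nb[..., 0:1], nb[..., 1:2], nb[..., 4:5]  # BEV center + height
    w = np.exp(nb[..., 2:3])                               # sizes are log-encoded
    l = np.exp(nb[..., 3:4])
    h = np.exp(nb[..., 5:6])
    vx, vy = nb[..., 8:9], nb[..., 9:10]
    return np.concatenate([cx, cy, cz, w, l, h, rot, vx, vy], axis=-1)

# Encode a box with yaw 0.3 rad, size (w=1.8, l=4.5, h=1.6), velocity (1, 0).
encoded = np.array([[10.0, -2.0, np.log(1.8), np.log(4.5), 0.5, np.log(1.6),
                     np.sin(0.3), np.cos(0.3), 1.0, 0.0]])
decoded = denormalize_bbox(encoded)
# decoded rows are (cx, cy, cz, w, l, h, yaw, vx, vy), box_dim=9
```

The 9-dim decoded layout is what `LiDARInstance3DBoxes(bboxes, box_dim=9)` receives downstream.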
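Review note: the instance-id relabeling inside `occ_head_forward` (`make_instance_seg_consecutive` / `update_instance_ids`) is a lookup-table remap that compacts sparse instance ids into consecutive ids starting at 0. A standalone sketch (the name `make_consecutive` is illustrative, not a function in the patch):

```python
import numpy as np

# Compact sparse instance ids to consecutive 0..K-1 via a lookup table;
# background (0) is always present in unique_ids and stays mapped to 0.
def make_consecutive(instance_seg: np.ndarray) -> np.ndarray:
    unique_ids = np.unique(instance_seg)          # sorted, includes background
    lut = np.arange(int(unique_ids.max()) + 1)    # identity lookup table
    lut[unique_ids] = np.arange(len(unique_ids))  # remap kept ids to 0..K-1
    return lut[instance_seg].astype(np.int64)

seg = np.array([[0, 3, 3], [7, 0, 3]])
print(make_consecutive(seg))  # ids {0, 3, 7} -> {0, 1, 2}
```

Indexing the table with the whole segmentation map (`lut[instance_seg]`) vectorizes the remap over every pixel at once.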
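Review note: `collision_optimization` converts occupied 200x200 BEV grid cells to ego-frame meters with `(idx - bev_h // 2) * 0.5 + 0.25`, i.e. a 0.5 m cell size and a +0.25 m half-cell offset to the cell center. The mapping can be checked in isolation:

```python
import numpy as np

# Grid-to-metric conversion assumed by collision_optimization:
# 200x200 BEV grid centered on the ego vehicle, 0.5 m per cell,
# +0.25 m moving from the cell corner to the cell center.
bev_h = bev_w = 200
cells = np.array([[100, 100], [0, 0], [199, 199]], dtype=np.float64)
meters = (cells - bev_h // 2) * 0.5 + 0.25
print(meters)  # center cell ~0.25 m; corners at roughly +/-49.75 m
```

This places the ego vehicle near grid cell (100, 100) and spans roughly a 100 m x 100 m area, consistent with the `[-61.2, 61.2]` post-center range used for box filtering.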