
An error occurred when the SMOKE model performed an equality operation #15710

@hjlhhh-eng

We appreciate you going through the Apollo documentation and searching previous issues before creating a new one. If neither source helped with your issue, please report it using the following form. Please note that missing information can delay the response.

System information

  • Linux Ubuntu 20.04.6
  • Apollo 8.0.0
  • CARLA 0.9.14

Steps to reproduce the issue:

After bridging CARLA and Apollo, I followed the official steps to launch the Transform module and image_decompress. I then started the camera obstacle detection module with the command:
mainboard -d modules/perception/production/dag/dag_streaming_perception_camera.dag
The process aborted during the model's forward pass with the following output:
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0701 11:12:58.536051 9401 module_argument.cc:81] []command: mainboard -d modules/perception/production/dag/dag_streaming_perception_camera.dag
I0701 11:12:58.536531 9401 global_data.cc:153] []host ip: 192.168.0.239
I0701 11:12:58.539268 9401 module_argument.cc:57] []binary_name_ is mainboard, process_group_ is mainboard_default, has 1 dag conf
I0701 11:12:58.539281 9401 module_argument.cc:60] []dag_conf: modules/perception/production/dag/dag_streaming_perception_camera.dag
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: invalid argument
Exception raised from launch_vectorized_kernel at /apollo/pytorch.git/aten/src/ATen/native/cuda/CUDALoops.cuh:119 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x6b (0x7f834116e24b in /usr/local/libtorch_gpu/lib/libc10.so)
frame #1: void at::native::gpu_kernel_impl<at::native::CompareEqFunctor >(at::TensorIterator&, at::native::CompareEqFunctor const&) + 0xab2 (0x7f8300a505a2 in /usr/local/libtorch_gpu/lib/libtorch_cuda.so)
frame #2: void at::native::gpu_kernel<at::native::CompareEqFunctor >(at::TensorIterator&, at::native::CompareEqFunctor const&) + 0x153 (0x7f8300a520d3 in /usr/local/libtorch_gpu/lib/libtorch_cuda.so)
frame #3: void at::native::gpu_kernel_with_scalars<at::native::CompareEqFunctor >(at::TensorIterator&, at::native::CompareEqFunctor const&) + 0x74 (0x7f8300a522a4 in /usr/local/libtorch_gpu/lib/libtorch_cuda.so)
frame #4: at::native::eq_kernel_cuda(at::TensorIterator&) + 0x11c (0x7f8300a2b8fc in /usr/local/libtorch_gpu/lib/libtorch_cuda.so)
frame #5: + 0x9e32fc (0x7f833a4a92fc in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #6: + 0x2c0b450 (0x7f83020e6450 in /usr/local/libtorch_gpu/lib/libtorch_cuda.so)
frame #7: at::Tensor& c10::Dispatcher::callWithDispatchKey<at::Tensor&, at::Tensor&, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor& (at::Tensor&, at::Tensor const&, at::Tensor const&)> const&, c10::DispatchKey, at::Tensor&, at::Tensor const&, at::Tensor const&) const + 0xaf (0x7f833acb640f in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #8: + 0x9de8c8 (0x7f833a4a48c8 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #9: at::native::eq(at::Tensor const&, at::Tensor const&) + 0x24 (0x7f833a4969f4 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #10: at::native::cuda_equal(at::Tensor const&, at::Tensor const&) + 0x13f (0x7f83012b4faf in /usr/local/libtorch_gpu/lib/libtorch_cuda.so)
frame #11: + 0x2c0fbd5 (0x7f83020eabd5 in /usr/local/libtorch_gpu/lib/libtorch_cuda.so)
frame #12: bool c10::Dispatcher::callWithDispatchKey<bool, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<bool (at::Tensor const&, at::Tensor const&)> const&, c10::DispatchKey, at::Tensor const&, at::Tensor const&) const + 0x99 (0x7f833acd45d9 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #13: + 0x267e1ce (0x7f833c1441ce in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #14: bool c10::Dispatcher::callWithDispatchKey<bool, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<bool (at::Tensor const&, at::Tensor const&)> const&, c10::DispatchKey, at::Tensor const&, at::Tensor const&) const + 0x99 (0x7f833acd45d9 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #15: + 0x2ba29ea (0x7f833c6689ea in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #16: + 0x2ba52c8 (0x7f833c66b2c8 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #17: torch::jit::EqualNode::operator()(torch::jit::Node const*, torch::jit::Node const*) const + 0x2c9 (0x7f833c66bed9 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #18: std::pair<std::__detail::_Node_iterator<torch::jit::Node*, true, true>, bool> std::_Hashtable<torch::jit::Node*, torch::jit::Node*, std::allocatortorch::jit::Node*, std::__detail::_Identity, torch::jit::EqualNode, torch::jit::HashNode, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_insert<torch::jit::Node* const&, std::__detail::_AllocNode<std::allocator<std::__detail::_Hash_node<torch::jit::Node*, true> > > >(torch::jit::Node* const&, std::__detail::_AllocNode<std::allocator<std::__detail::_Hash_node<torch::jit::Node*, true> > > const&, std::integral_constant<bool, true>) + 0x75 (0x7f833c6b0875 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #19: + 0x2be8da3 (0x7f833c6aeda3 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #20: torch::jit::ConstantPooling(std::shared_ptrtorch::jit::Graph const&) + 0xd5 (0x7f833c6b0665 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #21: + 0x2a79c4b (0x7f833c53fc4b in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #22: torch::jit::inlineCallTo(torch::jit::Node*, torch::jit::Function*, bool) + 0x102 (0x7f833c65cd12 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #23: + 0x2c5eff3 (0x7f833c724ff3 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #24: torch::jit::Inline(torch::jit::Graph&) + 0x415 (0x7f833c727485 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #25: torch::jit::preoptimizeGraph(std::shared_ptrtorch::jit::Graph&) + 0xc (0x7f833c53da1c in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #26: + 0x2a79c4b (0x7f833c53fc4b in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #27: torch::jit::inlineCallTo(torch::jit::Node*, torch::jit::Function*, bool) + 0x102 (0x7f833c65cd12 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #28: + 0x2c5eff3 (0x7f833c724ff3 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #29: torch::jit::Inline(torch::jit::Graph&) + 0x415 (0x7f833c727485 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #30: torch::jit::preoptimizeGraph(std::shared_ptrtorch::jit::Graph&) + 0xc (0x7f833c53da1c in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #31: + 0x2a79c4b (0x7f833c53fc4b in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #32: + 0x2a79639 (0x7f833c53f639 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #33: torch::jit::GraphFunction::run(std::vector<c10::IValue, std::allocatorc10::IValue >&) + 0xa (0x7f833c53d69a in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #34: torch::jit::GraphFunction::operator()(std::vector<c10::IValue, std::allocatorc10::IValue >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, c10::IValue> > > const&) + 0x3a (0x7f833c53da7a in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #35: torch::jit::Method::operator()(std::vector<c10::IValue, std::allocatorc10::IValue >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, c10::IValue> > > const&) + 0x165 (0x7f833c54d6f5 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #36: torch::jit::Module::forward(std::vector<c10::IValue, std::allocatorc10::IValue >) + 0xff (0x7f842692e37f in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #37: apollo::perception::inference::ObstacleDetector::Infer() + 0x733 (0x7f8426b43413 in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #38: apollo::perception::camera::SmokeObstacleDetector::Process(apollo::perception::pipeline::DataFrame*) + 0xbe7 (0x7f84266f635d in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #39: apollo::perception::pipeline::Pipeline::InnerProcess(apollo::perception::pipeline::DataFrame*) + 0xf3 (0x7f84266d0b25 in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #40: apollo::perception::camera::ObstacleDetectionCamera::Process(apollo::perception::pipeline::DataFrame*) + 0x147 (0x7f84266844af in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #41: apollo::perception::onboard::CameraObstacleDetectionComponent::InternalProc(std::shared_ptr<apollo::drivers::Image const> const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, apollo::common::ErrorCode*, apollo::perception::onboard::SensorFrameMessage*, apollo::perception::PerceptionObstacles*) + 0x7c5 (0x7f84263718c5 in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #42: apollo::perception::onboard::CameraObstacleDetectionComponent::OnReceiveImage(std::shared_ptrapollo::drivers::Image const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0xa7e (0x7f842636d564 in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #43: void std::__invoke_impl<void, void (apollo::perception::onboard::CameraObstacleDetectionComponent::&)(std::shared_ptrapollo::drivers::Image const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&), apollo::perception::onboard::CameraObstacleDetectionComponent&, std::shared_ptrapollo::drivers::Image const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >&>(std::__invoke_memfun_deref, void (apollo::perception::onboard::CameraObstacleDetectionComponent::&)(std::shared_ptrapollo::drivers::Image const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&), apollo::perception::onboard::CameraObstacleDetectionComponent&, std::shared_ptrapollo::drivers::Image const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >&) + 0x95 (0x7f84263e0d24 in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #44: std::__invoke_result<void (apollo::perception::onboard::CameraObstacleDetectionComponent::&)(std::shared_ptrapollo::drivers::Image const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&), apollo::perception::onboard::CameraObstacleDetectionComponent&, std::shared_ptrapollo::drivers::Image const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >&>::type std::__invoke<void (apollo::perception::onboard::CameraObstacleDetectionComponent::&)(std::shared_ptrapollo::drivers::Image const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&), apollo::perception::onboard::CameraObstacleDetectionComponent&, std::shared_ptrapollo::drivers::Image const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >&>(void (apollo::perception::onboard::CameraObstacleDetectionComponent::&)(std::shared_ptrapollo::drivers::Image const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&), apollo::perception::onboard::CameraObstacleDetectionComponent&, std::shared_ptrapollo::drivers::Image const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >&) + 0x6f (0x7f84263d60b3 in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #45: void std::_Bind<void (apollo::perception::onboard::CameraObstacleDetectionComponent::(apollo::perception::onboard::CameraObstacleDetectionComponent, std::_Placeholder<1>, std::__cxx11::basic_string<char, std::char_traits, std::allocator >))(std::shared_ptrapollo::drivers::Image const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)>::__call<void, std::shared_ptrapollo::drivers::Image const&, 0ul, 1ul, 2ul>(std::tuple<std::shared_ptrapollo::drivers::Image const&>&&, std::_Index_tuple<0ul, 1ul, 2ul>) + 0xa0 (0x7f84263c889a in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #46: void std::_Bind<void (apollo::perception::onboard::CameraObstacleDetectionComponent::(apollo::perception::onboard::CameraObstacleDetectionComponent, std::_Placeholder<1>, std::__cxx11::basic_string<char, std::char_traits, std::allocator >))(std::shared_ptrapollo::drivers::Image const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)>::operator()<std::shared_ptrapollo::drivers::Image const&, void>(std::shared_ptrapollo::drivers::Image const&) + 0x47 (0x7f84263b718f in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #47: std::_Function_handler<void (std::shared_ptrapollo::drivers::Image const&), std::_Bind<void (apollo::perception::onboard::CameraObstacleDetectionComponent::(apollo::perception::onboard::CameraObstacleDetectionComponent, std::_Placeholder<1>, std::__cxx11::basic_string<char, std::char_traits, std::allocator >))(std::shared_ptrapollo::drivers::Image const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)> >::_M_invoke(std::_Any_data const&, std::shared_ptrapollo::drivers::Image const&) + 0x37 (0x7f84263a5bf8 in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #48: std::function<void (std::shared_ptrapollo::drivers::Image const&)>::operator()(std::shared_ptrapollo::drivers::Image const&) const + 0x49 (0x7f84263d64a1 in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #49: apollo::cyber::Readerapollo::drivers::Image::Init()::{lambda(std::shared_ptrapollo::drivers::Image const&)#1}::operator()(std::shared_ptrapollo::drivers::Image const&) const + 0x51 (0x7f84263c8e5f in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #50: std::_Function_handler<void (std::shared_ptrapollo::drivers::Image const&), apollo::cyber::Readerapollo::drivers::Image::Init()::{lambda(std::shared_ptrapollo::drivers::Image const&)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptrapollo::drivers::Image const&) + 0x37 (0x7f84263eb051 in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #51: std::function<void (std::shared_ptrapollo::drivers::Image const&)>::operator()(std::shared_ptrapollo::drivers::Image const&) const + 0x49 (0x7f84263d64a1 in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #52: apollo::cyber::croutine::CreateRoutineFactory<apollo::drivers::Image, std::function<void (std::shared_ptrapollo::drivers::Image const&)> >(std::function<void (std::shared_ptrapollo::drivers::Image const&)>&&, std::shared_ptr<apollo::cyber::data::DataVisitor<apollo::drivers::Image, apollo::cyber::NullType, apollo::cyber::NullType, apollo::cyber::NullType> > const&)::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const + 0x74 (0x7f84263d6726 in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #53: std::_Function_handler<void (), apollo::cyber::croutine::CreateRoutineFactory<apollo::drivers::Image, std::function<void (std::shared_ptrapollo::drivers::Image const&)> >(std::function<void (std::shared_ptrapollo::drivers::Image const&)>&&, std::shared_ptr<apollo::cyber::data::DataVisitor<apollo::drivers::Image, apollo::cyber::NullType, apollo::cyber::NullType, apollo::cyber::NullType> > const&)::{lambda()#1}::operator()() const::{lambda()#1}>::_M_invoke(std::_Any_data const&) + 0x20 (0x7f84264058e2 in /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so)
frame #54: std::function<void ()>::operator()() const + 0x32 (0x55e777106666 in mainboard)
frame #55: apollo::cyber::croutine::CRoutine::Run() + 0x1c (0x7f847ce0c1ec in /apollo/.cache/bazel/540135163923dd7d5820f3ee4b306b32/execroot/apollo/bazel-out/k8-dbg/bin/cyber/mainboard/../../_solib_local/_U_S_Scyber_Ccyber_Ucore___Ucyber/libcyber_core.so)
frame #56: + 0xc2f935 (0x7f847ce0b935 in /apollo/.cache/bazel/540135163923dd7d5820f3ee4b306b32/execroot/apollo/bazel-out/k8-dbg/bin/cyber/mainboard/../../_solib_local/_U_S_Scyber_Ccyber_Ucore___Ucyber/libcyber_core.so)

Aborted (core dumped)
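
The trace above is long, so as a reading aid here is a minimal Python sketch that keeps only the frames carrying a symbol name and drops the anonymous `+ 0x...` ones. The helper `named_frames` and the regular expression are illustrative assumptions based on the frame layout seen in this log; they are not part of Apollo or PyTorch.

```python
import re

# Frame layout assumed from the log above:
#   frame #N: <symbol> + 0x<offset> (0x<address> in <library path>)
# Anonymous frames have an empty <symbol>.
FRAME_RE = re.compile(
    r"^frame #(\d+): (.*?)\s*\+ 0x[0-9a-f]+ \(0x[0-9a-f]+ in (.+)\)$"
)


def named_frames(trace):
    """Return (index, symbol, library file name) for frames that have a symbol."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # not a frame line (log header, blank line, ...)
        index, symbol, library = match.groups()
        if not symbol:
            continue  # anonymous frame such as "frame #5: + 0x9e32fc (...)"
        frames.append((int(index), symbol, library.rsplit("/", 1)[-1]))
    return frames


# Three lines copied from the trace above.
sample = """\
frame #4: at::native::eq_kernel_cuda(at::TensorIterator&) + 0x11c (0x7f8300a2b8fc in /usr/local/libtorch_gpu/lib/libtorch_cuda.so)
frame #5: + 0x9e32fc (0x7f833a4a92fc in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
frame #20: torch::jit::ConstantPooling(std::shared_ptrtorch::jit::Graph const&) + 0xd5 (0x7f833c6b0665 in /usr/local/libtorch_gpu/lib/libtorch_cpu.so)
"""

for index, symbol, library in named_frames(sample):
    print(f"#{index:<3} {library:<20} {symbol}")
```

Read this way, the named frames in the full trace run from `SmokeObstacleDetector::Process` and `ObstacleDetector::Infer` through `torch::jit::Module::forward` and `torch::jit::ConstantPooling` down to `at::native::cuda_equal` and `at::native::eq_kernel_cuda`, where the CUDA error is raised.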

Is the problem in my input data (the driving scenario information from CARLA, which is being sent successfully to Apollo's channel), or in the model itself?

In addition, I have disabled the obstacle detection information provided by the bridge, as I want to rely on Apollo's own perception module. I look forward to your response.
