Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updater raises std::bad_alloc sporadically on arm64 with O3 optimizations #119

Open
Schwo0ps opened this issue Jul 31, 2019 · 2 comments
Open

Comments

@Schwo0ps
Copy link

The issue is a complicated one, but here goes.

I first noticed this issue when I saw that diagnostics messages from arm64 machines sometimes arrive, but only infrequently (between 0% and 10% of the time), and eventually the following message comes from the Diagnostic Aggregator running on amd64 and it appears all messages are dropped.

[ERROR] [1564529964.093868659 /diag_agg] [/tmp/binarydeb/ros-kinetic-roscpp-1.12.14/src/libros/transport_publisher_link.cpp:TransportPublisherLink::onMessageLength:175]: a message of over a gigabyte was predicted in tcpros. that seems highly unlikely, so I'll assume protocol synchronization is lost.

At first, I though it was Endianness, but all machines are Little Endian. There are also C++ nodes which are able to communicate with each other properly on all machines. It also appears as though this only happens with the Diagnostic Updater, not any other topic.

After this, I started running a test: just running roscore and a single test node with the following code.

#include <ros/ros.h>
#include <diagnostic_updater/diagnostic_updater.h>
#include <diagnostic_updater/update_functions.h>

int main(int argc, char** argv)
{
  ros::init(argc, argv, "updater_node");
  double rate = 10.;
  diagnostic_updater::Updater updater;
  diagnostic_updater::FrequencyStatus frequency_status(
    diagnostic_updater::FrequencyStatusParam(&rate, &rate)
  );
  updater.setHardwareID("none");
  updater.add(frequency_status);

  ros::Rate r(rate);
  while (ros::ok())
  {
    frequency_status.tick();
    updater.update();
    ros::spinOnce();
    r.sleep();
  }
}

Everything works fine with amd64, but on arm64, the above issues happen. Additionally, the node just eventually crashes with std::bad_alloc. Here is the backtrace and relevant message that was published (as it looks like the error was with serialization)

#13 0x0000000000419084 in diagnostic_updater::Updater::publish (this=this@entry=0x7fffffecc0, status_vec=std::vector of length 1, capacity 1 = {...})
    at /opt/ros/kinetic/include/diagnostic_updater/diagnostic_updater.h:547
547	        publisher_.publish(msg);
(gdb) list
542	            node_name_.substr(1) + std::string(": ") + iter->name;
543	        }
544	        diagnostic_msgs::DiagnosticArray msg;
545	        msg.status = status_vec;
546	        msg.header.stamp = ros::Time::now(); // Add timestamp for ROS 0.10
547	        publisher_.publish(msg);
548	      }
549	
550	      /**
551	       * Publishes on /diagnostics and reads the diagnostic_period parameter.
(gdb) p msg
$7 = {header = {seq = 0, stamp = {<ros::TimeBase<ros::Time, ros::Duration>> = {sec = 1564612119, nsec = 323701618}, <No data fields>}, frame_id = ""}, status = std::vector of length 1, capacity 1 = {{level = 0 '\000', name = "updater_node_1564612117153179612: Frequency Status", message = "Desired frequency met", hardware_id = "none", 
      values = std::vector of length 7, capacity 7 = {{key = "Events in window", value = "22"}, {key = "Events since startup", value = "22"}, {key = "Duration of window (s)", value = "2.100290"}, {key = "Actual frequency (Hz)", value = "10.474742"}, {key = "Target frequency (Hz)", value = "10.000000"}, {
          key = "Minimum acceptable frequency (Hz)", value = "9.000000"}, {key = "Maximum acceptable frequency (Hz)", value = "11.000000"}}}}}
(gdb) bt
#0  0x0000007fb7a04528 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x0000007fb7a059e0 in __GI_abort () at abort.c:89
#2  0x0000007fb7bde254 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#3  0x0000007fb7bdbdc4 in ?? () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#4  0x0000007fb7bdbe10 in std::terminate() () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#5  0x0000007fb7bdc0d4 in __cxa_throw () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#6  0x0000007fb7bdc6d8 in operator new(unsigned long) () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#7  0x000000000040f840 in ros::serialization::serializeMessage<diagnostic_msgs::DiagnosticArray_<std::allocator<void> > > (message=...) at /opt/ros/kinetic/include/ros/serialization.h:795
#8  0x000000000040d60c in boost::_bi::list1<boost::reference_wrapper<diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const> >::operator()<ros::SerializedMessage, ros::SerializedMessage (*)(diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const&), boost::_bi::list0> (f=<optimized out>, a=<synthetic pointer>..., 
    this=<optimized out>) at /usr/include/boost/function/function_template.hpp:129
#9  boost::_bi::bind_t<ros::SerializedMessage, ros::SerializedMessage (*)(diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const&), boost::_bi::list1<boost::reference_wrapper<diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const> > >::operator() (this=<optimized out>) at /usr/include/boost/bind/bind.hpp:893
#10 boost::detail::function::function_obj_invoker0<boost::_bi::bind_t<ros::SerializedMessage, ros::SerializedMessage (*)(diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const&), boost::_bi::list1<boost::reference_wrapper<diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const> > >, ros::SerializedMessage>::invoke (
    function_obj_ptr=...) at /usr/include/boost/function/function_template.hpp:138
#11 0x0000007fb7efc8b0 in ros::TopicManager::publish(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<ros::SerializedMessage ()> const&, ros::SerializedMessage&) () from /opt/ros/kinetic/lib/libroscpp.so
#12 0x000000000040fe14 in ros::Publisher::publish<diagnostic_msgs::DiagnosticArray_<std::allocator<void> > > (this=this@entry=0x7fffffee88, message=...) at /usr/include/c++/8/new:169
#13 0x0000000000419084 in diagnostic_updater::Updater::publish (this=this@entry=0x7fffffecc0, status_vec=std::vector of length 1, capacity 1 = {...}) at /opt/ros/kinetic/include/diagnostic_updater/diagnostic_updater.h:547
#14 0x000000000040c5ac in diagnostic_updater::Updater::force_update (this=0x7fffffecc0) at /opt/ros/kinetic/include/diagnostic_updater/diagnostic_updater.h:440
#15 diagnostic_updater::Updater::update (this=0x7fffffecc0) at /opt/ros/kinetic/include/diagnostic_updater/diagnostic_updater.h:390
#16 main (argc=<optimized out>, argv=<optimized out>) at /ws/src/test/src/updater_node.cpp:20

It crashes after ~30 seconds but seems to do so more quickly if multiple of the nodes are running.

As specified in the title, this only happens with GCC optimization level 3 (compiling with -O3 or CMAKE_BUILD_TYPE=Release). I tried with -O2 and it appears to work fine.

I doubt this can be easily fixed and I'm not 100% sure if the issue is within this repo or ros_comm, but any ideas would be greatly appreciated. We really would like to use -O3 throughout our code for improved performance.

@Ecophagy
Copy link

Ecophagy commented Feb 1, 2022

Bit of a necro, but this looks like a gcc bug, the same as ros/ros_comm#2197 & ros/roscpp_core#130

If you need to use -O3 then I think your only option is to update your version of gcc.

@peci1
Copy link

peci1 commented Jun 6, 2023

PR ros/roscpp_core#136 should fix this. I'm looking for someone who could verify. Just please notice that Focal now has GCC 9.4 by default where I could not reproduce the issue. So the test would need to be done with GCC 9.3 installed explicitly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants