Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Add note about long running callbacks #218

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sfalexrog
Copy link
Member

Having a long running callback will result in messages being handled with a significant delay, regardless of the queue_size option. This is especially noticeable for image processing. We should at the very least document that.

This P/R adds a note about using separate threads for image processing, as well as some minimal sample code.

Copy link
Member

@okalachev okalachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, I'll still prefer someone spending more time to find more correct solution for this issue, without "dirtying" the user code.

Second, I see no reason for using Queue here. Event is totally enough. This works perfectly:

import rospy
from sensor_msgs.msg import Image
import threading

rospy.init_node('computer_vision_sample')
e = threading.Event()

def image_callback(data):
    global latest_img
    latest_img = data
    e.set()

image_sub = rospy.Subscriber('main_camera/image_raw', Image, image_callback, queue_size=1, buff_size=999999)

while not rospy.is_shutdown():
    e.wait()
    e.clear()
    print(latest_img.header.stamp.to_sec(), rospy.get_rostime().to_sec())

Finally, I don't like keeping several ways of wrapping the user code for processing the image. This confuses user. We should keep only one, "correct" way (the "fixed" version for now).

@sfalexrog
Copy link
Member Author

Okay, I believe I see your point.

Using a Queue definitely sounds like overkill, although I do like it being a container - that way we don't need two entities for synchronization.

Regarding your example: it does look simplier, but I still have some issues with it:

  • latest_img is only created in the callback. This makes it much less discoverable (I don't think a lot of our users know that global keyword can create global variables, not simply reference them; I don't believe this is something that is taught in the introductory Python course);
  • there is no explicit thread creation; this makes the code seem simplier (and I've thought about doing the same, since the callbacks already arrive in a separate thread), but, well, "explicit is better than implicit".

As for your last point, well, the "best" way would be to do something similar to how OpenCV does that. Most people doing CV-related stuff have already learned some patterns, and this is where your example comes close: having a "main loop" with all image processing is nice (although that would probably lead to mixing image processing code and flight code, which I believe should be discouraged).

Actually, your last point brings up our docs' overall deficiency: we don't have a preferred way of writing complex flight code.

@copterspace
Copy link
Contributor

К слову: у нас больше проблем с производительностью обработки картинок вызывал не threading, а то, что на питоне не работает image_transport (http://wiki.ros.org/image_transport#Python_Usage), что вызывает задержку до 0.1 сек при прокачке кадра через рос топик.
Спасало прямое использование OpenCv VideoCapture...
Возможно, ещё какие способы решения этой проблемы известны?...

@okalachev
Copy link
Member

I investigated the issue a little bit more. The "problem" is actually in this code: https://github.com/ros/ros_comm/blob/noetic-devel/clients/rospy/src/rospy/impl/tcpros_base.py#L795.

The client code does correct thing cutting the latest messages from the buffer (https://github.com/ros/ros_comm/blob/noetic-devel/clients/rospy/src/rospy/msg.py#L225) to process.

But since reading the socket and calling callbacks perform in the same thread (at least for one TCPROSTransport, which apparently means one subscriber+publisher pair), callback function blocks the thread from reading the socket for the certain time. It's OK for small messages, but for large messages (images) it simply overflows the internal OS TCP socket buffer (https://stackoverflow.com/questions/3205953/how-does-a-linux-socket-buffer-overflow). Therefore, the latest packets get dropped by the kernel.

By the way, Linux TCP buffer size can be inspected in /proc/sys/net/ipv4/tcp_rmem, and also can be changed via setsockopt.

@okalachev
Copy link
Member

okalachev commented Feb 26, 2020

@copterspace, сомневаюсь, что дело именно в большом времени передачи изображения, а не в той проблеме, которую мы обсуждаем.

Когда я получаю изображения с камеры с помощью воркэраунда выше, то измеряемое время передачи на порядки меньше, чем 0.1с.

@okalachev
Copy link
Member

Actually, I would call that a bug, which can be simply avoided using separate threads for reading the socket and invoking the callbacks. It deserves issuing and fixing in rospy.

@sfalexrog
Copy link
Member Author

Opened an issue in ros_comm: ros/ros_comm#1901
Test setup is available at https://github.com/sfalexrog/staleness_test

okalachev added a commit that referenced this pull request Nov 2, 2022
Clover package Python library added.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants