-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Freeswitch packet relay metrics #2292
base: master
Are you sure you want to change the base?
Conversation
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1544/index.html |
@@ -57,6 +57,7 @@ typedef struct { | |||
char body[SWITCH_RTP_MAX_BUF_LEN+4+sizeof(char *)]; | |||
switch_rtp_hdr_ext_t *ext; | |||
char *ebody; | |||
switch_time_t received_ts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be here, nor in the rtp_msg_t . If you want to to add the received timestamp to the JB you will have to create a different struct with both switch_rtp_packet_t and switch_time_t in it.
Expose it in switch_rtp.h and use it for switch_jb_put_packet() and switch_jb_get_packet() (at least) .
eg:
typedef struct {
switch_rtp_packet_t pkt;
switch_time_t rcvd_time;
} switch_rtp_incoming_pkt_t;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, I wanted to measure from the input to the output socket to be sure that the measurements are taken as close as possible to the input and output.
Do I understand that adding this at the packet level is to intrusive and measuring only at the jitter buffer in/out would be preferable ?
This approach would probably require a bit more code modification since some functions parameters will
have to be modified to support this new struct.
Thanks for the feedback i will take some time to consider and look at it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you'll need a rcvd_time in the switch_jb_node_t too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the sizeof() of those structs (rtp_msg_t or switch_rtp_packet_t) - better to keep them as they are in master branch now.
@@ -170,6 +170,8 @@ struct switch_core_session { | |||
switch_buffer_t *text_line_buffer; | |||
switch_mutex_t *text_mutex; | |||
const char *external_id; | |||
|
|||
packet_stats_t stats; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be here. We have RTP stats on the session through media_handle -> RTP engines.
See switch_rtp_stats_t stats in the rtp_session.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, will look at it
@@ -87,6 +87,7 @@ typedef struct switch_frame_geometry { | |||
payload_map_t *pmap; | |||
switch_image_t *img; | |||
struct switch_frame_geometry geometry; | |||
switch_time_t received_ts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a timestamp on the frame. Do we really need another ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I will look at that other timestamp to see if it is timestamped at the extraction of the frame from the packet or at the reception of the packet like the one I am using, even both should be the same most of the time.
Looking that the packet relay metrics, I found a problem with the EJB.
I will search the root, cause, maybe packet sequence id roll-over is not well handled ... However, a lot of load tests were conducted with random call duration, not sure why this would not been triggered ... |
I am load testing while forcing a lot of elasticity
I can not believe I will not be able to reproduce on calls that can last up to 50 minutes. |
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1572/index.html |
could you do unit-tests upon this work ? |
sure, what are we using for unit tests ? (edit, I just saw the unit test task, I will look at it) |
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1583/index.html |
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1588/index.html |
I think I found the problem, thanks to this report.
seems obvious that one of the value here or something can be wrong .
I will add some logging and protection here
But I think I see it, very large burst or long lasting PLC can create that ! Note to myself : double check that PLC packets are not wrongly computed in terms of delay. |
Unit-tests compilation failed: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1590/unit-tests-build-result.txt |
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1590/index.html |
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1591/index.html |
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1592/index.html |
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1594/index.html |
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1595/index.html |
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1616/index.html |
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1620/index.html |
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1627/index.html |
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1628/index.html |
Unit-tests compilation failed: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1638/unit-tests-build-result.txt |
Scan-build compilation failed: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1638/scan-build-result.txt |
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1639/index.html |
- add buffering skip counter - jitter buffer reset when too many missed frames - jb disable when facing too many consecutive missing frames - fix check_jb_size while expanding - disable non elactic skip due to buffering - another check size roll-over issue - reset when expanding above the maximum size - buffer size management - proper handling of seq roll-over and fix seq decrement - force accelerate when above the max size
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1646/index.html |
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1681/index.html |
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1682/index.html |
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1683/index.html |
Scan-build found bugs: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1684/index.html |
As I am now using the elastic jitter buffer, I wanted precise metrics on the time packets are spending inside FS.
I made some modifications to have such reporting.
This is the artificial jitter I am using
sudo tc qdisc add dev ethx root netem delay 475ms 600ms
Example with a test with high jitter with/without the elastic jitter buffer activated.
With the elastic jitter buffer activated
Without the elastic jitter buffer activated
load tests
I did several load tests to measure the impact of the extra work load and nothing significant on a very small instance for example.
With packet stats
Without packet stats
I understand that some refactoring and modifications will be needed to consider merging this ...
but I wanted to get some feedback to see if you guys see the value of such feature.