
Handle Event Queue Overflow #1048

Open
wants to merge 3 commits into master
Conversation

@Jamidd Jamidd (Contributor) commented Aug 13, 2024

This PR aims to prevent overflow in the event queue. When the number of free slots in the queue drops below a fifth of the original capacity, ticker events are discarded. If more than 30 ticker events have been discarded, the OVMS is restarted, as this indicates that too much time has passed since the last ticker event was processed.
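In outline, the intended behaviour is roughly the following (a sketch only, assuming a FreeRTOS event queue; the identifiers `s_event_queue`, `QUEUE_CAPACITY` and `QueueTickerEvent` are illustrative, not the actual OVMS names):

```cpp
// Sketch of the queue overflow guard described above. All identifiers are
// illustrative, not the actual OVMS names.
#include "freertos/FreeRTOS.h"
#include "freertos/queue.h"
#include "esp_system.h"

static const UBaseType_t QUEUE_CAPACITY      = 40; // assumed original queue capacity
static const int         MAX_DROPPED_TICKERS = 30; // reboot threshold from the PR

static QueueHandle_t s_event_queue;
static int s_dropped_tickers = 0;

bool QueueTickerEvent(void* event)
  {
  // Discard ticker events while less than a fifth of the slots are free.
  if (uxQueueSpacesAvailable(s_event_queue) < QUEUE_CAPACITY / 5)
    {
    if (++s_dropped_tickers > MAX_DROPPED_TICKERS)
      {
      // Too many discarded tickers: the event task has stalled for too long,
      // so restart the module.
      esp_restart();
      }
    return false; // ticker event dropped
    }
  s_dropped_tickers = 0; // queue recovered, reset the counter
  return xQueueSend(s_event_queue, &event, 0) == pdTRUE;
  }
```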

@dexterbg (Member)

Jaime,

the PR again mixes two completely independent changes: besides the queue checking, it includes an undocumented and potentially hazardous change to the event task stack size, raising it from 8K to 12K.

Any substantial new stack requirement needs to be discussed first, ideally on the developer list. Stack is allocated from the internal 8-bit RAM segment, which is already rather tight. Depending on the vehicle, additional tasks will need space, and less free 8-bit RAM can make the system less stable when multiple command tasks execute concurrently (e.g. via API calls).

So there must be a good & valid reason to increase a task stack size in general. The event task runs all the event handling and thus already has a rather large stack of 8K. What caused a stack overflow for you? Did you verify it was in the event task, and how can we reproduce it?

If the situation is setup/config specific (e.g. only occurring when an MQTT server is used), we should make the event stack size configurable instead of taking a general RAM hit.
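That could be as simple as reading the stack size from config at task creation, along these lines (a rough sketch only; the config key, the `GetConfigInt` accessor and the bounds are assumptions, not the actual OVMS API):

```cpp
// Illustrative sketch only: configurable event task stack size with the
// current 8K as default and lower bound. GetConfigInt stands in for the
// actual OVMS config accessor.
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

extern void EventTask(void* param);                    // existing event task entry (assumed)
extern int  GetConfigInt(const char* key, int defval); // config accessor stand-in

void StartEventTask()
  {
  int stacksize = GetConfigInt("events.stacksize", 8192);
  if (stacksize < 8192)  stacksize = 8192;  // never below the current default
  if (stacksize > 16384) stacksize = 16384; // cap to protect the tight 8-bit RAM
  xTaskCreatePinnedToCore(EventTask, "OVMS Events", stacksize, NULL, 8, NULL, 1);
  }
```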

On the actual PR topic:

Please try to avoid typos in system messages ("droped"), and be very precise about what is happening, as this is info for developers: "Timer service / ticker timer has died" isn't actually correct. The timer service & tickers are still working perfectly here; the issue is the event task needing too much time to process events.

And that leads to a potentially much more useful idea for an addition: how about also collecting statistics on which event handlers are the culprits? The event task could log suspiciously long event handler runs, and it could gather average and maximum handling times, so developers can see which handlers need optimization / fixing.
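As a sketch of what I mean (names, the handler signature and the 100 ms "suspicious" threshold are assumptions, not existing OVMS code):

```cpp
// Sketch of per-handler timing statistics. The event task would route each
// handler call through this wrapper.
#include <cstdint>
#include <map>
#include <string>
#include "esp_timer.h"
#include "esp_log.h"

struct HandlerStats
  {
  uint32_t count    = 0;
  int64_t  total_us = 0;
  int64_t  max_us   = 0;
  };

static std::map<std::string, HandlerStats> s_handler_stats;
static const int64_t SLOW_HANDLER_US = 100000; // flag runs longer than 100 ms (assumed)

void RunHandlerTimed(const std::string& name, void (*handler)(void*), void* data)
  {
  int64_t start = esp_timer_get_time();
  handler(data);
  int64_t elapsed = esp_timer_get_time() - start;

  HandlerStats& st = s_handler_stats[name];
  st.count++;
  st.total_us += elapsed;
  if (elapsed > st.max_us) st.max_us = elapsed;

  // Immediately flag suspiciously long runs; averages (total_us / count)
  // and max_us could be dumped via a status command for offline analysis.
  if (elapsed > SLOW_HANDLER_US)
    ESP_LOGW("events", "slow handler %s: %lld us", name.c_str(), (long long)elapsed);
  }
```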

I added some info on the event currently being handled for the crash aftermath (see m_current_started), but that will only tell us about the last handler before the crash. General performance statistics could be very helpful in tracking down issues before they become fatal. Take @frogonwheels' approach for the OBD performance stats as an example.

Regards,
Michael
