By default, NRI plugins running in containerd have 2 seconds to respond to each event. That is fine. However, if a plugin misses a single response deadline, containerd closes the connection and cuts the plugin off from all future events. For plugins built on github.com/containerd/nri, this causes the plugin process to exit.
There is a specific set of errors that induce this close-the-connection behavior:
```go
// isFatalError returns true if the error is fatal and the plugin connection should be closed.
func isFatalError(err error) bool {
	switch {
	case errors.Is(err, ttrpc.ErrClosed):
		return true
	case errors.Is(err, ttrpc.ErrServerClosed):
		return true
	case errors.Is(err, ttrpc.ErrProtocol):
		return true
	case errors.Is(err, context.DeadlineExceeded):
		return true
	}
	return false
}
```
The other errors in that list look very reasonable. However, a plugin that takes longer than (by default) 2 seconds to respond to one event has not necessarily failed entirely, and it can probably still handle future events. I'd suggest that a better behavior would be to time out just that one event and keep the plugin connected.
The isFatalError excerpt above is from nri/pkg/adaptation/plugin.go, lines 520 to 533 at commit 7b3bcee.