Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NRI plugins penalized with death if they take 2 seconds #114

Open
mbaynton opened this issue Oct 5, 2024 · 0 comments
Open

NRI plugins penalized with death if they take 2 seconds #114

mbaynton opened this issue Oct 5, 2024 · 0 comments

Comments

@mbaynton
Copy link

mbaynton commented Oct 5, 2024

Description

NRI plugins running in containerd by default have 2 seconds per event to provide a response. This is fine. But, if it misses a single response in that timeframe, it is closed / cut off from future events. For plugins built on github.com/containerd/nri, that results in the process exiting.

There is a specific set of errors that induce this close-the-connection behavior:

// isFatalError returns true if the error is fatal and the plugin connection should be closed.
func isFatalError(err error) bool {
switch {
case errors.Is(err, ttrpc.ErrClosed):
return true
case errors.Is(err, ttrpc.ErrServerClosed):
return true
case errors.Is(err, ttrpc.ErrProtocol):
return true
case errors.Is(err, context.DeadlineExceeded):
return true
}
return false
}

The other ones in that list look very reasonable. But, I'd like to suggest that a plugin responding to one event in more than (by default) 2 seconds doesn't indicate that the plugin has entirely failed and it can probably still be used for future events, so a better behavior would be to simply time out that one event but continue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant