-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Forward agent containerd events to shim and monitor task exits via event subscriptions #118
Conversation
I forgot to run the linter locally before pushing, will fix that before merging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also get some tests to ensure this works properly?
Yes, mentioned this in the PR description but I don't plan on merging this until I've gotten an integration test up and running, which may take a few days. Just getting any feedback on the implementation out of the way first. |
21e193f
to
4b99eef
Compare
Ref #24 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see tests (as you mentioned in the overview), but other than that the code looks pretty reasonable to me.
6dfedff
to
d607eb9
Compare
Updated w/ unit tests. I also have an e2e test written in my fork of the repo, will update this PR with it once I have merged the prerequisite #134 |
I updated w/ the e2e test. It's a pretty sizable change on top of the original so it's a separate commit. The travis build is exiting with non-zero for reasons unknown to me at this time. When I run "make lint" on my own from the same commit it exits with 0. |
The travis build was failing due to it only pulling 50 commits of history, which broke our git validation command. Sam fixed this, I will pick up that fix when I squash/rebase before merging. |
On a high-level, this commit: 1. Adds an EventExchange (existing containerd concept) to both the agent and shim. This is used by both for all event publishes 2. Creates a new concept, eventbridge.Getter, which is used to forward all events published on the agent's exchange to the shim's exchange. The eventbridge.Getter is implemented as a TTRPC-based client/server that provides just one API, GetEvent. When called from the client, GetEvent will block until an event has been published on the server's exchange, at which time that event will be returned. A separate function, eventbridge.Attach, does that actual work of calling GetEvent in a loop and forwarding the events to the local exchange. This long-polling model is used because TTRPC does not support streaming at this time. Signed-off-by: Erik Sipsma <[email protected]>
agent/main.go
Outdated
) | ||
|
||
const ( | ||
defaultPort = 10789 | ||
defaultPort = 10789 | ||
defaultNamespace = "default" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is defined in namespaces.Default
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed while squashing commits
This replaces the previous implementation that called GetStatus every second with a more general approach of subscribing to TaskExit events, which are published by the agent and then forwarded to the shim's event exchange via the exchange bridge. Signed-off-by: Erik Sipsma <[email protected]>
This is the first e2e test that requires being run in isolation, as being run on a host (or in a container) with multiple firecracker-containerd instances would result in interference. This thus adds a mechanism that allows buildkite to run each such isolated test in its own docker container. The mechanism chosen here is to name tests that require isolation with "_Isolated" at the end and add a test util similar to the existing one for snapshotter root tests that skips a test in an env var is not explicitly provided. There is a script invoked as part of buildkite that finds all tests with the "_Isolated" tag in their name and runs each one in its own docker container. This mechanism is minimally invasive to the actual test code itself while not creating massive amounts of complication in our test infra (such as other alternative methods like using build tags). Signed-off-by: Erik Sipsma <[email protected]>
Forward agent containerd events to shim and monitor task exits via event subscriptions
Forward agent containerd events to shim and monitor task exits via event subscriptions
Fixes #87
More details in full commit messages, but overall goal of the change is to forward all containerd events from the agent to the shim on the host and then use that to monitor task exits, at which time we can gracefully shut everything down.
Sending this out to get feedback on the implementation right away, but I won't merge until I've also written an integration test (which is likely going to entail addressing #114) and added unit tests for the new EventExchange code. That being said, the testing done so far has been manual. I verified that both killing a task (via
ctr kill
) and the task exiting on its own result in a task exit event being forwarded to and logged by containerd. The shim and VM exit as expected in both cases too.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.