Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Update Driver Registration #2459

Merged

Conversation

kelvinfan001
Copy link
Member

Record agent's systemd unit in each transaction and add D-Bus API option (for use by update drivers, e.g. Zincati) for serializing agent information into a JSON file in /run. Teach rpm-ostree status to read from this JSON file and at least be able to display its update driver's systemd unit for now.

Closes #1747.

@cgwalters
Copy link
Member

The first commit looks good - mind splitting that off to a separate PR we can merge now? And it should be straightforward to extend the change to test-misc-2.sh from the previous PR to also look for AGENT_SD_UNIT.

src/daemon/rpmostreed-transaction-types.cxx Outdated Show resolved Hide resolved
src/daemon/rpmostreed-transaction-types.cxx Outdated Show resolved Hide resolved
src/daemon/rpmostreed-transaction-types.cxx Outdated Show resolved Hide resolved
src/daemon/rpmostreed-transaction-types.cxx Outdated Show resolved Hide resolved
src/daemon/rpmostreed-transaction-types.cxx Outdated Show resolved Hide resolved
src/daemon/rpmostreed-transaction-types.cxx Outdated Show resolved Hide resolved
src/app/rpmostree-builtin-status.cxx Outdated Show resolved Hide resolved
src/app/rpmostree-builtin-status.cxx Outdated Show resolved Hide resolved
@kelvinfan001 kelvinfan001 force-pushed the kfan-register-update-driver branch 2 times, most recently from 7bf4c69 to 3629809 Compare January 20, 2021 23:05
@kelvinfan001 kelvinfan001 changed the title [WIP] Allow Update Driver Registration Allow Update Driver Registration Jan 20, 2021
Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good, and I think you've done an awesome job digging into a lot of the code here! I just have almost entirely optional stuff, basically seems OK to land to me; let's just do the const char* etc. type tweaks.

I would say though I'd like to do the DBus property path over having the client parse the file, but we can indeed do that later.

src/daemon/rpmostreed-transaction-types.cxx Outdated Show resolved Hide resolved
src/daemon/rpmostreed-transaction-types.cxx Outdated Show resolved Hide resolved
src/daemon/rpmostreed-transaction-types.cxx Show resolved Hide resolved
src/libpriv/rpmostree-core.h Outdated Show resolved Hide resolved
src/app/rpmostree-builtin-status.cxx Outdated Show resolved Hide resolved
src/daemon/rpmostreed-transaction-types.cxx Show resolved Hide resolved
@kelvinfan001
Copy link
Member Author

@cgwalters Thanks for the review! I've made all the suggested changes except #2459 (comment). It seems to me that a key-value dictionary would be easier to use than a tuple if there were many fields in the driver info. Though my assumption that we may want additional information about the update driver could be wrong, so I'm happy to simplify it to a 2-tuple since we only have two pieces of information for now if you think that's cleaner :)

@@ -1081,6 +1093,13 @@ rpmostree_builtin_status (int argc,
else
cached_update_node = json_node_new (JSON_NODE_NULL);
json_builder_add_value (builder, cached_update_node);
json_builder_set_member_name (builder, "update-driver");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, seems like the JSON is missing the update driver ID, no?

But actually, it's even easier than that: you can just pass the deserialized GVariant object directly like we do for transaction above.

Copy link
Member Author

@kelvinfan001 kelvinfan001 Jan 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I left that out because the update driver ID could potentially be empty if the transaction wasn't called with e.g. RPMOSTREE_CLIENT_ID=testing-agent-id. But on second thought, I think if we're calling the --register-driver option, we should make sure the caller supplies the update driver ID, and return an error if not, right?

Copy link
Member

@cgwalters cgwalters Jan 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah probably. However...we should debate a little whether the "id" should be the central thing or whether the systemd unit should be.

Probably the axis here is around the MCO versus not - when talking to rpm-ostree from a Kubernetes pod you just get a not-human-readable auto-generated cgroup slice. But when doing it from a host systemd service you get a nice e.g. zincati.service. (It's a little less nice in the case of gnome-software but it's still not bad, at least it includes the software name) e.g.:

[root@quicksilver ~]# journalctl -u rpm-ostreed|grep client.*software | head -1
Jan 11 13:53:41 quicksilver rpm-ostree[2691]: client(id:gnome-software dbus:1.110 unit:app-gnome-gnome\x2dsoftware\x2dservice-2334.scope uid:1000) added; new total=1
[root@quicksilver ~]#

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an interesting parallel to this debate which is: identifying software from the dpkg/RPM package name versus the systemd unit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the middle ground is: if the systemd unit ends in .service, use it. Otherwise, use the client ID.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I definitely think we should teach the MCO to use this.

Agree that we should enforce id being provided with --register-driver and display it prominently. The systemd unit can stay just in the journal logs for now. (Or perhaps conditionally display it with the -v i.e. opt_verbose option)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing worth noting is that RPMOSTREE_CLIENT_ID today only has an effect if the driver is using the CLI, and only if the CLI decides to register via RegisterClient. AFAICT, this is currently not the case for Zincati because it runs as a separate UID which systemd doesn't consider active. E.g. from my FCOS server:

Jan 19 12:09:09 obelix rpm-ostree[2605951]: Initiated txn Deploy for client(dbus:1.5847 unit:zincati.service uid:980): /org/projectatomic/rpmostree1/fedora_coreos

(Notice how it doesn't have an id:....)

So we'll have to tweak this. Also worth noting that the env var is just an override to the default cli, which isn't very useful (but would still pass through the --register-driver check of requiring an ID). And at the D-Bus level, providing a client ID means the client has to call RegisterClient, which so far hasn't been necessary (though it's clearly not a big deal to require it).

Hmm... I wonder if we should just leave the client ID as a thing we log to help debugging but otherwise not really promote it to the status and instead have register-driver take a human-readable string like "Zincati", or "Machine Config Operator". Something meaningful that users can just copy/paste into their search engine (which IMO is one of the primary goals of #1747). Then it's also obvious that it's required and it makes the API cleaner, because the option itself takes a value. (And maybe then we can have a special <systemd_unit_description> value or something which just tells rpm-ostree to use the Description of the associated systemd unit).

Anyway, don't mean to derail this, but worth discussing. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh I see. Yeah I haven't actually tested this with Zincati so I wasn't aware of that. In that case I agree we should make --register-driver take in an argument - a human readable string.

Copy link
Member Author

@kelvinfan001 kelvinfan001 Jan 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting:

Also worth noting that the env var is just an override to the default cli

I tried playing around in the CLI with my builds of rpm-ostree and it seems that if I don't provide the env var, the ID will indeed be NULL.

Edit: never mind, found the reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah wow, I guess coreos/zincati@c214f3e wasn't tested? I had thought that was working.

Mmmm...OK I guess we can just add a string to the driver.

@kelvinfan001 kelvinfan001 changed the title Allow Update Driver Registration [WIP] Allow Update Driver Registration Jan 21, 2021
@kelvinfan001 kelvinfan001 changed the title [WIP] Allow Update Driver Registration Allow Update Driver Registration Jan 21, 2021
@kelvinfan001
Copy link
Member Author

register-driver now takes a human-readable string as an argument instead of reading from the transaction's agent_id. agent_id is no longer recorded in the update driver state file or displayed in status.

Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
All looking good now to me, just a few things that we can do as a followup commit!

@@ -49,6 +50,7 @@ static GOptionEntry option_entries[] = {
{ "lock-finalization", 0, G_OPTION_FLAG_HIDDEN, G_OPTION_ARG_NONE, &opt_lock_finalization, "Prevent automatic deployment finalization on shutdown", NULL },
{ "disallow-downgrade", 0, 0, G_OPTION_ARG_NONE, &opt_disallow_downgrade, "Forbid deployment of chronologically older trees", NULL },
{ "unchanged-exit-77", 0, 0, G_OPTION_ARG_NONE, &opt_unchanged_exit_77, "If no new deployment made, exit 77", NULL },
{ "register-driver", 0, G_OPTION_FLAG_HIDDEN, G_OPTION_ARG_STRING, &opt_register_driver, "Register the calling agent as the driver for updates. Takes a human-readable string as name for driver", "DRIVERNAME" },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Should this really be G_OPTION_FLAG_HIDDEN? I don't see a problem making this officially stable.

Comment on lines 813 to 817
{
g_set_error (error, RPM_OSTREED_ERROR,
RPM_OSTREED_ERROR_FAILED, "update driver name not provided");
return FALSE;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{
g_set_error (error, RPM_OSTREED_ERROR,
RPM_OSTREED_ERROR_FAILED, "update driver name not provided");
return FALSE;
}
return glnx_throw (error, "update driver name not provided");

return FALSE;

*driver_g_variant =
g_variant_ref_sink (g_variant_new_from_bytes (G_VARIANT_TYPE_VARDICT, data, FALSE));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be a bit more idiomatic to have the return value here be GVariant* and NULL on error.

@cgwalters
Copy link
Member

cgwalters commented Jan 22, 2021

Hmm this looks like a real bug:

Jan 22 00:22:57 qemu0 rpm-ostree[1539]: g_atomic_ref_count_dec: assertion 'g_atomic_int_get (arc) > 0' failed

Jan 22 00:22:57 qemu0 kola-runext-layering-fedorainfra[1300]: /usr/local/bin/kola-runext-layering-fedorainfra: line 12:  1539 Trace/breakpoint trap   (core dumped) rpm-ostree status > status.txt

To debug this try doing e.g.:

$ kola run --qemu-image fastbuild-fedora-coreos-rpm-ostree-qemu.qcow2 ext.rpm-ostree.destructive.layering-fedorainfra --ssh-on-test-failure

And inside the VM use e.g. coredumpctl dump <pid> and journalctl -b -u rpm-ostreed to get more info.

@kelvinfan001 kelvinfan001 force-pushed the kfan-register-update-driver branch 2 times, most recently from e0b61f6 to bf20179 Compare January 22, 2021 16:47
@kelvinfan001
Copy link
Member Author

@cgwalters sorry for dragging this one out for so long. I've addressed the feedback from your latest review and CI is passing now.

@cgwalters
Copy link
Member

cgwalters commented Jan 22, 2021

Nice!
The new code was just missing one error handling path; to avoid a roundtrip I updated it (and also while there tweaked the variable name to be a bit clearer).
Thanks a lot for working on this, it will be very helpful for admins!
/lgtm

@cgwalters
Copy link
Member

/lgtm cancel

@cgwalters
Copy link
Member

cgwalters commented Jan 22, 2021

We had a realtime discussion about this, turns out I think I suggested the wrong thing in review; since the driver info is optional we need to make it be:

gboolean get_driver_info (GVariant **out_info, GError**error);

which is basically simulating Result<Option<T>> in Rust.

In C++/Python/Java that use use implicit exceptions over reifed errors, the "happy path" looks nice since it's just GVariant *get_driver_info () but the huge debate in those communities is around things like "checked exceptions" etc. The "implicit" nature of exceptions makes it not particularly obvious which functions can actually error out, etc.

Go has reified errors but there you just kind of get tired of typing if err != nil { return err }.

I think Rust really did this correctly by reifying errors but making them easy to handle with ?.

Record the calling agent's systemd unit and serialize it into a
g-variant file at `/run/rpmostree/update-driver.gv`, along with the
human-readable name of the update driver provided as a string
argument.

Also add the companion `--register-driver` option to the `deploy`
CLI argument.

Closes coreos#1747.
Read from `/run/rpm-ostree/update-driver.gv` and display the update
driver name (and systemd unit if verbose).
@kelvinfan001
Copy link
Member Author

kelvinfan001 commented Jan 24, 2021

/lgtm

edit: looks like I can’t lgtm my own PR (which makes total sense). @cgwalters might need to bother you again 😄
edit 2: hmm but I can manually add the label myself...

@openshift-ci-robot
Copy link
Collaborator

@kelvinfan001: you cannot LGTM your own PR.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kelvinfan001

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Strengthen notion of "update driver" in rpm-ostree status
5 participants