Flesh out and document target UX #8

Open
jlebon opened this issue Sep 8, 2020 · 9 comments

jlebon (Member) commented Sep 8, 2020

I think it's a useful exercise to flesh out early on what the UX will look like. Let's discuss that here and then add something in the README?

Some bootstrapping questions:

  • how does one perform a manual upgrade?
  • how do automated systems perform upgrades?
  • how does this tie into rpm-ostree status on rpm-ostree-based systems?

We don't need to answer everything completely, but discussing these will make it easier to think about how bootupd fits in.

cgwalters (Member) commented Sep 8, 2020

how does one perform a manual upgrade?
how do automated systems perform upgrades?

The questions presuppose that upgrades aren't on by default, which...hasn't been decided, I'd say. It might be that e.g. FCOS ships with bootupd on by default.

how does this tie into rpm-ostree status on rpm-ostree-based systems?

Should it? Should rpm-ostree status include e.g. any status from dbxtool.service too? I like the idea of a "one pane of glass" but it also introduces some potential confusion if admins start to think they're actually linked.

cgwalters (Member)

I'm currently leaning towards having no updates by default and also documenting how one can use a container to orchestrate bootupd.

jlebon (Member, Author) commented Sep 23, 2020

WDYT about the discussions in coreos/fedora-coreos-tracker#510 (comment)? For EFI at least, it seems possible to make updates quite safe. In that case, it might be worthwhile to just always update, to simplify the model and maintenance. (This obviously doesn't help BIOS.)

jlebon (Member, Author) commented Sep 23, 2020

Should it? Should rpm-ostree status include e.g. any status from dbxtool.service too? I like the idea of a "one pane of glass" but it also introduces some potential confusion if admins start to think they're actually linked.

I think it would be really useful for rpm-ostree status -v to print the OSTree commit and version from which the currently installed bootloader comes. I could take a look at that, assuming that information is currently being stored in /boot by bootupd.

Edit: thinking more on this... I'm on the fence as well. I think it makes sense to have it in status -v if we implement "always update" via rpm-ostree calling out to bootupd.

cgwalters (Member)

OK, so clearly we want things to be configurable. There's the question of the default, but I think what we basically want to support is:

  • Not enabling any updates by default (status quo)
  • Updating by default when ostree updates (e.g. systemctl enable bootupd-automatic.service) - basically, when we boot into the new ostree, we update the bootloader. Now clearly it would be (potentially) better to update the bootloader before rebooting, i.e. hook into the rpm-ostree process and scrape out the updates but...eh. The thing is one doesn't really need to reboot after updating the bootloader other than to test it.
  • Being scriptable by an external agent (machine-config-operator, gnome-software); we have bootupctl status --json and then one can use bootupctl update e.g. (see the sketch right after this list)
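
A minimal sketch of that last mode, assuming the agent simply shells out to the CLI (the wrapper itself is hypothetical; only bootupctl status --json and bootupctl update come from this discussion):

#!/bin/bash
# Hypothetical external-agent wrapper around the bootupd CLI (illustrative only).
set -euo pipefail

# Capture machine-readable status first, e.g. for the agent's own logs/decisions;
# the JSON schema is bootupd's own and isn't reproduced here.
bootupctl status --json

# Then apply whatever bootloader update is available.
bootupctl update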

So for FCOS I think my vote would be off by default; it's trivial today to enable a systemd unit via Ignition/fcct, so if we ship bootupd-automatic.service that should be fine.
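
For illustration, a minimal fcct snippet for that could look like the following (the spec version is just an example, and the unit name assumes we ship bootupd-automatic.service as discussed above):

variant: fcos
version: 1.1.0
systemd:
  units:
    # Assumed unit name from the discussion above; enabled at provisioning time.
    - name: bootupd-automatic.service
      enabled: true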

Now, a general concern here is that people running clusters will want to avoid the possibility of bricking multiple servers at once. For the MCO case that should already happen anyway if we enable bootupd-automatic.service as part of OKD/OCP - the MCO's role here might just be notifying/logging that the bootloader was updated?

That said, there is still the overall concern that ostree updates are transactional while bootloader updates aren't - some admins may want to schedule the latter separately and be prepared for recovery in the (unlikely but possible) event that things go wrong.

lucab (Contributor) commented Oct 14, 2020

Sorry for the late feedback; I also have some doubts about the UX, especially regarding auto-updates.

Now clearly it would be (potentially) better to update the bootloader before rebooting, i.e. hook into the rpm-ostree process and scrape out the updates but...eh.

This would be a sweet-spot in terms of tackling updates cluster-wide, because otherwise a bootloader update requires two reboots: the first to have the ostree content available and the second to actually use the new bootloader.

The thing is one doesn't really need to reboot after updating the bootloader other than to test it.

From Container Linux experience, this is a ticking bomb with a deferred explosion triggered by any reboot, which is better to avoid. It's problematic because an external unplanned event (kernel crash, power glitch, VM restart, etc.) may activate an update at the worst possible time, possibly compounding other troubles and making root-cause analysis way messier. The current rpm-ostree approach of locked finalization (i.e. with a final atomic apply-and-reboot action) is a better model.

jlebon (Member, Author) commented Oct 20, 2020

Now clearly it would be (potentially) better to update the bootloader before rebooting, i.e. hook into the rpm-ostree process and scrape out the updates but...eh.

This would be a sweet-spot in terms of tackling updates cluster-wide, because otherwise a bootloader update requires two reboots: the first to have the ostree content available and the second to actually use the new bootloader.

Agreed. Another reason is that by doing it pre-reboot, you find out immediately if the bootloader update breaks your boot, so the rollout stops on the first machine instead of bricking your whole cluster.

That said, I don't want to go back to coreos/rpm-ostree#1882. I'd much prefer for tighter integration between rpm-ostree and bootupd, which I think then meshes well with having it in status -v as mentioned above? The update policy itself could still live out-of-band though; e.g. rpm-ostree would just tell bootupd "hey I just deployed this pending commit, do what you will".

This does run counter, though, to the "offline background updates" story. But there is only one bootloader, so there can never really be "offline updates" in the same way (though see #8 (comment)).

jlebon (Member, Author) commented Oct 20, 2020

That said, I don't want to go back to coreos/rpm-ostree#1882. I'd much prefer for tighter integration between rpm-ostree and bootupd, which I think then meshes well with having it in status -v as mentioned above? The update policy itself could still live out-of-band though; e.g. rpm-ostree would just tell bootupd "hey I just deployed this pending commit, do what you will".

Hmm, actually, maybe a more correct way to do this is to integrate at the finalization stage, just like ostree-finalize-staged.service does. We could make the ostree API for hooking into that more official and have bootupd fire during finalization? That way, we're also sure that we're updating /boot with the bits we're rebooting into. And it feels more "background updates"-ish than doing it at staging time, because rebooting into a new update is a bit like permission to mutate state. It also better addresses @lucab's concerns re. locking in #8 (comment).

cgwalters (Member)

In practice today I think two things are true:

  • We will ship bootloader updates rarely
  • Bootloader updates are unlikely to break things

Given this, a simple systemd unit like this:

[Unit]
Description=Bootupd automatic update

[Service]
# One-shot update at boot; RemainAfterExit keeps the unit marked active afterwards.
Type=oneshot
ExecStart=/usr/bin/bootupctl update
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

is going to be fine for many people to start, or they could just do it manually.
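
For reference, the manual path is just the same CLI the unit wraps (run as root):

bootupctl status    # inspect the current bootloader update state
bootupctl update    # apply any available update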

Now, I do agree with the concerns above. I filed that as #108.
