
Error failed to register as driver: rpm-ostree deploy --register-driver failed #588

Closed
empikls opened this issue Jun 24, 2021 · 10 comments · Fixed by #599

@empikls

empikls commented Jun 24, 2021

Hi,

I have a problem when my zincati service starts for the first time.
My instances are managed by Terraform, and when I check the status of the service I see this error:

[ERROR] failed to register as driver: rpm-ostree deploy --register-driver failed:
error: Transaction in progress: (null)
You can cancel the current transaction with rpm-ostree cancel

I can restart the service and the error goes away, but I can't do that manually on all my instances, and Terraform doesn't allow restarting services automatically.
Also, when I first check rpm-ostree status -b I don't see Zincati in front of BootedDeployment, but after a restart I see the following:

AutomaticUpdatesDriver: Zincati
DriverState: active; periodically polling for updates (last checked Thu 2021-06-24 11:56:10 UTC)

Please help me with this.

@lucab
Contributor

lucab commented Jun 24, 2021

Thanks for the report. It looks like there was something else possibly interacting with rpm-ostree at the same time, which is however a bit unexpected. At the same time, this should be a transient condition and the service unit should eventually restart and succeed without any manual intervention.
It would be interesting to see the rpm-ostreed.service and zincati.service journal logs to see how they are behaving.

/cc @kelvinfan001

@kelvinfan001 kelvinfan001 added the jira for syncing to jira label Jun 24, 2021
@empikls
Author

empikls commented Jun 25, 2021

@lucab @kelvinfan001
Yes, sure. These are the journal logs of rpm-ostreed.service:

Jun 25 06:06:55  rpm-ostree[6831]: Reading config file '/etc/rpm-ostreed.conf'
Jun 25 06:06:57  systemd[1]: Started rpm-ostree System Management Daemon.
░░ Subject: A start job for unit rpm-ostreed.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░ 
░░ A start job for unit rpm-ostreed.service has finished successfully.
░░ 
░░ The job identifier is 1357.
Jun 25 06:06:57  rpm-ostree[6831]: In idle state; will auto-exit in 63 seconds
Jun 25 06:08:01  systemd[1]: rpm-ostreed.service: Deactivated successfully.
░░ Subject: Unit succeeded
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░ 
░░ The unit rpm-ostreed.service has successfully entered the 'dead' state.

Jun 25 06:08:01  systemd[1]: rpm-ostreed.service: Consumed 2.561s CPU time.
░░ Subject: Resources consumed by unit runtime
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░ 
░░ The unit rpm-ostreed.service completed and consumed the indicated resources.

and these are the journal logs of zincati.service:

Jun 25 06:01:50  zincati[1079]: [INFO ] starting update agent (zincati 0.0.21)
Jun 25 06:01:52  zincati[1079]: [INFO ] Cincinnati service: https://updates.coreos.fedoraproject.org
Jun 25 06:01:52  zincati[1079]: [INFO ] agent running on node '6a2e898939b4489eaf78c370e9aa1ddd', in update group 'default'
Jun 25 06:01:52  zincati[1079]: [INFO ] registering as the update driver for rpm-ostree
Jun 25 06:01:53  zincati[1079]: [ERROR] failed to register as driver: rpm-ostree deploy --register-driver failed:
Jun 25 06:01:53  zincati[1079]:     error: Transaction in progress: (null)
Jun 25 06:01:53  zincati[1079]:      You can cancel the current transaction with `rpm-ostree cancel`
Jun 25 06:01:53  zincati[1079]:     
Jun 25 06:01:53  zincati[1079]: [INFO ] initialization complete, auto-updates logic enabled
Jun 25 06:01:53  zincati[1079]: [INFO ] update strategy: periodic, total schedule length 30 minutes; next window at 23:30 on Sat>
Jun 25 06:01:53  systemd[1]: Started Zincati Update Agent.
░░ Subject: A start job for unit zincati.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░ 
░░ A start job for unit zincati.service has finished successfully.
░░ 
░░ The job identifier is 435.
Jun 25 06:01:54 zincati[1079]: [INFO ] current release detected as not a dead-end

Before restarting zincati.service:

$ rpm-ostree status -b
State: idle
BootedDeployment:
● ostree://fedora:fedora/x86_64/coreos/next
                   Version: 34.20210611.1.0 (2021-06-14T14:54:15Z)
                    Commit: 72ca343cead......
              GPGSignature: Valid signature by 8C5BA6990BDB......

After restarting zincati.service:

$ rpm-ostree status -b
State: idle
AutomaticUpdatesDriver: Zincati
  DriverState: active; periodically polling for updates (last checked Fri 2021-06-25 07:27:54 UTC)
BootedDeployment:
● ostree://fedora:fedora/x86_64/coreos/next
                   Version: 34.20210611.1.0 (2021-06-14T14:54:15Z)
                    Commit: 72ca343ceadd....
              GPGSignature: Valid signature by 8C5BA6990BD.....

@kelvinfan001
Member

kelvinfan001 commented Jun 25, 2021

At the same time, this should be a transient condition and the service unit should eventually restart and succeed without any manual intervention.

To clarify this a little: zincati.service will not "restart" following a failed registration attempt. Currently, Zincati simply gives up on registering as the updates driver for rpm-ostree after the first failed attempt.

However, Zincati will continue on with the rest of its initialization. Importantly, all of Zincati's functionality (e.g. around checking for and applying updates when possible) will still work as normal.

For more context, registering as the "updates driver" in rpm-ostree is mostly a UX improvement; see coreos/rpm-ostree#2536. So far, the only difference that it makes is that, as you've noted, rpm-ostree status won't display Zincati as driving updates, and commands like rpm-ostree upgrade (not recommended on FCOS) will be allowed by default without having to --bypass-driver.

All that said, this is indeed a bug and I'll dig into it. Thanks!

@empikls
Author

empikls commented Jun 25, 2021

@kelvinfan001 Thanks for your involvement! I look forward to further investigation of this issue. I'd also like to add that I have applied the periodic update strategy, and I'll see how the zincati service behaves. Currently I have two types of instances, with and without the deployment strategy applied, and both show the same symptom: rpm-ostree status won't display Zincati as driving updates. So I'll check on my side soon (this Sunday) how it works with the deployment strategy.

@kelvinfan001
Member

kelvinfan001 commented Jul 6, 2021

It looks like there was something else possibly interacting with rpm-ostree at the same time,

@empikls do you think you can provide some earlier logs of rpm-ostree, preferably around the time when zincati.service's initialization was starting? That is, it would be helpful to see what transaction rpm-ostree was handling at around the time of Jun 25 06:01:52 (when Zincati tried to create another transaction to register as driver).

The (null) transaction that is supposedly blocking Zincati's attempt to create a deploy transaction (to register as driver) looks mysterious to me. Digging into rpm-ostree's code a little bit, I couldn't find any reason for a transaction to be null, so I've filed that bug to rpm-ostree here.

On Zincati's side, we could probably get Zincati to retry a couple of times to register as driver before giving up, but I'm not sure if this is the best solution to this problem, depending on what kind of transaction rpm-ostree was handling at the time. Another issue with retrying multiple times is that the amount of time for rpm-ostree to process each transaction varies greatly; I'm not sure if it is worth it to block Zincati initialization on driver registration. It also might be helpful if rpm-ostree had the ability to queue transactions.
@lucab what do you think?

@empikls
Author

empikls commented Jul 7, 2021

@kelvinfan001 for the experiment, I set up a new instance, here are the zincati.service logs from the very start :

-- Journal begins at Wed 2021-07-07 07:09:10 UTC, ends at Wed 2021-07-07 07:27:46 UTC. --
Jul 07 07:09:24 zincati[1077]: [INFO ] starting update agent (zincati 0.0.21)
Jul 07 07:09:26 zincati[1077]: [INFO ] Cincinnati service: https://updates.coreos.fedoraproject.org
Jul 07 07:09:26 zincati[1077]: [INFO ] agent running on node '0277c06444bf43bc857f53a05dc73498', in update group 'default'
Jul 07 07:09:26 zincati[1077]: [INFO ] registering as the update driver for rpm-ostree
Jul 07 07:09:27 zincati[1077]: [ERROR] failed to register as driver: rpm-ostree deploy --register-driver failed:
Jul 07 07:09:27 zincati[1077]:     error: Transaction in progress: (null)
Jul 07 07:09:27 zincati[1077]:      You can cancel the current transaction with `rpm-ostree cancel`
Jul 07 07:09:27 zincati[1077]:     
Jul 07 07:09:28 zincati[1077]: [INFO ] initialization complete, auto-updates logic enabled
Jul 07 07:09:28 zincati[1077]: [INFO ] update strategy: periodic, total schedule length 30 minutes; next window at 23:30 on Sat (Europe/Kiev), subject to >
Jul 07 07:09:28 systemd[1]: Started Zincati Update Agent.
░░ Subject: A start job for unit zincati.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░ 
░░ A start job for unit zincati.service has finished successfully.
░░ 
░░ The job identifier is 437.
Jul 07 07:09:28 zincati[1077]: [INFO ] current release detected as not a dead-end
Jul 07 07:09:29 zincati[1077]: [INFO ] target release '34.20210626.1.0' selected, proceeding to stage it
Jul 07 07:09:29 zincati[1077]: [ERROR] failed to stage deployment: rpm-ostree deploy failed:
Jul 07 07:09:29 zincati[1077]:     error: Transaction in progress: kargs --append 'mitigations=auto tsx=auto'
Jul 07 07:09:29 zincati[1077]:      You can cancel the current transaction with `rpm-ostree cancel`
Jul 07 07:09:29 zincati[1077]
Jul 07 07:14:53 zincati[1077]: [INFO ] target release '34.20210626.1.0' selected, proceeding to stage it

And when I check rpm-ostree status -b :

State: idle
BootedDeployment:
● ostree://fedora:fedora/x86_64/coreos/next
                   Version: 34.20210611.1.0 (2021-06-14T14:54:15Z)
                    Commit: 72ca343ceadd3734af6.....
              GPGSignature: Valid signature by 8C5BA6990B....

And I can also add the logs from rpm-ostreed.service:

 -- Journal begins at Wed 2021-07-07 07:09:10 UTC, ends at Wed 2021-07-07 07:19:44 UTC. --
Jul 07 07:09:24 systemd[1]: Starting rpm-ostree System Management Daemon...
Jul 07 07:09:25 rpm-ostree[1073]: Reading config file '/etc/rpm-ostreed.conf'
Jul 07 07:09:26 rpm-ostree[1073]: In idle state; will auto-exit in 64 seconds
Jul 07 07:09:26 systemd[1]: Started rpm-ostree System Management Daemon.
Jul 07 07:09:26 rpm-ostree[1073]: client(id:cli dbus:1.3 unit:enable-smt-subsequentboot.service uid:0) added; new total=1
Jul 07 07:09:27 rpm-ostree[1073]: Locked sysroot
Jul 07 07:09:27 rpm-ostree[1073]: Initiated txn KernelArgs for client(id:cli dbus:1.3 unit:enable-smt-subsequentboot.service uid:0): /org/projectatomic/rp>
Jul 07 07:09:27 rpm-ostree[1073]: Process [pid: 1024 uid: 0 unit: enable-smt-subsequentboot.service] connected to transaction progress
Jul 07 07:09:35 rpm-ostree[1073]: Created new deployment /ostree/deploy/fedora-coreos/deploy/72ca343ceadd3734af669b08ff4d00d3c122254cb4de25a91e71f4939fd35>
Jul 07 07:09:35 rpm-ostree[1073]: sanitycheck(/usr/bin/true) successful
Jul 07 07:09:36 rpm-ostree[1073]: Txn KernelArgs on /org/projectatomic/rpmostree1/fedora_coreos successful
Jul 07 07:09:40 rpm-ostree[1073]: Unlocked sysroot
Jul 07 07:09:40 rpm-ostree[1073]: Process [pid: 1024 uid: 0 unit: enable-smt-subsequentboot.service] disconnected from transaction progress
Jul 07 07:09:41 rpm-ostree[1073]: client(id:cli dbus:1.3 unit:enable-smt-subsequentboot.service uid:0) vanished; remaining=0
Jul 07 07:09:41 rpm-ostree[1073]: In idle state; will auto-exit in 61 seconds
Jul 07 07:10:43 systemd[1]: rpm-ostreed.service: Deactivated successfully.
Jul 07 07:10:43 systemd[1]: rpm-ostreed.service: Consumed 11.969s CPU time.
Jul 07 07:11:53 systemd[1]: Starting rpm-ostree System Management Daemon...
Jul 07 07:11:53 rpm-ostree[6803]: Reading config file '/etc/rpm-ostreed.conf'
Jul 07 07:11:55 rpm-ostree[6803]: In idle state; will auto-exit in 62 seconds
Jul 07 07:11:55 systemd[1]: Started rpm-ostree System Management Daemon.
Jul 07 07:11:55 rpm-ostree[6803]: Allowing active client :1.64 (uid 1000)
Jul 07 07:11:55 rpm-ostree[6803]: client(id:cli dbus:1.64 unit:session-2.scope uid:1000) added; new total=1
Jul 07 07:11:56 rpm-ostree[6803]: client(id:cli dbus:1.64 unit:session-2.scope uid:1000) vanished; remaining=0
Jul 07 07:11:56 rpm-ostree[6803]: In idle state; will auto-exit in 64 seconds
Jul 07 07:13:01 systemd[1]: rpm-ostreed.service: Deactivated successfully.
Jul 07 07:13:01 systemd[1]: rpm-ostreed.service: Consumed 2.382s CPU time.
Jul 07 07:14:53 systemd[1]: Starting rpm-ostree System Management Daemon...
Jul 07 07:14:54 rpm-ostree[8060]: Reading config file '/etc/rpm-ostreed.conf'
Jul 07 07:14:55 rpm-ostree[8060]: In idle state; will auto-exit in 63 seconds

@lucab
Contributor

lucab commented Jul 7, 2021

@empikls it looks like there are some other tasks running on this machine at boot, like setting kargs through enable-smt-subsequentboot.service. Are those part of the node Butane/Ignition configuration? If so, can you please share your whole node configuration (after redacting secrets) so that we can try to reproduce this?

In general if there are some other services interacting with rpm-ostree daemon (and possibly rebooting the machine), the units should have proper Before/After dependencies to make sure they run in a deterministic order.
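As a sketch of such ordering, a systemd drop-in (the path and filename here are hypothetical, not from this thread) could order the karg-setting unit explicitly before Zincati:

```ini
# Hypothetical drop-in: /etc/systemd/system/enable-smt-subsequentboot.service.d/10-order.conf
# Ensures the kargs transaction finishes before Zincati starts
# talking to the rpm-ostree daemon.
[Unit]
Before=zincati.service
```

Since enable-smt-subsequentboot.service is Type=oneshot, systemd waits for its ExecStart to finish before starting units ordered after it, so the two rpm-ostree transactions cannot overlap.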

@empikls
Author

empikls commented Jul 7, 2021

@lucab yes, enable-smt-subsequentboot.service is part of the node Ignition configuration.
The whole node configuration looks like this:

# Ignition config
data "ignition_config" "user_data" {
  users       = [for user in data.ignition_user.users : user.rendered]
  groups      = [for group in data.ignition_group.groups : group.rendered]
  filesystems = [for volume in data.ignition_filesystem.volumes : volume.rendered]
  directories = [for dir in data.ignition_directory.dirs : dir.rendered]
  systemd = concat(
    [
      data.ignition_systemd_unit.metadata.rendered,
      data.ignition_systemd_unit.rpc_statd.rendered,
      data.ignition_systemd_unit.zincati.rendered,
      data.ignition_systemd_unit.docker_dropins.rendered,
      data.ignition_systemd_unit.enable_smt_firstboot.rendered,
      data.ignition_systemd_unit.enable_smt_subsequentboot.rendered,
    ],
    [for volume in data.ignition_systemd_unit.volumes : volume.rendered],
  )
  files = concat(
    [
      data.ignition_file.zincati_strategy.rendered,
      data.ignition_file.sysctl_net.rendered,
      data.ignition_file.modprobe_net.rendered,
      data.ignition_file.sysconfig_docker.rendered,
      data.ignition_file.selinux.rendered,
      data.ignition_file.ebs_udev.rendered,
      data.ignition_file.ebs_nvme.rendered,
      
    ],
    [for tmpfile in data.ignition_file.tmpfiles : tmpfile.rendered]
  )
}

Ignition for services:

data "ignition_systemd_unit" "metadata" {
  name    = "afterburn.service"
  enabled = true
}

data "ignition_systemd_unit" "rpc_statd" {
  name    = "rpc-statd.service"
  enabled = true
}

data "ignition_systemd_unit" "zincati" {
  name    = "zincati.service"
  enabled = false
}


data "ignition_systemd_unit" "docker_dropins" {
  name = "docker.service"
  dropin {
    name    = "increase-ulimit.conf"
    content = <<EOF
[Unit]
After=${join(" ", [for volume in data.ignition_systemd_unit.volumes : volume.name])}
Requires=${join(" ", [for volume in data.ignition_systemd_unit.volumes : volume.name])}

[Service]
ExecStart=
ExecStart=/usr/bin/dockerd \
          --host=fd:// \
          $OPTIONS
LimitMEMLOCK=infinity
EOF
  }
}

data "ignition_systemd_unit" "enable_smt_firstboot" {
  name    = "enable-smt-firstboot.service"
  enabled = true
  content = <<EOF
[Unit]
DefaultDependencies=no
Before=sysinit.target shutdown.target
Conflicts=shutdown.target
ConditionFirstBoot=true

[Service]
Type=oneshot
ExecStart=/bin/bash -c 'active="$(cat /sys/devices/system/cpu/smt/active)" && if [[ "$active" != 1 ]] && grep -q "vendor_id.*GenuineIntel" /proc/cpuinfo; then echo "Enabling SMT." && echo on > /sys/devices/system/cpu/smt/control; fi'

[Install]
WantedBy=sysinit.target
EOF
}

data "ignition_systemd_unit" "enable_smt_subsequentboot" {
  name    = "enable-smt-subsequentboot.service"
  enabled = true
  content = <<EOF
[Unit]
Description=Enable SMT on subsequent boots by changing kernel arguments
ConditionFirstBoot=true

[Service]
Type=oneshot
ExecStart=/usr/bin/rpm-ostree kargs --append "mitigations=auto tsx=auto"

[Install]
WantedBy=multi-user.target
EOF
}

@kelvinfan001
Member

kelvinfan001 commented Jul 8, 2021

@empikls I have been able to reproduce this problem. Looking at the Ignition configuration you provided, I believe the conflict is caused by enable-smt-subsequentboot.service. In the short term, a workaround would be to order this unit around zincati.service, for example, you can add a Before=boot-complete.target dependency to enable-smt-subsequentboot.service.
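Applied to the unit content from the configuration above, the suggested workaround would look roughly like this (only the Before= line is new; everything else is unchanged from the original unit):

```ini
[Unit]
Description=Enable SMT on subsequent boots by changing kernel arguments
ConditionFirstBoot=true
# Workaround: run before boot is considered complete, so the kargs
# transaction does not race with zincati.service's driver registration.
Before=boot-complete.target

[Service]
Type=oneshot
ExecStart=/usr/bin/rpm-ostree kargs --append "mitigations=auto tsx=auto"

[Install]
WantedBy=multi-user.target
```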

@empikls
Author

empikls commented Jul 8, 2021

@kelvinfan001 Great! I've set up instances with the Before=boot-complete.target dependency in enable-smt-subsequentboot.service and the error is gone for zincati.service. Also, rpm-ostree status now displays Zincati as driving updates. I checked three times and got a positive result.
Thanks a lot for the investigation and support!

kelvinfan001 added a commit to kelvinfan001/fedora-coreos-docs that referenced this issue Jul 8, 2021
For units that require creating rpm-ostree transactions, there is
a risk of failing if there are other rpm-ostree transactions
taking place at the same time. Order these karg services before
`boot-complete.target` to ensure that we run early enough so that
other services (e.g. `zincati.service`) or user commands won't
cause this unit to fail.

This was noticed in coreos/zincati#588.
@kelvinfan001 kelvinfan001 linked a pull request Jul 9, 2021 that will close this issue