Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds current mkosi PID to qemu JSON state file name #2945

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

gdonval
Copy link

@gdonval gdonval commented Aug 6, 2024

When mkosi crashes, Python does not always clean qemu state file. This patch appends current mkosi's PID to the name, solving this problem and allowing other instances of mkosi to use the same machine name without being confused.

When mkosi crashes, Python does not always clean qemu state file.
This patch appends current mkosi PID to the name, solving this
problem and allowing other instances of mkosi to use the same
machine name without being confused.
Copy link
Contributor

@behrmann behrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this makes sense. It completely sidesteps the locking, since every filename is different (disregarding PID reuse), so the locking seems a bit pointless?

What kind of crashes sidestep the finally?

with flock(INVOKING_USER.runtime_dir() / "machine"):
if (p := INVOKING_USER.runtime_dir() / "machine" / f"{config.machine_or_name()}.json").exists():
with flock(machine_folder):
if (p := machine_folder / f"{config.machine_or_name()}-{mkosi_pid:d}.json").exists():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The :d is not necessary, numbers are by default formatted as decimal numbers.

Suggested change
if (p := machine_folder / f"{config.machine_or_name()}-{mkosi_pid:d}.json").exists():
if (p := machine_folder / f"{config.machine_or_name()}-{mkosi_pid}.json").exists():

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locking is done on the machine folder (typically /run/user/$uid/mkosi/machine). This PR does not change that.

Is there any reason no two mkosi instances should not use the same machine name?
If the only reason is that it's later used by mkosi to ssh into the VM, then the current approach is fine.


What kind of crashes sidestep the finally?

I'm not sure if it's just ctrl+C at the wrong time or if it's just when I try booting an image that crashes or if I quit the terminal without doing ctrl+C but I can't seem to just be able to mkosi qemu for a full day without having to manually clean the machine state directory. It's genuinely happening a lot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gdonval Are you on the latest commit on main? We recently added logic to handle SIGHUP, that could maybe fix your issue.

Copy link
Author

@gdonval gdonval Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'm on 24.3.

At any rate, I'm a bit confused... Is the if ...: die() here a simple way to prevent spurious overwrites/deletions or is there any value in making sure machine names are unique across the machine?

If it's the first one, I think my approach is valid. If it's the second one, maybe let's just not use hostname as default machine name value?

I insist on "default" because I don't really want to start seeing doc saying mkosi --machine=$(uuidgen) qemu, just to make sure things work consistently, whether you happen to choose the same hostname in two mkosi calls or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At any rate, I'm a bit confused... Is the if ...: die() here a simple way to prevent spurious overwrites/deletions or is there any value in making sure machine names are unique across the machine?

The SIGHUP handling we have on git main will likely reduce the number of times you run into this as the files in /run/ will be properly cleaned up when mkosi qemu crashes.

If it's the first one, I think my approach is valid. If it's the second one, maybe let's just not use hostname as default machine name value?

@gdonval The machine name is used to identify the target for mkosi ssh so making it random would make that harder. We can probably pick a better name than default though and allow disabling "registering" the machine.

Copy link
Author

@gdonval gdonval Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for SIGHUP.

@DaanDeMeyer, so

  1. The machine name is not a lock, therefore its existence should not make new instances die.
  2. That's a mjosi ssh concern, therefore it should not affect new mkosi qemu calls.

I suggest:

  1. either don't write a state file, unless --machine is specified;
  2. or write machine named with a counter appended (similar to this PR) so that mkosi qemu doesn't crash.

In the latter case, qemu ssh can look for a precise machine name (with counter) or could look at the highest counter for some specific machine name stem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gdonval I prefer mkosi ssh to work out of the box without having to specify --machine. What about using the working directory name as the machine name instead of default? That should drastically reduce the number of conflicts.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DaanDeMeyer, you prefer

mkosi ssh to work without having to specify machine

and suggest using the folder name.

Am I right to assume you would call mkosi ssh only from the current mkosi folder then?

In that case, why don't we write a softlink reference somewhere appropriate in the current mkosi project folder? State would still be written centrall (albeit with a convenient distinguisher) and mkosi ssh would follow the local link to that folder's default session.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants