Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating the *before_notebook* 20_start-postgresql.sh and 40_prepare_aiida.sh #476

Merged
merged 8 commits into from
Jul 4, 2024

Conversation

mikibonacci
Copy link
Member

Fixings are done as for arm64 I encountered issues in restarting the container.

1 - Deleting old ${PSQL_LOGFILE}.1.gz file before trying to compressed again in 20_start-postgresql.sh. Otherwise, this can give errors and so the container will exit with non-zero status.
2 - Clean stop of the daemon before the "verdi storage migrate --force" in 40_prepare_aiida.sh. This is done because sometimes the restart of the docker container will give error in trying to migrate, as the daemon is still running. I am not able to reproduce the error in a standard way, however.

for arm64 I encountered the issues in restarting the container.

1- Deleting old ${PSQL_LOGFILE}.1.gz file before trying to compressed again. Otherwise, this can give errors and so the container will exit with non-zero status.
2- Clean stop of the daemon before the "verdi storage migrate --force" in `40_prepare_aiida.sh`. This is done because sometimes the restart of the docker container will give error in trying to migrate,
as the daemon is still running. I am not able to reproduce the error in a standard way, however.
@@ -74,6 +74,8 @@ load_computer('${computer_name}').set_minimum_job_poll_interval(${job_poll_inter
else

# Migration will run for the default profile.
## We need to stop the daemon before.
verdi daemon stop
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to understand why this is needed, since the daemon cannot be running at the container startup.

@sphuber does verdi daemon stop do some kind of cleanup that might explain why it is needed here? It is true that we currently don't gracefully stop the daemon when we stop the container so there might perhaps be some stale pid files or something?

@mikibonacci what is the exact error that you saw coming from the verdi storage migrate command?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on the version. The behavior has changed a bit in the 2.x minor releases. But currently, yes, verdi daemon stop should clean up stale PID-files. And it is possible that some other commands will check if the daemon "is running" based on that pid file. Would have to look into more detail if you know exactly which version you are targeting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! We're currently on 2.5, and hopefully will switch to 2.6 soon.

I was trying to get rid of verdi daemon stop because it adds 2s to container startup. Perhaps we can just remove the stale pid files manually?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @danielhollas and @sphuber, the exact error printed in the log is:

2024-07-03 15:41:58 Critical: Migration aborted, the daemon for the profile is still running.

The target AiiDA version is aiida-core==2.5.1 (build=pyhca7485f_0, channel=conda-forge) indeed.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to get rid of verdi daemon stop because it adds 2s to container startup. Perhaps we can just remove the stale pid files manually?

Well if you are absolutely certain that the daemon shouldn't be running, you could just manually remove the pid file .aiida/daemon/circus-{profile-name}.pid which should be very fast. The verdi daemon stop command will actually try to reach the daemon, which has a 2 second timeout by default.

Copy link
Member

@unkcpz unkcpz Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Command verdi storage migrate has access to daemon client, so probably another solution is when --force flag is used, it does clean stale PID files before migrate.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that it would be a good idea to have --force ignore a running daemon. The --force flag is there to not have to type out the confirmation message. This has been the case for a long time and users may have grown used to that. If we now also have it start ignoring a running daemon, that might be dangerous.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, then manually delete it as Miki did here is fine.

@danielhollas
Copy link
Contributor

Thanks for catching these issues.

I encountered issues in restarting the container.

How exactly are you restarting the container?

@mikibonacci
Copy link
Member Author

For the restart, I used aiidalab-launch and also the manual startup (docker desktop and terminal). These issues are there only when I use arm64 machine (my Mac). That's strange.

mikibonacci and others added 3 commits July 4, 2024 00:06
…e-aiida.sh`

This is faster than `verdi daemon stop`. We suppose that the daemon is
not running when we start/restart the container.
@danielhollas
Copy link
Contributor

@mikibonacci can you test the latest version of the image (ghcr.io/aiidalab/full-stack:pr-476).

Once you confirm that it works reliably for you I'll merge and release a new version.

Thanks for catching this and the fixes!

@mikibonacci
Copy link
Member Author

Hi @danielhollas , I tested the ghcr.io/aiidalab/full-stack:pr-476 and it works well for me! Thanks a lot for the review!

@danielhollas danielhollas self-requested a review July 4, 2024 16:06
@danielhollas danielhollas merged commit c4dbc02 into main Jul 4, 2024
15 checks passed
@danielhollas danielhollas deleted the fix/arm64/daemon_clean_stop branch July 4, 2024 16:09
@danielhollas
Copy link
Contributor

Thanks for testing @mikibonacci. I'll release a new version soon with your fix.

I've opened #477 as a follow up to this to prevent these issues in the future.

@danielhollas
Copy link
Contributor

@mikibonacci new version is released. 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants