Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ena_deposition): Make deposition a package instead of multiple snakemake rules #2976

Merged
merged 28 commits into from
Oct 18, 2024

Conversation

anna-parker
Copy link
Contributor

@anna-parker anna-parker commented Oct 8, 2024

resolves #2878

preview URL: https://refactor-ena-deposition.loculus.org/

Problem

If one snakemake rule fails the other rules will keep running indefinitely. We should stop other rules if one fails.

Summary

  • Create a new package called ena_deposition. One config is generated at startup, this uses defaults from the config/default.yaml and then overwrites additional config arguments from the passed config/config.yaml (as the snakefile did before). This reduces repetition as instead of one config for each rule, also there is now only one config and only one cli entry point.
  • The create_project,create_sample, create_assembly, trigger_submission and upload_external_metadata functions are run in parallel threads (instead of before being run in parallel snakemake rules) and use stop_event to stop all threads if one fails.
    Tested that this works as intended on a preview by causing an Exception to be raised in the trigger_submission function thread - this lead to all other threads being stopped and the pod stopping and being restarted by kubernetes:
13:33:20    DEBUG (  create_assembly.py: 455) - Checking state in ENA
Task generated an exception: Testing threads stop!!!
upload_external_metadata stopped due to exception in another task
create_sample stopped due to exception in another task
create_assembly stopped due to exception in another task
create_project stopped due to exception in another task

Additional changes:

  • snakemake was removed as a dependency, this lead to xmltodict working differently than before. I needed to create a new class called XmlNone for the case when a field is intentionally left None in the dictionary, e.g. SEQUENCING_PROJECT in the xml below:
<SUBMISSION_PROJECT>
    <SEQUENCING_PROJECT/>
    <ORGANISM>
        <TAXON_ID>Test taxon</TAXON_ID>
        <SCIENTIFIC_NAME>Test scientific name</SCIENTIFIC_NAME>
    </ORGANISM>
</SUBMISSION_PROJECT>
  • As I now specify the log level in main I do not create a custom logger object for each script but use logging in all files - leading to logger being changed to logging in multiple files.
  • I created a secure_ena_connection function to check if connections to ENA are done correctly instead of doing this at the start of the snakefile.

Screenshot

PR Checklist

  • All necessary documentation has been adapted.
  • Add trigger_submission_to_ena_from_file option
  • Test if one thread raises an error the other will stop
  • Test submission still works: tested full submission cycle including upload to Loculus

@anna-parker anna-parker added the preview Triggers a deployment to argocd label Oct 9, 2024
@anna-parker anna-parker changed the title Refactor ena deposition feat(ena_deposition): Make deposition a package instead of multiple snakemake rules Oct 9, 2024
@anna-parker anna-parker marked this pull request as ready for review October 9, 2024 14:08
@anna-parker anna-parker force-pushed the refactor_ena_deposition branch 3 times, most recently from f1e62f1 to d733eeb Compare October 18, 2024 13:33
Copy link
Contributor

@corneliusroemer corneliusroemer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, nice DRYification!

@anna-parker anna-parker merged commit b022f4d into main Oct 18, 2024
16 checks passed
@anna-parker anna-parker deleted the refactor_ena_deposition branch October 18, 2024 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Triggers a deployment to argocd
Projects
None yet
2 participants