feat: WDL Script Overhaul and Squash #1006

Merged
merged 2 commits into from
Sep 28, 2022

@j23414 j23414 commented Sep 26, 2022

Description of proposed changes

Major overhaul of the WDL scripts motivated by #1005 with summarized changes below:

Through regular meetings with a potential user, the default behavior of a basic build was redesigned for the base case where a user provides only a sequence and metadata file pair (no builds.yaml file).

Squash several Terra WDL ingest changes (historically on wdl/genbank_ingest and wdl/gisaid_ingest) into one commit. Changes listed below:

  • Splits the ncov_ingest task into gisaid_ingest and genbank_ingest
  • Can optionally take a compressed or decompressed nextclade cache file to reduce runtime
  • Parameterizes CPU and memory usage depending on new or cached run
  • Captures log files in the results.zip folder
  • Adds an optional tsv filter to replicate the regional datasets
  • Logs the date in a LAST_RUN output string
  • Compresses with zstd, but can decompress either an xz or a zst cache
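
For reviewers who want a concrete picture of the cache handling and resource parameterization listed above, here is a minimal Python sketch. It is not the WDL in this PR: the function names (`decompress_command`, `pick_resources`, `last_run_string`) and the CPU/memory numbers are hypothetical and chosen only for illustration. The only external facts assumed are that `zstd -d` and `xz -d` decompress `.zst` and `.xz` files respectively.

```python
from datetime import date
from pathlib import Path
from typing import Dict, List, Optional


def decompress_command(cache_path: str) -> Optional[List[str]]:
    """Return the argv that would decompress the cache, or None if it is already decompressed."""
    suffix = Path(cache_path).suffix
    if suffix == ".zst":
        return ["zstd", "-d", cache_path]
    if suffix == ".xz":
        return ["xz", "-d", cache_path]
    # Any other extension is treated as an already decompressed cache file.
    return None


def pick_resources(cache_path: Optional[str]) -> Dict[str, int]:
    """Choose CPU and memory for a cached run versus a fresh run.

    The numbers are placeholders (assumptions), not the values used in the WDL.
    """
    if cache_path:
        return {"cpu": 16, "memory_gb": 50}
    return {"cpu": 64, "memory_gb": 100}


def last_run_string(today: Optional[date] = None) -> str:
    """Format the run date as an ISO string, in the spirit of the LAST_RUN output."""
    return (today or date.today()).isoformat()
```

The point of the sketch is the branching: the cache extension decides which decompressor runs (or none), and the presence of a cache decides how much CPU and memory to request.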

Related issue(s)

Related to #1005

Testing

Testing here might be premature, at least until I can subsequently update the documentation (#999) to point at the new Dockstore entries.

However, for a reviewer with the time or incentive, testing is possible via a separate repo, https://github.com/j23414/wdl_pathogen_build, where all three Dockstore entries are available.

Assuming the reviewer has access to our development Terra workspace, three tests can be kicked off from the "WORKFLOWS" tab by starting the "test_genbank_ingest", "test_gisaid_ingest", and "ncov" cards.

I'll leave this up until end of day Tuesday in case there are comments; otherwise I will merge so I can start editing the documentation.

j23414 added 2 commits September 9, 2022 16:01
Through regular meetings with a potential user, the default behavior of a basic build is being redesigned for the base case:

If a user provides only a sequence and metadata file pair (no builds.yaml file), then:

* An open reference file will be included in the builds.yaml to provide phylogenetic context
* All the sequences will be included by default
* The user-provided sequences will be colored by a "custom data" field as described by:
  https://docs.nextstrain.org/projects/ncov/en/latest/tutorial/custom-data.html#break-down-the-command

Squash several WDL Terra ingest changes into one commit. Changes listed below:

* Splits the ncov_ingest task into gisaid_ingest and genbank_ingest
* Can optionally take a compressed or decompressed nextclade cache file to reduce runtime
* Parameterizes cpu and memory usage depending on new or cached run
* Captures log files in the results.zip folder
* Adds an optional tsv filter to replicate the regional datasets
* Logs the date in a LAST_RUN output string
* Compresses with zstd, but can decompress either an xz or a zst cache
@j23414 j23414 self-assigned this Sep 26, 2022
@j23414 j23414 requested a review from huddlej September 26, 2022 23:14
@j23414 j23414 merged commit 55ae46f into master Sep 28, 2022
@j23414 j23414 deleted the wdl/ingest_squash branch September 28, 2022 16:28