Move code from main workflow to subworkflows #446
FYI this is still on my radar @prototaxites - I just need to find a block of more than 1h where I can sit and focus on it, as it's quite a large refactoring 😬 (but very needed and welcome!)
@prototaxites I just realised this will need its merge conflicts resolved now that run merging has gone in, sorry 🤦
Was on my radar - should be fixed now!
Had a thought while running umpteen gels in the lab today: taxprofiler and mag both take the same type of input data and have very similar pre-processing steps: fastqc -> fastp -> (taxprofiler only: complexity filtering) -> host removal -> (mag only: phix removal) -> fastqc. As part of a larger refactoring, would it make sense to spin (some of) these parts out into installable subworkflows, so that the exact same code could be shared between the pipelines?
Yes, very much so!
Going to suggest we close this PR and revisit it down the line. The pipeline has changed a lot with 2.4.0. I still think breaking the pipeline into subworkflows is definitely a good plan, but I suspect it would be better to tackle each subworkflow independently, perhaps with a bit of a roadmap.
Yes, sounds good 👍
Creates a number of new subworkflows for separate 'stages' of the pipeline: short-read pre-processing, long-read pre-processing, short-read taxonomy, assembly, assembly QC (Quast), bin QC, bin taxonomy, and annotation.
Tidies the workflow, particularly around the assembly input (as discussed in #439). Where possible, it avoids nested if-elses and instead skips steps by passing them empty channels.
I moved the various DB channel-resolving components into their respective subworkflows. This simplifies the architecture a bit, because DBs no longer have to be passed to subworkflows as arguments. The trade-off is that the subworkflows may be less reusable elsewhere. I'm not sure what the preferred style is here, but this can easily be changed.
Subworkflow names are suggestions only 😅
PR checklist
- `nf-core lint`
- `nextflow run . -profile test,docker --outdir <OUTDIR>`
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).