Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow local databases to be used for kraken2, centrifuge, and busco #504

Merged
merged 27 commits into from
Oct 9, 2023

Conversation

gregorysprenger
Copy link
Contributor

@gregorysprenger gregorysprenger commented Sep 5, 2023

This PR is a follow up to issue #498.

This adds the ability to check if a database is compressed or decompressed. Updated BUSCO to only have to specify --busco_db for databases that are both compressed or decompressed. It determines if an input is a lineage dataset by looking for odb10 in the input filename (ie. bacteria_odb10). Since I dropped --busco_reference, I decided to rename --save_busco_reference to --save_busco_db.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs- [ ] If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@github-actions
Copy link

github-actions bot commented Sep 5, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 6190222

+| ✅ 158 tests passed       |+
#| ❔   1 tests were ignored |#
!| ❗   3 tests had warnings |!

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file [TODO: try and test using for --host_fasta and --host_genome]
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in WorkflowMag.groovy: Optionally add in-text citation tools to this list.

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy

✅ Tests passed:

Run details

  • nf-core/tools version 2.10
  • Run at 2023-10-09 07:40:20

@jfy133
Copy link
Member

jfy133 commented Sep 12, 2023

@gregorysprenger I have a little bit of spare time so just had a look: I wonder if the error is coming from the fact that busco_db is coming in as a value channel rather than just a path/file object itself? So isDirectory() doesn't work on channel?

@gregorysprenger
Copy link
Contributor Author

@gregorysprenger I have a little bit of spare time so just had a look: I wonder if the error is coming from the fact that busco_db is coming in as a value channel rather than just a path/file object itself? So isDirectory() doesn't work on channel?

Yeah, I got it figured out now! Once I finish running the test profiles locally (and they pass), I'll get everything pushed over.

}
if (params.busco_download_path) {
Nextflow.error('Both --skip_binqc and --busco_download_path are specified! Invalid combination, please specify either --skip_binqc or --binqc_tool \'busco\' with --busco_download_path.')
if (params.busco_db) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outwith the scope of this PR for sure, but I think cases like this where you skip a step but specify a database should probably just print a warning rather than quitting. Makes debugging a little easier if you just need to quickly turn something off.

},
"busco_auto_lineage_prok": {
"type": "boolean",
"description": "Run BUSCO with automated lineage selection, but ignoring eukaryotes (saves runtime)."
},
"save_busco_reference": {
"save_busco_db": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this should keep its name (or if it should be renamed to "save_busco_references"? As far as I see it, the use case for this parameter is to save the lineages files downloaded when using online auto lineage detection. On their own, these don't actually comprise a busco database as there are other index files you need to download in order to pass a directory to busco_db.

@prototaxites
Copy link
Contributor

Couple of small comments, but otherwise LGTM I think 👍

Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a changelog update! Also given we are changing a parameter it needs tobe loudly flagged (I think thsi will require a 2.5 infact...))

I'm starting my test runs now, hope to report back soon!

modules/local/busco.nf Show resolved Hide resolved
modules/local/busco.nf Show resolved Hide resolved
modules/local/busco_db_preparation.nf Outdated Show resolved Hide resolved
modules/local/kraken2_db_preparation.nf Outdated Show resolved Hide resolved
modules/local/kraken2_db_preparation.nf Outdated Show resolved Hide resolved
nextflow_schema.json Outdated Show resolved Hide resolved
nextflow_schema.json Outdated Show resolved Hide resolved
nextflow_schema.json Outdated Show resolved Hide resolved
subworkflows/local/busco_qc.nf Outdated Show resolved Hide resolved
subworkflows/local/gtdbtk.nf Outdated Show resolved Hide resolved
@jfy133
Copy link
Member

jfy133 commented Sep 28, 2023

Manual Testing results:

Busco

  • ✅ Nothing (auto download?): nextflow run ../main.nf -profile apptainer,test_busco_auto --outdir ./results -with-tower -name busco_nothing --skip_spades --skip_spadeshybrid
  • ✅ URL nextflow run ../main.nf -profile apptainer,test_busco_auto --outdir ./results -with-tower -name busco_url --skip_spades --skip_spadeshybrid --busco_db "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2020-03-06.tar.gz"
  • ✅ tar archive nextflow run ../main.nf -profile apptainer,test_busco_auto --outdir ./results -with-tower -name busco_tar --skip_spades --skip_spadeshybrid --busco_db "data/bacteria_odb10.2020-03-06.tar.gz"
  • ✅ directory (unpacked tar) `nextflow run ../main.nf -profile apptainer,test_busco_auto --outdir ./results -with-tower -name busco_dir --skip_spades --skip_spadeshybrid --busco_db 'data/bacteria_odb10/'

Centrifuge

  • ✅ URL nextflow run ../main.nf -profile apptainer,test_busco_auto --outdir ./results -with-tower -name cf_url --skip_spades --skip_spadeshybrid --centrifuge_db 'https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_cf.tar.gz'
  • ✅ tar archive nextflow run ../main.nf -profile apptainer,test_busco_auto --outdir ./results -with-tower -name cf_tar --skip_spades --skip_spadeshybrid --centrifuge_db 'data/minigut_cf.tar.gz'
  • ✅ directory (unpacked tar) nextflow run ../main.nf -profile apptainer,test_busco_auto --outdir ./results -with-tower -name cf_dir --skip_spades --skip_spadeshybrid --centrifuge_db 'data/minigut_cf/'

Kraken2

  • ✅ URL nextflow run ../main.nf -profile apptainer,test_busco_auto --outdir ./results -with-tower -name k2_url --skip_spades --skip_spadeshybrid --kraken2_db 'https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_kraken.tgz'
  • ✅ tar archive nextflow run ../main.nf -profile apptainer,test_busco_auto --outdir ./results -with-tower -name k2_tar --skip_spades --skip_spadeshybrid --kraken2_db 'data/minigut_kraken.tgz'
  • ✅ directory (unpacked tar) nextflow run ../main.nf -profile apptainer,test_busco_auto --outdir ./results -with-tower -name k2_dir --skip_spades --skip_spadeshybrid --kraken2_db 'data/minigut_kraken/'

gregorysprenger and others added 4 commits September 28, 2023 13:13
Co-authored-by: James A. Fellows Yates <[email protected]>
Co-authored-by: James A. Fellows Yates <[email protected]>
Co-authored-by: James A. Fellows Yates <[email protected]>
Co-authored-by: James A. Fellows Yates <[email protected]>
@gregorysprenger
Copy link
Contributor Author

@jfy133 Thanks for the suggestions and testing edge cases I overlooked! I'll try to get all of these fixed over the weekend.

@jfy133
Copy link
Member

jfy133 commented Sep 29, 2023

I've still got 2 to do, I hope by tomorrow!

@jfy133
Copy link
Member

jfy133 commented Sep 30, 2023

OK, it's just the BUSCO uncompressed tar/dir input which is problmatic, and after that we are good to go 👍

CHANGELOG.md Outdated Show resolved Hide resolved
@jfy133
Copy link
Member

jfy133 commented Oct 6, 2023

Thanks @gregorysprenger ! I will test again at the weekend 👍

@gregorysprenger
Copy link
Contributor Author

Thanks @gregorysprenger ! I will test again at the weekend 👍

Give me a couple more hours and I'll get the centrifuge and kraken database parsing into groovy to avoid the possible symlink issues. :)

@gregorysprenger
Copy link
Contributor Author

Alright I think I'm done this time lol. @jfy133 I ran all of your manual tests and they appeared to work fine, but double checking is always best!

Quick updates:

  • used groovy to handle if centrifuge/kraken is a compressed file or directory
  • updated centrifuge db_name parsing. when using centrifuge with the -x param, the db_name needs to equal the filename of *.cf files with *.1.cf removed
  • changed most of the toString() functions to a nextflow filename attribute function (i e. getBaseName() )

[ its weird that PythonBlack failed O.o ]

workflows/mag.nf Outdated Show resolved Hide resolved
Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, could always bit of refinement here and there but this works 👍

One last thing: please pull in latest changes in dev, and move your CHANGELOG entry accordingly, then we can merge!

modules/local/busco_db_preparation.nf Outdated Show resolved Hide resolved
@jfy133
Copy link
Member

jfy133 commented Oct 8, 2023

And then I'll od the first patch release :) Thanks @gregorysprenger !

@gregorysprenger
Copy link
Contributor Author

Done! Thanks @prototaxites and @jfy133 for reviewing!

@jfy133
Copy link
Member

jfy133 commented Oct 9, 2023

Tweaked further the versions to make it clearer the change, I think I will make this 2.5 as this is a larger change.

once tests pass, I will merge and make the release PR :D

@jfy133
Copy link
Member

jfy133 commented Oct 9, 2023

@nf-core-bot fix linting

@jfy133 jfy133 merged commit 86ce3d7 into nf-core:dev Oct 9, 2023
15 checks passed
@jfy133 jfy133 mentioned this pull request Oct 9, 2023
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants