Add compression to MSA modules (nf-core#4754)
* add pigz to clustalo

* add compression to muscle5

* enabled compression flag for famsa

* added compression to mafft

* compression for mtmalign

* set to mulled containers

* more informative test name

* change mtmalign test to search after unzipping

* update mtmalign tests to work with gzip, fix typo

* regenerate test snaps

* muscle5: zip multiple output files, if present

* Change MUSCLE5 tests to the same testcase TCOFFEE is using, also fix it

* add tags requested by nf-core-lint

* add full url to singularity/biocontainers

* fix famsa

* regenerated snapshots with nf-test 0.8.3. Reenabled snapshots for muscle5 and mtmalign

* forgot to regenerate mafft, also mtmalign seems to still be nondeterministic

* update metas

* compression support for tcoffee modules

* added pigz to tools in meta

* fix typo

* regenerate snaps, adjust test to gzip

* added mulled containers for tcoffee

* implement compression switching with channel

* add tags wanted by lint

* regenerate snapshots

* whoops, regenerated using container this time

* update meta.yml

* update glob in meta.yml

* support compressed input in irmsd

* assign more precise type in meta.yml

* add tag flagged by lint to tcoffee/irmsd

* set tcoffee/irmsd to use mulled container

* tcoffee/irmsd: do not compress template file, and correctly uncompress for irmsd

* tcoffee/align: reimplement toggling compression

* tcoffee/align: use new pipe name everywhere

* tcoffee/align: reenable default html output, add comment

* fix escaped line at end of comment...

* tcoffee/align: make tcoffee write to stdout, avoid using fifo

* clustalo/align: add optional compression

* muscle5/super5: add optional compression, also expand tests

* update snapshot

* muscle5/super5: re-add empty config file

* mafft: implement optional output compression, handle compressed input

* muscle5/super5: better parallelization for compressed -perm all

* mtmalign/align: implement optional compression

* mtmalign/align: add pigz to versions.yml

* mtmalign/align: fix

* regenerate snapshot

* famsa/align: implement optional compression

* whoops, fix tests

* clustalo/align: fix

* update snapshots

* generate different snapshots for compressed & uncompressed tests, prettify code

* updated snapshots

* mtmalign/align: update input pattern

* tcoffee/alncompare,irmsd: implement jose's suggestion

* tcoffee/irmsd: additional test for compressed input

* tcoffee/irmsd: add tag required by lint

* Revert "mtmalign/align: update input pattern"

This reverts commit 7a0e78d.

* incorporate adams suggestion, fix stub filename extensions

* apparently this requires regenerating the snapshots?

* try removing test match names, as per sateesh's suggestion

* Revert "try removing test match names, as per sateesh's suggestion"

This reverts commit 706d05f.

* tcoffee/align change snapshot names

* make snapshot names unique for nf-test 0.8.4

---------

Co-authored-by: Leon Rauschning <[email protected]>
2 people authored and jennylsmith committed Mar 20, 2024
1 parent cceb02b commit f2d7bce
Showing 40 changed files with 676 additions and 417 deletions.
1 change: 1 addition & 0 deletions modules/nf-core/clustalo/align/environment.yml
@@ -5,3 +5,4 @@ channels:
- defaults
dependencies:
- bioconda::clustalo=1.2.4
- conda-forge::pigz=2.8
31 changes: 20 additions & 11 deletions modules/nf-core/clustalo/align/main.nf
@@ -4,45 +4,54 @@ process CLUSTALO_ALIGN {

conda "${moduleDir}/environment.yml"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/clustalo:1.2.4--h87f3376_5':
'biocontainers/clustalo:1.2.4--h87f3376_5' }"
'https://depot.galaxyproject.org/singularity/mulled-v2-4cefc38542f86c17596c29b35a059de10387c6a7:adbe4fbad680f9beb083956d79128039a727e7b3-0':
'biocontainers/mulled-v2-4cefc38542f86c17596c29b35a059de10387c6a7:adbe4fbad680f9beb083956d79128039a727e7b3-0' }"

input:
tuple val(meta), path(fasta)
tuple val(meta) , path(fasta)
tuple val(meta2), path(tree)
val(compress)

output:
tuple val(meta), path("*.aln"), emit: alignment
path "versions.yml" , emit: versions
tuple val(meta), path("*.aln{.gz,}"), emit: alignment
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def write_output = compress ? "--force -o >(pigz -cp ${task.cpus} > ${prefix}.aln.gz)" : "> ${prefix}.aln"
// Using >() is necessary to preserve clustalo's return value,
// so Nextflow knows to report an error when it fails.
// The --force -o is necessary because clustalo expands the command-line input,
// causing it to treat the pipe as a parameter and fail.
// This way, the >() expands to /dev/fd/<id>, and --force allows writing output to an already existing file.
"""
clustalo \\
-i ${fasta} \\
--threads=${task.cpus} \\
$args \\
-o ${prefix}.aln
clustalo \
-i ${fasta} \
--threads=${task.cpus} \
$args \
$write_output
cat <<-END_VERSIONS > versions.yml
"${task.process}":
clustalo: \$( clustalo --version )
pigz: \$(echo \$(pigz --version 2>&1) | sed 's/^.*pigz\\w*//' )
END_VERSIONS
"""

stub:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
"""
touch ${prefix}.aln
touch ${prefix}.aln${compress ? '.gz' : ''}
cat <<-END_VERSIONS > versions.yml
"${task.process}":
clustalo: \$( clustalo --version )
pigz: \$(echo \$(pigz --version 2>&1) | sed 's/^.*pigz\\w*//' )
END_VERSIONS
"""
}
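
The comment in the script block about `>()` can be illustrated outside Nextflow. The following is a minimal sketch, assuming a shell without `pipefail`, with `gzip` standing in for `pigz` and a hypothetical `failing_tool` snippet standing in for a `clustalo` run that exits non-zero. It is not part of the commit.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for an aligner run that writes partial output and fails.
failing_tool='echo partial; exit 3'

# Each variant runs in a fresh bash so that options such as pipefail
# from the calling shell do not influence the result.

# Plain pipe: the pipeline's status is that of the last command (gzip),
# so the failure of the first command is hidden.
pipe_status=0
bash -c "( $failing_tool ) | gzip -c > /dev/null" || pipe_status=$?

# Process substitution: gzip only receives the output, and the status
# reported is that of the tool itself.
procsub_status=0
bash -c "( $failing_tool ) > >(gzip -c > /dev/null)" || procsub_status=$?

# >(...) expands to a path (typically /dev/fd/<n>), which is why it can be
# handed to an option that expects a file name, such as -o.
fd_path=$(bash -c 'echo >(cat > /dev/null)')

echo "pipe=${pipe_status} procsub=${procsub_status} path=${fd_path}"
```

With `set -o pipefail` the plain pipe would also report the failure; the sketch only shows the default behaviour.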
12 changes: 10 additions & 2 deletions modules/nf-core/clustalo/align/meta.yml
@@ -12,6 +12,10 @@ tools:
tool_dev_url: "http://www.clustal.org/omega/"
doi: "10.1038/msb.2011.75"
licence: ["GPL v2"]
- "pigz":
description: "Parallel implementation of the gzip algorithm."
homepage: "https://zlib.net/pigz/"
documentation: "https://zlib.net/pigz/pigz.pdf"
input:
- meta:
type: map
@@ -31,6 +35,9 @@ input:
type: file
description: Input guide tree in Newick format
pattern: "*.{dnd}"
- compress:
type: boolean
description: Flag representing whether the output MSA should be compressed. Set to true to enable/false to disable compression. Compression is done using pigz, and is multithreaded.
output:
- meta:
type: map
@@ -39,8 +46,8 @@ output:
e.g. `[ id:'test']`
- alignment:
type: file
description: Alignment file.
pattern: "*.{aln}"
description: Alignment file, in FASTA format. May be gzipped or uncompressed, depending on whether compress is set to true or false
pattern: "*.aln{.gz,}"
- versions:
type: file
description: File containing software versions
@@ -51,3 +58,4 @@ authors:
maintainers:
- "@luisas"
- "@joseespinosa"
- "@lrauschning"
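
The updated output pattern `*.aln{.gz,}` is meant to match both the compressed and the uncompressed file name. The following is a rough sketch of that behaviour using bash brace expansion, which only approximates Nextflow's own glob matching. The file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Scratch directory with one uncompressed and one compressed alignment name,
# plus an unrelated file that should not match.
workdir=$(mktemp -d)
touch "$workdir/test.aln" "$workdir/other.aln.gz" "$workdir/notes.txt"

# Run in a fresh bash: brace expansion turns *.aln{.gz,} into the two
# patterns *.aln.gz and *.aln, which are then globbed separately.
matched=$(cd "$workdir" && bash -c 'printf "%s\n" *.aln{.gz,}' | sort)

echo "$matched"
rm -rf "$workdir"
```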
37 changes: 32 additions & 5 deletions modules/nf-core/clustalo/align/tests/main.nf.test
@@ -4,13 +4,38 @@ nextflow_process {
script "../main.nf"
process "CLUSTALO_ALIGN"
config "./nextflow.config"

tag "modules"
tag "modules_nfcore"
tag "clustalo"
tag "clustalo/align"
tag "clustalo/guidetree"

test("sarscov2 - contigs-fasta - uncompressed") {

when {
process {
"""
input[0] = [ [ id:'test' ], // meta map
file(params.test_data['sarscov2']['illumina']['contigs_fasta'], checkIfExists: true)
]
input[1] = [[:],[]]
input[2] = false
"""
}
}

then {
assertAll(
{ assert process.success },
{ assert snapshot(process.out.alignment).match("alignment - uncompressed")},
{ assert snapshot(process.out.versions).match("versions0") }
)
}

}

test("sarscov2 - contigs-fasta") {
test("sarscov2 - contigs-fasta - compressed") {

when {
process {
@@ -19,15 +44,16 @@ nextflow_process {
file(params.test_data['sarscov2']['illumina']['contigs_fasta'], checkIfExists: true)
]
input[1] = [[:],[]]
input[2] = true
"""
}
}

then {
assertAll(
{ assert process.success },
{ assert snapshot(process.out.alignment).match("alignment")},
{ assert snapshot(process.out.versions).match("versions") }
{ assert snapshot(process.out.alignment).match("alignment - compressed")},
{ assert snapshot(process.out.versions).match("versions1") }
)
}

@@ -56,6 +82,7 @@ nextflow_process {
file(params.test_data['sarscov2']['illumina']['contigs_fasta'], checkIfExists: true)
]
input[1] = CLUSTALO_GUIDETREE.out.tree.collect{ meta, tree -> tree }.map{ tree -> [[ id: 'test_summary'], tree]}
input[2] = true
"""
}
}
@@ -68,4 +95,4 @@ nextflow_process {
)
}
}
}
}
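
The compressed and uncompressed tests snapshot their outputs separately. One way to check by hand that both variants carry the same alignment is to decompress and compare. The following is a minimal sketch, with `gzip` standing in for `pigz` (both write gzip-format files) and a made-up two-sequence alignment. It is not taken from the test suite.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# Made-up alignment standing in for the module's uncompressed output.
printf '>seq1\nACGT-ACGT\n>seq2\nACGTTACGT\n' > "$workdir/test.aln"

# Stand-in for the compressed variant of the same result.
gzip -c "$workdir/test.aln" > "$workdir/test.aln.gz"

# Decompress to stdout and compare byte for byte with the plain file.
if gzip -cd "$workdir/test.aln.gz" | cmp -s - "$workdir/test.aln"; then
    roundtrip="identical"
else
    roundtrip="different"
fi

echo "$roundtrip"
rm -rf "$workdir"
```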
31 changes: 22 additions & 9 deletions modules/nf-core/clustalo/align/tests/main.nf.test.snap

Some generated files are not rendered by default.

13 changes: 8 additions & 5 deletions modules/nf-core/famsa/align/main.nf
@@ -10,26 +10,29 @@ process FAMSA_ALIGN {
'biocontainers/famsa:2.2.2--h9f5acd7_0' }"

input:
tuple val(meta), path(fasta)
tuple val(meta) , path(fasta)
tuple val(meta2), path(tree)
val(compress)

output:
tuple val(meta), path("*.aln"), emit: alignment
path "versions.yml" , emit: versions
tuple val(meta), path("*.aln{.gz,}"), emit: alignment
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def compress_args = compress ? '-gz' : ''
def prefix = task.ext.prefix ?: "${meta.id}"
def options_tree = tree ? "-gt import $tree" : ""
"""
famsa $options_tree \\
$compress_args \\
$args \\
-t ${task.cpus} \\
${fasta} \\
${prefix}.aln
${prefix}.aln${compress ? '.gz':''}
cat <<-END_VERSIONS > versions.yml
"${task.process}":
@@ -40,7 +43,7 @@ process FAMSA_ALIGN {
stub:
def prefix = task.ext.prefix ?: "${meta.id}"
"""
touch ${prefix}.aln
touch ${prefix}.aln${compress ? '.gz' : ''}
cat <<-END_VERSIONS > versions.yml
"${task.process}":
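
How the `compress` flag changes the FAMSA invocation can be sketched in plain shell. The helper `famsa_command` below is hypothetical and only prints a command line instead of running FAMSA. It leaves out the guide tree option and `task.ext.args`. The `-gz` and `-t` options and the `.aln`/`.aln.gz` names are the ones used in the module.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the famsa command line the module would build.
# $1 = compress (true/false), $2 = output prefix, $3 = input FASTA, $4 = threads
famsa_command() {
    local compress=$1 prefix=$2 fasta=$3 cpus=$4
    local compress_args="" suffix=""
    if [ "$compress" = "true" ]; then
        compress_args="-gz"   # ask FAMSA to gzip its output
        suffix=".gz"          # keep the file name consistent with the content
    fi
    # $compress_args is unquoted on purpose: an empty value must vanish entirely.
    echo famsa $compress_args -t "$cpus" "$fasta" "${prefix}.aln${suffix}"
}

famsa_command true  test contigs.fasta 2
famsa_command false test contigs.fasta 2
```

Deriving the flag and the file suffix from the same condition keeps the two from drifting apart, which is what the module does with `compress_args` and the `${compress ? '.gz':''}` suffix.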
7 changes: 5 additions & 2 deletions modules/nf-core/famsa/align/meta.yml
@@ -33,6 +33,9 @@ input:
type: file
description: Input guide tree in Newick format
pattern: "*.{dnd}"
- compress:
type: boolean
description: Flag representing whether the output MSA should be compressed. Set to true to enable/false to disable compression. Compression is handled by passing '-gz' to FAMSA along with any other options specified in task.ext.args.
output:
- meta:
type: map
@@ -41,8 +44,8 @@ output:
e.g. `[ id:'test']`
- alignment:
type: file
description: Alignment file.
pattern: "*.{aln}"
description: Alignment file, in FASTA format. May be gzipped or uncompressed, depending on whether compress is set to true or false
pattern: "*.aln{.gz,}"
- versions:
type: file
description: File containing software versions
35 changes: 31 additions & 4 deletions modules/nf-core/famsa/align/tests/main.nf.test
@@ -8,8 +8,9 @@ nextflow_process {
tag "modules_nfcore"
tag "famsa"
tag "famsa/align"
tag "famsa/guidetree"

test("sarscov2 - fasta") {
test("sarscov2 - fasta - uncompressed") {

when {
process {
@@ -18,15 +19,40 @@ nextflow_process {
file(params.test_data['sarscov2']['illumina']['contigs_fasta'], checkIfExists: true)
]
input[1] = [[:],[]]
input[2] = false
"""
}
}

then {
assertAll(
{ assert process.success },
{ assert snapshot(process.out.alignment).match("alignment")},
{ assert snapshot(process.out.versions).match("versions") }
{ assert snapshot(process.out.alignment).match("alignment_uncompressed")},
{ assert snapshot(process.out.versions).match("versions0") }
)
}

}

test("sarscov2 - fasta - compressed") {

when {
process {
"""
input[0] = [ [ id:'test' ], // meta map
file(params.test_data['sarscov2']['illumina']['contigs_fasta'], checkIfExists: true)
]
input[1] = [[:],[]]
input[2] = true
"""
}
}

then {
assertAll(
{ assert process.success },
{ assert snapshot(process.out.alignment).match("alignment_compressed")},
{ assert snapshot(process.out.versions).match("versions1") }
)
}

@@ -54,6 +80,7 @@ nextflow_process {
file(params.test_data['sarscov2']['illumina']['contigs_fasta'], checkIfExists: true)
]
input[1] = FAMSA_GUIDETREE.out.tree.collect{ meta, tree -> tree }.map{ tree -> [[ id: 'test_summary'], tree]}
input[2] = true
"""
}
}
@@ -66,4 +93,4 @@ nextflow_process {
)
}
}
}
}