Add compression to MSA modules #4754

Merged on Feb 15, 2024 (68 commits). Showing changes from 60 commits.

Commits
2680830
add pigz to clustalo
lrauschning Dec 8, 2023
23f496e
add compression to muscle5
lrauschning Dec 8, 2023
2d704a7
enabled compression flag for famsa
lrauschning Dec 8, 2023
f8ddf9f
added compression to mafft
lrauschning Dec 8, 2023
c0d1d50
compression for mtmalign
lrauschning Dec 21, 2023
3eeca26
set to mulled containers
lrauschning Jan 16, 2024
d9666d7
more informative test name
lrauschning Jan 16, 2024
21f5f81
change mtmalign test to search after unzipping
lrauschning Jan 16, 2024
c1e26f3
update mtmalign tests to work with gzip, fix typo
lrauschning Jan 16, 2024
8eb641e
regenerate test snaps
Jan 16, 2024
82ea41a
muscle5: zip multiple output files, if present
lrauschning Jan 16, 2024
15f5102
Change MUSCLE5 tests to the same testcase TCOFFEE is using, also fix it
lrauschning Jan 16, 2024
1915cc3
add tags requested by nf-core-lint
lrauschning Jan 17, 2024
0839b98
add full url to singularity/biocontainers
lrauschning Jan 17, 2024
b21623e
fix famsa
lrauschning Jan 17, 2024
c790e44
regenerated snapshots with nf-test 0.8.3. Reenabled snapshots for mus…
Jan 17, 2024
5a6e78e
forgot to regenerate mafft, also mtmalign seems to still be nondeterm…
Jan 17, 2024
d26dae7
update metas
lrauschning Jan 18, 2024
083fc72
compression support for tcoffee modules
lrauschning Jan 19, 2024
2497b9f
added pigz to tools in meta
lrauschning Jan 19, 2024
f6cfc4a
fix typo
lrauschning Jan 19, 2024
9c6481d
regenerate snaps, adjust test to gzip
Jan 19, 2024
bd443c1
added mulled containers for tcoffee
lrauschning Jan 22, 2024
ea6d06e
implement compression switching with channel
lrauschning Jan 22, 2024
746d601
add tags wanted by lint
lrauschning Jan 22, 2024
93f2943
regenerate snapshots
lrauschning Jan 22, 2024
a3e6205
whoops, regenerated using container this time
Jan 22, 2024
7a5bb1e
update meta.yml
lrauschning Jan 22, 2024
32e3508
update glob in meta.yml
lrauschning Jan 22, 2024
b29d802
support compressed input in irmsd
lrauschning Jan 22, 2024
ad33963
assign more precise type in meta.yml
lrauschning Jan 22, 2024
21ba7d4
add tag flagged by lint to tcoffee/irmsd
lrauschning Jan 22, 2024
e9617ba
set tcoffee/irmsd to use mulled container
lrauschning Jan 22, 2024
4fd533d
tcoffee/irmsd: do not compress template file, and correctly uncompres…
lrauschning Jan 22, 2024
1353118
tcoffee/align: reimplement toggling compression
lrauschning Jan 23, 2024
9e44859
tcoffee/align: use new pipe name everywhere
lrauschning Jan 23, 2024
ad9516e
tcoffee/align: reenable default html output, add comment
lrauschning Jan 23, 2024
a181b03
fix escaped line at end of comment...
lrauschning Jan 23, 2024
c047a6d
tcoffee/align: make tcoffee write to stdout, avoid using fifo
lrauschning Jan 23, 2024
6251e0a
clustalo/align: add optional compression
lrauschning Jan 24, 2024
ac15c92
muscle5/super5: add optional compression, also expand tests
lrauschning Jan 24, 2024
2ee7bed
update snapshot
Jan 24, 2024
86568d9
muscle5/super5: re-add empty config file
lrauschning Jan 24, 2024
b0dbaa4
mafft: implement optional output compression, handle compressed input
lrauschning Jan 24, 2024
222b029
muscle5/super5: better parallelization for compressed -perm all
lrauschning Jan 24, 2024
81ced39
mtmalign/align: implement optional compression
lrauschning Jan 25, 2024
3e82387
mtmalign/align: add pigz to versions.yml
lrauschning Jan 25, 2024
2df6c38
mtmalign/align: fix
lrauschning Jan 25, 2024
6c141ab
regenerate snapshot
Jan 25, 2024
6338cf2
famsa/align: implement optional compression
lrauschning Jan 25, 2024
984ee51
whoops, fix tests
lrauschning Jan 25, 2024
a5330ef
clustalo/align: fix
lrauschning Jan 25, 2024
1f7791b
update snapshots
Jan 25, 2024
838ec1b
generate different snapshots for compressed & uncompressed tests, pre…
lrauschning Jan 25, 2024
d2e05da
updated snapshots
Jan 25, 2024
7a0e78d
mtmalign/align: update input pattern
lrauschning Jan 26, 2024
103736a
tcoffee/alncompare,irmsd: implement jose's suggestion
lrauschning Jan 26, 2024
d870e1d
tcoffee/irmsd: additional test for compressed input
lrauschning Jan 26, 2024
ddc6725
tcoffee/irmsd: add tag required by lint
lrauschning Jan 26, 2024
e8ae161
Revert "mtmalign/align: update input pattern"
lrauschning Jan 26, 2024
8184055
incorporate adams suggestion, fix stub filename extensions
lrauschning Feb 9, 2024
1d35d3e
apparently this requires regenerating the snapshots?
Feb 9, 2024
706d05f
try removing test match names, as per sateesh's suggestion
lrauschning Feb 13, 2024
b338910
Revert "try removing test match names, as per sateesh's suggestion"
lrauschning Feb 13, 2024
4528520
Merge branch 'master' into msa-compression
lrauschning Feb 15, 2024
1f89b92
tcoffee/align change snapshot names
lrauschning Feb 15, 2024
a624c58
make snapshot names unique for nf-test 0.8.4
lrauschning Feb 15, 2024
6f6da51
Merge branch 'master' into msa-compression
lrauschning Feb 15, 2024
1 change: 1 addition & 0 deletions modules/nf-core/clustalo/align/environment.yml
@@ -5,3 +5,4 @@ channels:
- defaults
dependencies:
- bioconda::clustalo=1.2.4
- conda-forge::pigz=2.8
31 changes: 20 additions & 11 deletions modules/nf-core/clustalo/align/main.nf
@@ -4,45 +4,54 @@ process CLUSTALO_ALIGN {

conda "${moduleDir}/environment.yml"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/clustalo:1.2.4--h87f3376_5':
'biocontainers/clustalo:1.2.4--h87f3376_5' }"
'https://depot.galaxyproject.org/singularity/mulled-v2-4cefc38542f86c17596c29b35a059de10387c6a7:adbe4fbad680f9beb083956d79128039a727e7b3-0':
'biocontainers/mulled-v2-4cefc38542f86c17596c29b35a059de10387c6a7:adbe4fbad680f9beb083956d79128039a727e7b3-0' }"

input:
tuple val(meta), path(fasta)
tuple val(meta) , path(fasta)
tuple val(meta2), path(tree)
val(compress)

output:
tuple val(meta), path("*.aln"), emit: alignment
path "versions.yml" , emit: versions
tuple val(meta), path("*.aln{.gz,}"), emit: alignment
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def write_output = compress ? "--force -o >(pigz -cp ${task.cpus} > ${prefix}.aln.gz)" : "> ${prefix}.aln"
// process substitution, >(), is used instead of a pipe to preserve clustalo's exit status,
// so Nextflow can report an error when clustalo fails
// --force -o is necessary because clustalo expands the command-line input,
// which causes it to treat the pipe as a parameter and fail
// this way, the target expands to /dev/fd/<id>, and --force allows writing output to an already existing file
Comment on lines +25 to +30
Contributor

I'm not a huge fan of this but I guess most pipeline developers will leave it on true and forget about it, so why not?
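
The exit-status argument made in the code comments under review can be reproduced outside Nextflow. The following is a minimal bash sketch rather than the module's code: `failing_tool` is a hypothetical stand-in for an aligner that exits non-zero, and `gzip` stands in for `pigz`. It assumes bash (process substitution is not available in POSIX sh) with `pipefail` switched off, which is bash's default.

```shell
#!/usr/bin/env bash
# Assumption: pipefail is off (bash's default); switched off explicitly for the demo.
set +o pipefail

# Hypothetical stand-in for an aligner that fails with exit status 3.
failing_tool() { return 3; }

# Plain pipe: the pipeline reports the status of its last command (gzip),
# so the failure of failing_tool is hidden.
pipe_status=0
failing_tool | gzip -c > /dev/null || pipe_status=$?

# Process substitution: gzip runs on the side, and the reported status
# is that of failing_tool itself.
procsub_status=0
failing_tool > >(gzip -c > /dev/null) || procsub_status=$?

echo "pipe=${pipe_status} procsub=${procsub_status}"
# prints: pipe=0 procsub=3
```

If `pipefail` is enabled in the shell that runs the script, the plain pipe reports the failure as well, so the difference shown here only matters where that option cannot be relied on.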

"""
clustalo \\
-i ${fasta} \\
--threads=${task.cpus} \\
$args \\
-o ${prefix}.aln
clustalo \
-i ${fasta} \
--threads=${task.cpus} \
$args \
$write_output

cat <<-END_VERSIONS > versions.yml
"${task.process}":
clustalo: \$( clustalo --version )
pigz: \$(echo \$(pigz --version 2>&1) | sed 's/^.*pigz\\w*//' )
END_VERSIONS
"""

stub:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
"""
touch ${prefix}.aln
touch ${prefix}.aln.gz
Contributor

This should change based on the compress value. Something like this:

    stub:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    def output = compress ? "${prefix}.aln.gz" : "${prefix}.aln"
    """
    touch ${output}

Contributor Author

Ah, good catch. I haven't changed that since introducing the compress input channel. This might also affect some of the other modules; I'll have a look.


cat <<-END_VERSIONS > versions.yml
"${task.process}":
clustalo: \$( clustalo --version )
pigz: \$(echo \$(pigz --version 2>&1) | sed 's/^.*pigz\\w*//' )
END_VERSIONS
"""
}
12 changes: 10 additions & 2 deletions modules/nf-core/clustalo/align/meta.yml
@@ -12,6 +12,10 @@ tools:
tool_dev_url: "http://www.clustal.org/omega/"
doi: "10.1038/msb.2011.75"
licence: ["GPL v2"]
- "pigz":
description: "Parallel implementation of the gzip algorithm."
homepage: "https://zlib.net/pigz/"
documentation: "https://zlib.net/pigz/pigz.pdf"
input:
- meta:
type: map
@@ -31,6 +35,9 @@ input:
type: file
description: Input guide tree in Newick format
pattern: "*.{dnd}"
- compress:
type: boolean
description: Flag representing whether the output MSA should be compressed. Set to true to enable/false to disable compression. Compression is done using pigz, and is multithreaded.
output:
- meta:
type: map
@@ -39,8 +46,8 @@ output:
e.g. `[ id:'test']`
- alignment:
type: file
description: Alignment file.
pattern: "*.{aln}"
description: Alignment file, in FASTA format. May be gzipped or uncompressed, depending on whether compress is set to true or false
pattern: "*.aln{.gz,}"
- versions:
type: file
description: File containing software versions
@@ -51,3 +58,4 @@ authors:
maintainers:
- "@luisas"
- "@joseespinosa"
- "@lrauschning"
35 changes: 31 additions & 4 deletions modules/nf-core/clustalo/align/tests/main.nf.test
@@ -4,13 +4,38 @@ nextflow_process {
script "../main.nf"
process "CLUSTALO_ALIGN"
config "./nextflow.config"

tag "modules"
tag "modules_nfcore"
tag "clustalo"
tag "clustalo/align"
tag "clustalo/guidetree"

test("sarscov2 - contigs-fasta - uncompressed") {

when {
process {
"""
input[0] = [ [ id:'test' ], // meta map
file(params.test_data['sarscov2']['illumina']['contigs_fasta'], checkIfExists: true)
]
input[1] = [[:],[]]
input[2] = false
"""
}
}

then {
assertAll(
{ assert process.success },
{ assert snapshot(process.out.alignment).match("alignment - uncompressed")},
{ assert snapshot(process.out.versions).match("versions") }
)
}

}

test("sarscov2 - contigs-fasta") {
test("sarscov2 - contigs-fasta - compressed") {

when {
process {
@@ -19,14 +44,15 @@ nextflow_process {
file(params.test_data['sarscov2']['illumina']['contigs_fasta'], checkIfExists: true)
]
input[1] = [[:],[]]
input[2] = true
"""
}
}

then {
assertAll(
{ assert process.success },
{ assert snapshot(process.out.alignment).match("alignment")},
{ assert snapshot(process.out.alignment).match("alignment - compressed")},
{ assert snapshot(process.out.versions).match("versions") }
)
}
@@ -56,6 +82,7 @@ nextflow_process {
file(params.test_data['sarscov2']['illumina']['contigs_fasta'], checkIfExists: true)
]
input[1] = CLUSTALO_GUIDETREE.out.tree.collect{ meta, tree -> tree }.map{ tree -> [[ id: 'test_summary'], tree]}
input[2] = true
"""
}
}
@@ -68,4 +95,4 @@ nextflow_process {
)
}
}
}
}
31 changes: 22 additions & 9 deletions modules/nf-core/clustalo/align/tests/main.nf.test.snap

Some generated files are not rendered by default.

13 changes: 8 additions & 5 deletions modules/nf-core/famsa/align/main.nf
@@ -10,26 +10,29 @@ process FAMSA_ALIGN {
'biocontainers/famsa:2.2.2--h9f5acd7_0' }"

input:
tuple val(meta), path(fasta)
tuple val(meta) , path(fasta)
tuple val(meta2), path(tree)
val(compress)

output:
tuple val(meta), path("*.aln"), emit: alignment
path "versions.yml" , emit: versions
tuple val(meta), path("*.aln{.gz,}"), emit: alignment
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def compress_args = compress ? '-gz' : ''
def prefix = task.ext.prefix ?: "${meta.id}"
def options_tree = tree ? "-gt import $tree" : ""
"""
famsa $options_tree \\
$compress_args \\
$args \\
-t ${task.cpus} \\
${fasta} \\
${prefix}.aln
${prefix}.aln${compress ? '.gz':''}
Contributor

Hmm I'm coming around to this idea a bit more, it seems to be cleaner and harder to mess up.

Contributor Author (@lrauschning, Feb 9, 2024)

Yes, I think the main advantage is that it is the cleanest and most straightforward option to understand and document (edit: compared to the other options we came up with).
Especially for tools like FAMSA, which natively support compression, it's also cleaner than having the output format change based on a parameter passed via ext.args.
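
Since the `compress` input means the module can emit either `*.aln` or `*.aln.gz`, a downstream step may need to read both forms. Below is a minimal bash sketch of one way to do that, assuming a `gzip` that copies non-gzip input through unchanged when given `-cdf` (GNU gzip documents this behaviour). The `read_alignment` helper and the file names are hypothetical and not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an alignment to stdout whether or not it is gzipped.
# With -c -d -f, gzip decompresses gzip data and copies anything else unchanged.
read_alignment() {
    gzip -cdf -- "$1"
}

# Build a tiny FASTA alignment in a scratch directory, plus a gzipped copy.
workdir=$(mktemp -d)
printf '>seq1\nAC-GT\n>seq2\nACTGT\n' > "${workdir}/test.aln"
gzip -c "${workdir}/test.aln" > "${workdir}/test.aln.gz"

plain=$(read_alignment "${workdir}/test.aln")
gzipped=$(read_alignment "${workdir}/test.aln.gz")

# Both calls yield the same sequence records.
[ "${plain}" = "${gzipped}" ] && echo "identical"

rm -rf "${workdir}"
```

Reading through a single helper keeps the consumer independent of the `compress` setting, at the cost of relying on `gzip` pass-through behaviour that should be checked on the target system.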


cat <<-END_VERSIONS > versions.yml
"${task.process}":
@@ -40,7 +43,7 @@ process FAMSA_ALIGN {
stub:
def prefix = task.ext.prefix ?: "${meta.id}"
"""
touch ${prefix}.aln
touch ${prefix}.aln.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
7 changes: 5 additions & 2 deletions modules/nf-core/famsa/align/meta.yml
@@ -33,6 +33,9 @@ input:
type: file
description: Input guide tree in Newick format
pattern: "*.{dnd}"
- compress:
type: boolean
description: Flag representing whether the output MSA should be compressed. Set to true to enable/false to disable compression. Compression is handled by passing '-gz' to FAMSA along with any other options specified in task.ext.args.
output:
- meta:
type: map
@@ -41,8 +44,8 @@ output:
e.g. `[ id:'test']`
- alignment:
type: file
description: Alignment file.
pattern: "*.{aln}"
description: Alignment file, in FASTA format. May be gzipped or uncompressed, depending on whether compress is set to true or false
pattern: "*.aln{.gz,}"
- versions:
type: file
description: File containing software versions
33 changes: 30 additions & 3 deletions modules/nf-core/famsa/align/tests/main.nf.test
@@ -8,8 +8,9 @@ nextflow_process {
tag "modules_nfcore"
tag "famsa"
tag "famsa/align"
tag "famsa/guidetree"

test("sarscov2 - fasta") {
test("sarscov2 - fasta - uncompressed") {

when {
process {
@@ -18,14 +19,39 @@ nextflow_process {
file(params.test_data['sarscov2']['illumina']['contigs_fasta'], checkIfExists: true)
]
input[1] = [[:],[]]
input[2] = false
"""
}
}

then {
assertAll(
{ assert process.success },
{ assert snapshot(process.out.alignment).match("alignment")},
{ assert snapshot(process.out.alignment).match("alignment_uncompressed")},
{ assert snapshot(process.out.versions).match("versions") }
)
}

}

test("sarscov2 - fasta - compressed") {

when {
process {
"""
input[0] = [ [ id:'test' ], // meta map
file(params.test_data['sarscov2']['illumina']['contigs_fasta'], checkIfExists: true)
]
input[1] = [[:],[]]
input[2] = true
"""
}
}

then {
assertAll(
{ assert process.success },
{ assert snapshot(process.out.alignment).match("alignment_compressed")},
{ assert snapshot(process.out.versions).match("versions") }
)
}
@@ -54,6 +80,7 @@ nextflow_process {
file(params.test_data['sarscov2']['illumina']['contigs_fasta'], checkIfExists: true)
]
input[1] = FAMSA_GUIDETREE.out.tree.collect{ meta, tree -> tree }.map{ tree -> [[ id: 'test_summary'], tree]}
input[2] = true
"""
}
}
@@ -66,4 +93,4 @@ nextflow_process {
)
}
}
}
}