Split deepvariant 3 (#6172)
* Move DeepVariant into a subcommand module rundeepvariant, preparing for split modules

The test snapshot is updated because the process name in the version file changed.

* Add a split DeepVariant workflow with individual processes for each step

* Remove hash unique ID and fix input structure issue

* Fixes for call_variants outputting sharded file

* Fix test

* Remove --channels insert_size, which is only applicable for short read
data

The channels should be specified in the pipeline config

* Replace the model type value input with ext.args config

* Fix tests: should run twice for two samples in input channel

* Fix linting issues and input channel description

* Fix formatting of md files

Co-authored-by: Felix Lenner <[email protected]>

* Corrections / improvements from @fellen31 review

* Check tfrecord file names

* Updating conda skipping options, because the paths have changed

* Add deprecation warning for top-level process and test for the deprecated process

* also skip conda for the new deprecated module

---------

Co-authored-by: Felix Lenner <[email protected]>
Co-authored-by: Maxime U Garcia <[email protected]>
3 people authored Sep 12, 2024
1 parent 29110dd commit a004c86
Showing 38 changed files with 2,241 additions and 200 deletions.
10 changes: 10 additions & 0 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
@@ -540,6 +540,14 @@ jobs:
  path: modules/nf-core/deepcell/mesmer
- profile: conda
  path: modules/nf-core/deepvariant
- profile: conda
  path: modules/nf-core/deepvariant/callvariants
- profile: conda
  path: modules/nf-core/deepvariant/makeexamples
- profile: conda
  path: modules/nf-core/deepvariant/postprocessvariants
- profile: conda
  path: modules/nf-core/deepvariant/rundeepvariant
- profile: conda
  path: modules/nf-core/ensemblvep/vep
- profile: conda
@@ -630,6 +638,8 @@ jobs:
  path: subworkflows/nf-core/vcf_annotate_ensemblvep
- profile: conda
  path: subworkflows/nf-core/bcl_demultiplex
- profile: conda
  path: subworkflows/nf-core/deepvariant
- profile: conda
  path: subworkflows/nf-core/fastq_align_bamcmp_bwa
- profile: conda
50 changes: 50 additions & 0 deletions modules/nf-core/deepvariant/README.md
@@ -1,3 +1,7 @@
# DeepVariant module / subworkflow options

The DeepVariant tool can be run using the `deepvariant/rundeepvariant` subcommand, or the subworkflow `deepvariant`, which calls the subcommands `makeexamples`, `callvariants` and `postprocessvariants`. The subcommand `rundeepvariant` is simpler, but the subworkflow may be useful if you want to run `callvariants` on GPU.

# Conda is not supported at the moment

The [bioconda](https://bioconda.github.io/recipes/deepvariant/README.html) recipe is not fully working as expected.
@@ -9,3 +13,49 @@ Hence, we are using the docker container provided by the authors of the tool:
- [google/deepvariant](https://hub.docker.com/r/google/deepvariant)

This image is mirrored on the [nf-core quay.io](https://quay.io/repository/nf-core/deepvariant) for convenience.

# DeepVariant subworkflow

You can use the subworkflow `nf-core/deepvariant`, which integrates the three
processes to perform variant calling with common file formats.

These module subcommands incorporate the individual steps of the DeepVariant pipeline:

* makeexamples: Converts the input alignment file to a tfrecord format suitable for the deep learning model
* callvariants: Calls variants based on the input tfrecords. The output is also in
tfrecord format, and needs postprocessing to convert it to VCF.
* postprocessvariants: Converts variant calls from callvariants to VCF, and
also creates GVCF files based on genomic information from makeexamples.

# Recommended parameters

## makeexamples

This process imports the data used for calling, and thus decides what information is available to the
deep neural network. It's important to import the correct channels for the model you want to use.

The script `run_deepvariant` (not used in the subworkflow) does this automatically. You can refer to
the implementation in the DeepVariant repo:

https://github.com/google/deepvariant/blob/bf9ed7e6de97cf6c8381694cb996317a740625ad/scripts/run_deepvariant.py#L367

For WGS and WES models you need to enable the `insert_size` channel. Specify the following in the config:

```
withName: "DEEPVARIANT_MAKEEXAMPLES" {
    ext.args = '--channels "insert_size"'
}
```

## callvariants

It is mandatory to specify a model type. The models are available on the container filesystem in
`/opt/models` - specify the one you want with the `--checkpoint` argument.

```
withName: "DEEPVARIANT_CALLVARIANTS" {
    ext.args = '--checkpoint "/opt/models/wgs"'
}
```

The channels specified in the `makeexamples` process must match the model used for calling.
58 changes: 58 additions & 0 deletions modules/nf-core/deepvariant/callvariants/main.nf
@@ -0,0 +1,58 @@

process DEEPVARIANT_CALLVARIANTS {
    tag "$meta.id"
    label 'process_high'

    //Conda is not supported at the moment
    container "nf-core/deepvariant:1.6.1"

    input:
    tuple val(meta), path(make_examples_tfrecords)

    output:
    tuple val(meta), path("${prefix}.call-*-of-*.tfrecord.gz"), emit: call_variants_tfrecords
    path "versions.yml", emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    // Exit if running this module with -profile conda / -profile mamba
    if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
        error "DEEPVARIANT module does not support Conda. Please use Docker / Singularity / Podman instead."
    }
    def args = task.ext.args ?: ''
    prefix = task.ext.prefix ?: "${meta.id}"

    def matcher = make_examples_tfrecords[0].baseName =~ /^(.+)-\d{5}-of-(\d{5})$/
    if (!matcher.matches()) {
        throw new IllegalArgumentException("tfrecord baseName '" + make_examples_tfrecords[0].baseName + "' doesn't match the expected pattern")
    }
    def examples_tfrecord_name = matcher[0][1]
    def shardCount = matcher[0][2]
    // Reconstruct the logical name - ${tfrecord_name}.examples.tfrecord@${task.cpus}.gz
    def examples_tfrecords_logical_name = "${examples_tfrecord_name}@${shardCount}.gz"

    """
    /opt/deepvariant/bin/call_variants \\
        ${args} \\
        --outfile "${prefix}.call.tfrecord.gz" \\
        --examples "${examples_tfrecords_logical_name}"
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        deepvariant_callvariants: \$(echo \$(/opt/deepvariant/bin/run_deepvariant --version) | sed 's/^.*version //; s/ .*\$//' )
    END_VERSIONS
    """

    stub:
    prefix = task.ext.prefix ?: "${meta.id}"
    """
    echo "" | gzip > ${prefix}.call-00000-of-00001.tfrecord.gz
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        deepvariant_callvariants: \$(echo \$(/opt/deepvariant/bin/run_deepvariant --version) | sed 's/^.*version //; s/ .*\$//' )
    END_VERSIONS
    """
}
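
Reviewer note: the shard-name handling in the `script:` block above can be sketched outside Nextflow as follows. This is a minimal Python illustration, not part of the commit. The function name `logical_shard_name` and the sample file name are hypothetical; the regular expression and the `<name>@<count>.gz` result mirror the Groovy code in the process.

```python
import re
from pathlib import PurePath

# Same pattern as the Groovy matcher: "<name>-NNNNN-of-NNNNN" on the base name.
SHARD_PATTERN = re.compile(r"^(.+)-\d{5}-of-(\d{5})$")


def logical_shard_name(first_shard: str) -> str:
    """Rebuild the '<name>@<count>.gz' spec from one shard file name (hypothetical helper)."""
    # Groovy's baseName drops only the final extension; PurePath.stem does the same.
    base = PurePath(first_shard).stem
    match = SHARD_PATTERN.fullmatch(base)
    if match is None:
        raise ValueError(f"tfrecord base name {base!r} doesn't match the expected pattern")
    name, shard_count = match.groups()
    # The shard count keeps its zero padding, as in the Groovy string interpolation.
    return f"{name}@{shard_count}.gz"


# Assumed example file name; real names come from DEEPVARIANT_MAKEEXAMPLES.
print(logical_shard_name("test.examples.tfrecord-00000-of-00002.gz"))
```

Only the first staged file is inspected, which assumes every shard in the input shares the same name stem and shard count.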
40 changes: 40 additions & 0 deletions modules/nf-core/deepvariant/callvariants/meta.yml
@@ -0,0 +1,40 @@
name: deepvariant_callvariants
description: Call variants from the examples produced by make_examples
keywords:
  - variant calling
  - machine learning
  - neural network
tools:
  - deepvariant:
      description: DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
      homepage: https://github.com/google/deepvariant
      documentation: https://github.com/google/deepvariant
      tool_dev_url: https://github.com/google/deepvariant
      doi: "10.1038/nbt.4235"
      licence: ["BSD-3-clause"]
input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - make_examples_tfrecords:
      type: file
      description: The actual sharded input files, from DEEPVARIANT_MAKEEXAMPLES process
      pattern: "*.gz"
output:
  - call_variants_tfrecords:
      type: list
      description: |
        Each output contains: unique ID string from input channel, meta, tfrecord file with variant calls.
  - versions:
      type: file
      description: File containing software version
      pattern: "versions.yml"
authors:
  - "@abhi18av"
  - "@ramprasadn"
  - "@fa2k"
maintainers:
  - "@abhi18av"
  - "@ramprasadn"
85 changes: 85 additions & 0 deletions modules/nf-core/deepvariant/callvariants/tests/main.nf.test
@@ -0,0 +1,85 @@
nextflow_process {

    name "Test Process DEEPVARIANT_CALLVARIANTS"
    script "../main.nf"
    config "./nextflow.config"
    process "DEEPVARIANT_CALLVARIANTS"

    tag "deepvariant/makeexamples"
    tag "deepvariant/callvariants"
    tag "deepvariant"
    tag "modules"
    tag "modules_nfcore"

    test("homo_sapiens - wgs") {
        setup {
            run("DEEPVARIANT_MAKEEXAMPLES") {
                script "../../makeexamples/main.nf"
                process {
                    """
                    input[0] = [
                        [ id:'test', single_end:false ], // meta map
                        file(params.modules_testdata_base_path + '/genomics/homo_sapiens/illumina/bam/test.paired_end.sorted.bam', checkIfExists: true),
                        file(params.modules_testdata_base_path + '/genomics/homo_sapiens/illumina/bam/test.paired_end.sorted.bam.bai', checkIfExists: true),
                        []
                    ]
                    input[1] = [
                        [ id:'genome'],
                        file(params.modules_testdata_base_path + '/genomics/homo_sapiens/genome/genome.fasta', checkIfExists: true)
                    ]
                    input[2] = [
                        [ id:'genome'],
                        file(params.modules_testdata_base_path + '/genomics/homo_sapiens/genome/genome.fasta.fai', checkIfExists: true)
                    ]
                    input[3] = [
                        [],[]
                    ]
                    input[4] = [
                        [],[]
                    ]
                    """
                }
            }
        }
        when {
            process {
                """
                input[0] = DEEPVARIANT_MAKEEXAMPLES.out.examples
                """
            }
        }

        then {
            assertAll(
                { assert process.success },
                { assert process.out.call_variants_tfrecords.get(0).get(0) == [ id:'test', single_end:false ] },
                // The tfrecord binary representation is not stable, but we check the name of the output.
                { assert snapshot(file(process.out.call_variants_tfrecords.get(0).get(1)).name).match("homo_sapiens-wgs-call_variants_tfrecords-filenames")},
                { assert snapshot(process.out.versions).match("versions") },
            )
        }
    }

    test("homo_sapiens - wgs - stub") {
        options "-stub"

        when {
            process {
                """
                input[0] = [
                    [ id:'test', single_end:false ], // meta
                    [] // No input paths are needed in stub mode
                ]
                """
            }
        }

        then {
            assertAll(
                { assert process.success },
                { assert snapshot(process.out).match() }
            )
        }
    }

}
59 changes: 59 additions & 0 deletions modules/nf-core/deepvariant/callvariants/tests/main.nf.test.snap
@@ -0,0 +1,59 @@
{
"versions": {
"content": [
[
"versions.yml:md5,5ff99ffba1e56e4e919d3dfc2d0f3cbb"
]
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.04.4"
},
"timestamp": "2024-08-09T16:38:47.927241"
},
"homo_sapiens-wgs-call_variants_tfrecords-filenames": {
"content": [
"test.call-00000-of-00001.tfrecord.gz"
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.04.4"
},
"timestamp": "2024-09-04T17:04:33.276938"
},
"homo_sapiens - wgs - stub": {
"content": [
{
"0": [
[
{
"id": "test",
"single_end": false
},
"test.call-00000-of-00001.tfrecord.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
]
],
"1": [
"versions.yml:md5,5ff99ffba1e56e4e919d3dfc2d0f3cbb"
],
"call_variants_tfrecords": [
[
{
"id": "test",
"single_end": false
},
"test.call-00000-of-00001.tfrecord.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
]
],
"versions": [
"versions.yml:md5,5ff99ffba1e56e4e919d3dfc2d0f3cbb"
]
}
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.04.4"
},
"timestamp": "2024-08-13T21:07:17.335788301"
}
}
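
Reviewer note: the stub checksum recorded above, `68b329da9893e34099c7d8ad5cb9c940`, equals the MD5 of a single newline character. That is consistent with the stub command `echo "" | gzip` if the snapshot hashes the decompressed content of `.gz` files rather than the compressed bytes. The decompression step is an assumption about nf-test behaviour, not something stated in this commit. A minimal Python sketch of that reasoning:

```python
import gzip
import hashlib

# What the stub writes: `echo "" | gzip`, i.e. a gzip stream holding one newline.
stub_payload = gzip.compress(b"\n")

# Hash the decompressed content, not the gzip bytes (assumed snapshot behaviour).
content_md5 = hashlib.md5(gzip.decompress(stub_payload)).hexdigest()
print(content_md5)
```

Hashing the decompressed content would also explain why the stub snapshot stays stable even though gzip headers can differ between runs.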
11 changes: 11 additions & 0 deletions modules/nf-core/deepvariant/callvariants/tests/nextflow.config
@@ -0,0 +1,11 @@
process {
    withName: "DEEPVARIANT_CALLVARIANTS" {
        ext.args = '--checkpoint "/opt/models/wgs"'
        cpus = 2 // Keep CPUs fixed so the number of output files is reproducible
    }
}
process {
    withName: "DEEPVARIANT_MAKEEXAMPLES" {
        ext.args = '--channels "insert_size"'
    }
}
2 changes: 2 additions & 0 deletions modules/nf-core/deepvariant/callvariants/tests/tags.yml
@@ -0,0 +1,2 @@
deepvariant/callvariants:
- modules/nf-core/deepvariant/callvariants/**