Merge pull request #362 from genomic-medicine-sweden/gatkcnvcaller
Gatkcnvcaller
ramprasadn authored Jul 7, 2023
2 parents 38e01e0 + 007c5ee commit 06a9317
Showing 26 changed files with 1,006 additions and 39 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

- Add GATK's cnv calling pipeline [#362](https://github.com/nf-core/raredisease/pull/362)
- Add `public_aws_ecr` profile for using AWS ECR public gallery images [#360](https://github.com/nf-core/raredisease/pull/360)
- GATK's ShiftFasta to generate all the files required for mitochondrial analysis [#354](https://github.com/nf-core/raredisease/pull/354)
- Feature to calculate CADD scores for indels [#325](https://github.com/nf-core/raredisease/pull/325)
4 changes: 2 additions & 2 deletions README.md
@@ -54,6 +54,8 @@ On release, automated continuous integration tests run the pipeline on a full-si

- [Manta](https://github.com/Illumina/manta)
- [TIDDIT's sv](https://github.com/SciLifeLab/TIDDIT)
- Copy number variant calling:
- [GATK GermlineCNVCaller](https://github.com/broadinstitute/gatk)

**5. Annotation - SNV:**

@@ -153,8 +155,6 @@ For further information or help, don't hesitate to get in touch on the [Slack `#

## Citations

-<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->

If you use nf-core/raredisease for your analysis, please cite it using the following doi: [10.5281/zenodo.7995798](https://doi.org/10.5281/zenodo.7995798)

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
39 changes: 39 additions & 0 deletions conf/modules/call_sv_germlinecnvcaller.config
@@ -0,0 +1,39 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Config file for defining DSL2 per module options and publishing paths
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Available keys to override module options:
ext.args = Additional arguments appended to command in module.
ext.args2 = Second set of arguments appended to command in module (multi-tool modules).
ext.args3 = Third set of arguments appended to command in module (multi-tool modules).
ext.prefix = File name prefix for output files.
ext.when = Conditional clause
----------------------------------------------------------------------------------------
*/

//
// gcnvcaller calling options
//

process {

    withName: ".*CALL_STRUCTURAL_VARIANTS:CALL_SV_GERMLINECNVCALLER.*" {
        publishDir = [
            enabled: false
        ]
        ext.when = !params.skip_cnv_calling
    }

    withName: ".*CALL_STRUCTURAL_VARIANTS:CALL_SV_GERMLINECNVCALLER:GATK4_COLLECTREADCOUNTS" {
        ext.args = "--format TSV --interval-merging-rule OVERLAPPING_ONLY"
    }

    withName: ".*CALL_STRUCTURAL_VARIANTS:CALL_SV_GERMLINECNVCALLER:GATK4_DETERMINEGERMLINECONTIGPLOIDY" {
        ext.prefix = { "${meta.id}_ploidy" }
    }

    withName: ".*CALL_STRUCTURAL_VARIANTS:CALL_SV_GERMLINECNVCALLER:GATK4_GERMLINECNVCALLER" {
        ext.args = "--run-mode CASE"
        ext.prefix = { "${meta.id}_${model.simpleName}" }
    }
}
11 changes: 11 additions & 0 deletions conf/modules/prepare_references.config
@@ -117,4 +117,15 @@ process {
            enabled: false
        ]
    }

    withName: '.*PREPARE_REFERENCES:GATK_PREPROCESS_WGS' {
        ext.args = { "--padding 0 --interval-merging-rule OVERLAPPING_ONLY --exclude-intervals ${params.mito_name}" }
        ext.when = { params.analysis_type.equals("wgs") && !params.readcount_intervals }
    }

    withName: '.*PREPARE_REFERENCES:GATK_PREPROCESS_WES' {
        ext.args = { "--bin-length 0 --interval-merging-rule OVERLAPPING_ONLY --exclude-intervals ${params.mito_name}" }
        ext.when = { params.analysis_type.equals("wes") && !params.readcount_intervals }
    }

}
3 changes: 3 additions & 0 deletions conf/test.config
@@ -23,6 +23,9 @@ params {
igenomes_ignore = true
mito_name = 'MT'

// analysis params
skip_cnv_calling = true

// Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/testdata/samplesheet_trio.csv'

3 changes: 3 additions & 0 deletions conf/test_one_sample.config
@@ -23,6 +23,9 @@ params {
igenomes_ignore = true
mito_name = 'MT'

// analysis params
skip_cnv_calling = true

// Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/testdata/samplesheet_single.csv'

7 changes: 6 additions & 1 deletion docs/output.md
@@ -33,6 +33,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Variant calling - SV](#variant-calling---sv)
- [Manta](#manta)
- [TIDDIT sv](#tiddit-sv)
- [GATK GermlineCNVCaller - CNV calling](#gatk-germlinecnvcaller---cnv-calling)
- [SVDB merge](#svdb-merge)
- [Variant calling - repeat expansions](#variant-calling---repeat-expansions)
- [Expansion Hunter](#expansion-hunter)
@@ -252,9 +253,13 @@ The pipeline performs variant calling using [Sentieon DNAscope](https://support.

[TIDDIT's sv](https://github.com/SciLifeLab/TIDDIT) is used to identify chromosomal rearrangements using sequencing data. TIDDIT identifies intra and inter-chromosomal translocations, deletions, tandem-duplications and inversions, using supplementary alignments as well as discordant pairs. TIDDIT searches for discordant reads and split reads (supplementary alignments). Output vcf files are treated as intermediates and are not placed in the output folder by default.

#### GATK GermlineCNVCaller - CNV calling

[GATK GermlineCNVCaller](https://github.com/broadinstitute/gatk) is used to identify copy number variants in germline samples given their read counts and a model describing a sample's ploidy. Output vcf files are treated as intermediates and are not placed in the output folder by default.

#### SVDB merge

-[SVDB merge](https://github.com/J35P312/SVDB#merge) is used to merge the variant calls from both Manta and TIDDIT. Output files are published in the output folder.
+[SVDB merge](https://github.com/J35P312/SVDB#merge) is used to merge the variant calls from GATK's GermlineCNVCaller (only if skip_cnv_calling is set to false), Manta, and TIDDIT. Output files are published in the output folder.

<details markdown="1">
<summary>Output files</summary>
24 changes: 18 additions & 6 deletions docs/usage.md
@@ -17,9 +17,10 @@ Table of contents:
- [3. Repeat expansions](#3-repeat-expansions)
- [4. Variant calling - SNV](#4-variant-calling---snv)
- [5. Variant calling - Structural variants](#5-variant-calling---structural-variants)
-- [6. SNV annotation & Ranking](#6-snv-annotation--ranking)
-- [7. SV annotation & Ranking](#7-sv-annotation--ranking)
-- [8. Mitochondrial analysis](#8-mitochondrial-analysis)
+- [6. Copy number variant calling](#6-copy-number-variant-calling)
+- [7. SNV annotation & Ranking](#7-snv-annotation--ranking)
+- [8. SV annotation & Ranking](#8-sv-annotation--ranking)
+- [9. Mitochondrial analysis](#9-mitochondrial-analysis)
- [Run the pipeline](#run-the-pipeline)
- [Direct input in CLI](#direct-input-in-cli)
- [Import from a config file (recommended)](#import-from-a-config-file-recommended)
@@ -188,7 +189,18 @@ The mandatory and optional parameters for each category are tabulated below.
| | target_bed |
| | bwa |

-##### 6. SNV annotation & Ranking
+##### 6. Copy number variant calling

| Mandatory | Optional |
| ------------------------------ | ------------------------------- |
| ploidy_model<sup>1</sup> | readcount_intervals<sup>3</sup> |
| gcnvcaller_model<sup>1,2</sup> | |

<sup>1</sup> Output from steps 3 & 4 of GATK's CNV calling pipeline run in cohort mode as described [here](https://gatk.broadinstitute.org/hc/en-us/articles/360035531152--How-to-Call-common-and-rare-germline-copy-number-variants).<br />
<sup>2</sup> Sample file can be found [here](https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/gcnvmodels.tsv) (Note the header 'models' in the sample file).<br />
<sup>3</sup> Output from step 1 of GATK's CNV calling pipeline as described [here](https://gatk.broadinstitute.org/hc/en-us/articles/360035531152--How-to-Call-common-and-rare-germline-copy-number-variants).<br />
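
As a rough illustration of the table above, the CNV calling parameters could be supplied through a custom config along the following lines. This is only a sketch: every file path is a hypothetical placeholder, and the comments paraphrase the footnotes and the `skip_cnv_calling` switch used elsewhere in this change rather than documenting pipeline behaviour in full.

```groovy
// Hypothetical custom config; all paths below are placeholders, not real files.
params {
    // GermlineCNVCaller processes are gated on `!params.skip_cnv_calling`
    skip_cnv_calling    = false

    // Mandatory: output of GATK's CNV calling pipeline run in cohort mode
    ploidy_model        = '/path/to/ploidy_model'

    // Mandatory: file with the header 'models' (see the linked sample file)
    gcnvcaller_model    = '/path/to/gcnvmodels.tsv'

    // Optional: output of step 1 of GATK's CNV calling pipeline
    readcount_intervals = '/path/to/readcount_intervals'
}
```

A file like this can be passed to the run with Nextflow's `-c` option, alongside the other parameters the pipeline requires.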

##### 7. SNV annotation & Ranking

| Mandatory | Optional |
| ----------------------------- | ------------------------------ |
@@ -215,7 +227,7 @@ no header and the following columns: `CHROM POS REF_ALLELE ALT_ALLELE AF`. Sampl

> NB: We use CADD only to annotate small indels. To annotate SNVs with precomputed CADD scores, pass the file containing CADD scores as a resource to vcfanno instead. Files containing the precomputed CADD scores for SNVs can be downloaded from [here](https://cadd.gs.washington.edu/download) (description: "All possible SNVs of GRCh3<7/8>/hg3<7/8>")
-##### 7. SV annotation & Ranking
+##### 8. SV annotation & Ranking

| Mandatory | Optional |
| -------------------------- | ------------------ |
@@ -227,7 +239,7 @@ no header and the following columns: `CHROM POS REF_ALLELE ALT_ALLELE AF`. Sampl

<sup>1</sup> A CSV file that describes the databases (VCFs) used by SVDB for annotating structural variants. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/svdb_querydb_files.csv). Information about the column headers can be found [here](https://github.com/J35P312/SVDB#Query).

-##### 8. Mitochondrial analysis
+##### 9. Mitochondrial analysis

| Mandatory | Optional |
| ----------------- | -------- |
11 changes: 7 additions & 4 deletions main.nf
@@ -23,6 +23,10 @@ params.bwa = WorkflowMain.getGenomeAttribute(params,
params.bwamem2 = WorkflowMain.getGenomeAttribute(params, 'bwamem2')
params.call_interval = WorkflowMain.getGenomeAttribute(params, 'call_interval')
params.cadd_resources = WorkflowMain.getGenomeAttribute(params, 'cadd_resources')
params.gcnvcaller_model = WorkflowMain.getGenomeAttribute(params, 'gcnvcaller_model')
params.gens_interval_list = WorkflowMain.getGenomeAttribute(params, 'gens_interval_list')
params.gens_pon = WorkflowMain.getGenomeAttribute(params, 'gens_pon')
params.gens_gnomad_pos = WorkflowMain.getGenomeAttribute(params, 'gens_gnomad_pos')
params.gnomad_af = WorkflowMain.getGenomeAttribute(params, 'gnomad_af')
params.gnomad_af_idx = WorkflowMain.getGenomeAttribute(params, 'gnomad_af_idx')
params.intervals_wgs = WorkflowMain.getGenomeAttribute(params, 'intervals_wgs')
@@ -33,22 +37,21 @@ params.known_indels = WorkflowMain.getGenomeAttribute(params,
params.known_mills = WorkflowMain.getGenomeAttribute(params, 'known_mills')
params.ml_model = WorkflowMain.getGenomeAttribute(params, 'ml_model')
params.mt_fasta = WorkflowMain.getGenomeAttribute(params, 'mt_fasta')
params.ploidy_model = WorkflowMain.getGenomeAttribute(params, 'ploidy_model')
params.reduced_penetrance = WorkflowMain.getGenomeAttribute(params, 'reduced_penetrance')
params.readcount_intervals = WorkflowMain.getGenomeAttribute(params, 'readcount_intervals')
params.sequence_dictionary = WorkflowMain.getGenomeAttribute(params, 'sequence_dictionary')
params.score_config_snv = WorkflowMain.getGenomeAttribute(params, 'score_config_snv')
params.score_config_sv = WorkflowMain.getGenomeAttribute(params, 'score_config_sv')
-params.target_bed = WorkflowMain.getGenomeAttribute(params, 'target_bed')
params.svdb_query_dbs = WorkflowMain.getGenomeAttribute(params, 'svdb_query_dbs')
+params.target_bed = WorkflowMain.getGenomeAttribute(params, 'target_bed')
params.variant_catalog = WorkflowMain.getGenomeAttribute(params, 'variant_catalog')
params.vep_filters = WorkflowMain.getGenomeAttribute(params, 'vep_filters')
params.vcfanno_resources = WorkflowMain.getGenomeAttribute(params, 'vcfanno_resources')
params.vcfanno_toml = WorkflowMain.getGenomeAttribute(params, 'vcfanno_toml')
params.vcfanno_lua = WorkflowMain.getGenomeAttribute(params, 'vcfanno_lua')
params.vep_cache = WorkflowMain.getGenomeAttribute(params, 'vep_cache')
params.vep_cache_version = WorkflowMain.getGenomeAttribute(params, 'vep_cache_version')
-params.gens_interval_list = WorkflowMain.getGenomeAttribute(params, 'gens_interval_list')
-params.gens_pon = WorkflowMain.getGenomeAttribute(params, 'gens_pon')
-params.gens_gnomad_pos = WorkflowMain.getGenomeAttribute(params, 'gens_gnomad_pos')

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
25 changes: 25 additions & 0 deletions modules.json
@@ -105,16 +105,31 @@
"git_sha": "2df2a11d5b12f2a73bca74f103691bc35d83c5fd",
"installed_by": ["modules"]
},
"gatk4/collectreadcounts": {
"branch": "master",
"git_sha": "d25bf48327e86a7f737047a57ec264b90e22ce3d",
"installed_by": ["modules"]
},
"gatk4/createsequencedictionary": {
"branch": "master",
"git_sha": "541811d779026c5d395925895fa5ed35e7216cc0",
"installed_by": ["modules"]
},
"gatk4/determinegermlinecontigploidy": {
"branch": "master",
"git_sha": "d25bf48327e86a7f737047a57ec264b90e22ce3d",
"installed_by": ["modules"]
},
"gatk4/filtermutectcalls": {
"branch": "master",
"git_sha": "2df2a11d5b12f2a73bca74f103691bc35d83c5fd",
"installed_by": ["modules"]
},
"gatk4/germlinecnvcaller": {
"branch": "master",
"git_sha": "f6b848c6e1af9a9ecf4975aa8c8edad05e75e784",
"installed_by": ["modules"]
},
"gatk4/intervallisttools": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
@@ -135,6 +150,16 @@
"git_sha": "2df2a11d5b12f2a73bca74f103691bc35d83c5fd",
"installed_by": ["modules"]
},
"gatk4/postprocessgermlinecnvcalls": {
"branch": "master",
"git_sha": "39ca55cc30514169f8420162bafe4ecf673f4b9a",
"installed_by": ["modules"]
},
"gatk4/preprocessintervals": {
"branch": "master",
"git_sha": "1226419498a14d17f98d12d6488d333b0dbd0418",
"installed_by": ["modules"]
},
"gatk4/printreads": {
"branch": "master",
"git_sha": "541811d779026c5d395925895fa5ed35e7216cc0",
68 changes: 68 additions & 0 deletions modules/nf-core/gatk4/collectreadcounts/main.nf

Some generated files are not rendered by default.