Merge branch 'master' into TEMPLATE

seqeralabs · May 2, 2024 · 580901a
2 parents 4c9e36b + b3b9239
Showing 16 changed files with 786 additions and 93 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -11,6 +11,8 @@ Initial release of seqeralabs/nf-dragen, created with the [nf-core](https://nf-c
 
 ### `Fixed`
 
+- Fixed error `Access to 'FASTQC.out' is undefined since the workflow 'FASTQC' has not been invoked before accessing the output attribute`, raised when `--skip_fastqc` is enabled, by adjusting channel generation
+
 ### `Dependencies`
 
 ### `Deprecated`
diff --git a/LICENSE b/LICENSE
diff --git a/LICENSE.txt b/LICENSE.txt
diff --git a/README.md b/README.md
@@ -1,86 +1,103 @@
-[![GitHub Actions CI Status](https://github.com/seqeralabs/nf-dragen/actions/workflows/ci.yml/badge.svg)](https://github.com/seqeralabs/nf-dragen/actions/workflows/ci.yml)
-[![GitHub Actions Linting Status](https://github.com/seqeralabs/nf-dragen/actions/workflows/linting.yml/badge.svg)](https://github.com/seqeralabs/nf-dragen/actions/workflows/linting.yml)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)
-[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)
 
-[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.04.0-23aa62.svg)](https://www.nextflow.io/)
-[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
-[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
-[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
-[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://tower.nf/launch?pipeline=https://github.com/seqeralabs/nf-dragen)
+[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A521.10.3-23aa62.svg?labelColor=000000)](https://www.nextflow.io/)
+
+> THIS IS A PROOF-OF-CONCEPT REPOSITORY THAT IS UNDER ACTIVE DEVELOPMENT. SYNTAX, ORGANISATION AND LAYOUT MAY CHANGE WITHOUT NOTICE!
 
 ## Introduction
 
-**seqeralabs/nf-dragen** is a bioinformatics pipeline that ...
+**nf-dragen** is a simple, proof-of-concept pipeline to run the [Illumina DRAGEN](https://emea.illumina.com/products/by-type/informatics-products/dragen-bio-it-platform.html) licensed suite of tools.
 
-<!-- TODO nf-core:
-   Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
-   major pipeline sections and the types of output it produces. You're giving an overview to someone new
-   to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
--->
+The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool that runs tasks portably across multiple compute infrastructures. It uses Docker containers, making installation trivial and results highly reproducible. This pipeline has only been tested on AWS Batch.
 
-<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
-     workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples.   -->
-<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
+## Integration with Nextflow Tower
 
-1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
-2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
+We have streamlined the process of deploying Nextflow workflows that utilise Illumina DRAGEN on AWS Batch via Tower.
 
-## Usage
+### Prerequisites
 
-> [!NOTE]
-> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
+#### Credentials
 
-<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
-     Explain what rows and columns represent. For instance (please edit as appropriate):
+You will need to obtain the following information from the Illumina DRAGEN team:
 
-First, prepare a samplesheet with your input data that looks as follows:
+1. Private AMI id in an AWS region with DRAGEN F1 instance availability
+2. Username to run DRAGEN on the CLI via Nextflow
+3. Password to run DRAGEN on the CLI via Nextflow
 
-`samplesheet.csv`:
+#### Pipeline implementation
 
-```csv
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
-```
+Please see the [dragen.nf](modules/local/dragen.nf) module implemented in this pipeline for reference.
 
-Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
+Any Nextflow processes calling the `dragen` command must have:
 
--->
+1. `label dragen` ([see docs](https://www.nextflow.io/docs/latest/process.html?highlight=label#label)). This is how Tower will determine which processes need to be specifically executed on DRAGEN F1 instances.
 
-Now, you can run the pipeline using:
+    ```nextflow
+    process DRAGEN {
+        label 'dragen'
 
-<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
+        <truncated>
+    }
+    ```
 
-```bash
-nextflow run seqeralabs/nf-dragen \
-   -profile <docker/singularity/.../institute> \
-   --input samplesheet.csv \
-   --outdir <OUTDIR>
-```
+2. `secret DRAGEN_USERNAME` and `secret DRAGEN_PASSWORD` ([see docs](https://www.nextflow.io/docs/latest/secrets.html?highlight=secrets#secrets)). These Secrets will be provided securely to the `--lic-server` option when running DRAGEN on the CLI to validate the license.
 
-> [!WARNING]
-> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
-> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
+    ```nextflow
+    process DRAGEN {
+        secret 'DRAGEN_USERNAME'
+        secret 'DRAGEN_PASSWORD'
 
-## Credits
+        <truncated>
 
-seqeralabs/nf-dragen was originally written by Harshil Patel, Graham Wright.
+        script:
+        """
+        /opt/edico/bin/dragen \\
+                --lic-server=\$DRAGEN_USERNAME:\$DRAGEN_PASSWORD@<license_server> \\
+                <other_options>
+        """
+    }
+    ```
 
-We thank the following people for their extensive assistance in the development of this pipeline:
+### Compute Environment
 
-<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
+You can use Tower Forge to automatically create a separate AWS Batch queue with dedicated F1 instances to run DRAGEN. 
 
-## Contributions and Support
+In the Tower UI, go to `Compute Environments` -> `Add Compute Environment` and fill in the appropriate settings for your AWS Batch environment. Additionally, you will be able to paste your private DRAGEN AMI id as shown in the image below:
 
-If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).
+![Tower enable DRAGEN](docs/images/tower_ce_enable_dragen.png)
 
-## Citations
+Click on `Add` to create the Compute Environment.
+
+> Please ensure that the `Region` you select contains DRAGEN F1 instances.
+
+### Secrets
+
+As outlined in [this blog](https://seqera.io/blog/pipeline-secrets-secure-handling-of-sensitive-information-in-tower/), you can add Secrets to Tower to safely encrypt the username and password required to run DRAGEN via Nextflow.
+
+In the Tower UI, go to `Secrets` -> `Add Pipeline Secret` and add both of the Secrets as shown in the images below:
+
+1. `DRAGEN_USERNAME`
+
+![Tower Secrets DRAGEN username](docs/images/tower_secrets_dragen_username.png)
 
-<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
-<!-- If you use seqeralabs/nf-dragen for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
+2. `DRAGEN_PASSWORD`
 
-<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
+![Tower Secrets DRAGEN password](docs/images/tower_secrets_dragen_password.png)
+
+### Pipeline
+
+In the Tower UI, go to `Launchpad` -> `Add Pipeline`. Fill in the appropriate details to add your pipeline and ensure that the Compute Environment and Secrets you created previously are both defined for use by the pipeline:
+
+![Tower Pipeline Secrets](docs/images/tower_pipeline_secrets.png)
+
+Click on `Add` to create the pipeline and launch it when you are ready!
+
+## Credits
+
+nf-dragen was originally written by [Harshil Patel](https://github.com/drpatelh), [Graham Wright](https://github.com/gwright99) and [Paolo Di Tommaso](https://github.com/pditommaso) at [Seqera Labs](https://seqera.io/).
+
+## Citations
 
-An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
+The nf-core pipeline template was used to create the skeleton of this pipeline but there are no plans to contribute it to nf-core at this point.
 
 This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE).
 

diff --git a/conf/modules.config b/conf/modules.config
@@ -11,7 +11,6 @@
 */
 
 process {
-
     publishDir = [
         path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
         mode: params.publish_dir_mode,
@@ -30,13 +29,48 @@ process {
         ]
     }
 
-    withName: 'MULTIQC' {
-        ext.args   = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' }
+    withName: 'DRAGEN_BUILDHASHTABLE_DNA' {
+        ext.prefix = 'dragen_index_dna'
         publishDir = [
-            path: { "${params.outdir}/multiqc" },
-            mode: params.publish_dir_mode,
+            path: { "${params.outdir}/genome/index" },
+            mode: 'copy',
             saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
         ]
     }
 
-}
+    withName: 'DRAGEN_BUILDHASHTABLE_RNA' {
+        ext.args = '--ht-build-rna-hashtable true'
+        ext.prefix = 'dragen_index_rna'
+        publishDir = [
+            path: { "${params.outdir}/genome/index" },
+            mode: 'copy',
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
+    withName: 'DRAGEN_FASTQ_TO_BAM_DNA' {
+        publishDir = [
+            path: { "${params.outdir}/dragen/dna_fastq_to_bam" },
+            mode: 'copy',
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
+    withName: 'DRAGEN_FASTQ_TO_VCF_DNA' {
+        ext.args = '--enable-variant-caller true'
+        publishDir = [
+            path: { "${params.outdir}/dragen/dna_fastq_to_vcf" },
+            mode: 'copy',
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
+    withName: 'DRAGEN_FASTQ_TO_BAM_RNA' {
+        ext.args = '--enable-rna true'
+        publishDir = [
+            path: { "${params.outdir}/dragen/rna_fastq_to_bam" },
+            mode: 'copy',
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+}
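
The default `publishDir` path in `conf/modules.config` derives an output folder from the process name, and every `withName` block reuses the same `saveAs` closure. The following is a minimal plain-Groovy sketch of how those two expressions evaluate. The fully qualified process name `NFDRAGEN:DRAGEN_FASTQ_TO_BAM_DNA` and the file name `sample1.bam` are hypothetical values chosen for illustration, not taken from the pipeline.

```groovy
// Hypothetical fully qualified process name, standing in for task.process.
def processName = 'NFDRAGEN:DRAGEN_FASTQ_TO_BAM_DNA'

// Same expression as the default publishDir path: keep the last ':'-separated
// component, then the part before the first '_', lower-cased.
def subdir = processName.tokenize(':')[-1].tokenize('_')[0].toLowerCase()
assert subdir == 'dragen'

// Same closure as saveAs in the config: a null return value means the file
// is not published, so versions.yml is skipped and everything else is kept.
def saveAs = { filename -> filename.equals('versions.yml') ? null : filename }
assert saveAs('versions.yml') == null
assert saveAs('sample1.bam') == 'sample1.bam'
```

Because all `DRAGEN_*` process names share the prefix before the first underscore, the default rule would publish them into one `dragen` folder, which is presumably why each `withName` block sets an explicit path.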
diff --git a/conf/test.config b/conf/test.config
@@ -16,14 +16,12 @@ params {
 
     // Limit resources so that this can run on GitHub Actions
     max_cpus   = 2
-    max_memory = '6.GB'
+    max_memory = '12.GB'
     max_time   = '6.h'
 
     // Input data
-    // TODO nf-core: Specify the paths to your test data on nf-core/test-datasets
-    // TODO nf-core: Give any required params for the test so that command line flags are not needed
-    input  = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv'
+    input = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv'
 
     // Genome references
-    genome = 'R64-1-1'
+    fasta = 'https://github.com/nf-core/test-datasets/raw/viralrecon/genome/MN908947.3/primer_schemes/artic/nCoV-2019/V1200/nCoV-2019.reference.fasta'
 }
diff --git a/docs/images/tower_ce_enable_dragen.png b/docs/images/tower_ce_enable_dragen.png
diff --git a/docs/images/tower_pipeline_secrets.png b/docs/images/tower_pipeline_secrets.png
diff --git a/docs/images/tower_secrets_dragen_password.png b/docs/images/tower_secrets_dragen_password.png
diff --git a/docs/images/tower_secrets_dragen_username.png b/docs/images/tower_secrets_dragen_username.png
diff --git a/main.nf b/main.nf
@@ -27,10 +27,8 @@ include { getGenomeAttribute      } from './subworkflows/local/utils_nfcore_nf-d
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 */
 
-// TODO nf-core: Remove this line if you don't need a FASTA file
-//   This is an example of how to use getGenomeAttribute() to fetch parameters
-//   from igenomes.config using `--genome`
-params.fasta = getGenomeAttribute('fasta')
+params.fasta        = getGenomeAttribute('fasta')
+params.dragen_index = getGenomeAttribute('dragen')
 
 /*
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

diff --git a/modules/local/dragen.nf b/modules/local/dragen.nf
@@ -0,0 +1,55 @@
+process DRAGEN {
+    tag "$meta.id"
+    label 'dragen'
+
+    secret 'DRAGEN_USERNAME'
+    secret 'DRAGEN_PASSWORD'
+
+    input:
+    tuple val(meta), path(files_in)
+    path index
+
+    output:
+    tuple val(meta), path('*.bam')                             , emit: bam         , optional:true
+    tuple val(meta), path('*fastq.gz')                         , emit: fastq       , optional:true
+    tuple val(meta), path("${prefix}.vcf.gz")                  , emit: vcf         , optional:true
+    tuple val(meta), path("${prefix}.vcf.gz.tbi")              , emit: tbi         , optional:true
+    tuple val(meta), path("${prefix}.hard-filtered.vcf.gz")    , emit: vcf_filtered, optional:true
+    tuple val(meta), path("${prefix}.hard-filtered.vcf.gz.tbi"), emit: tbi_filtered, optional:true
+    path  "versions.yml"                                       , emit: versions
+
+    script:
+    def args = task.ext.args ?: ''
+    prefix = task.ext.prefix ?: "${meta.id}"
+
+    def ref = index ? "-r $index" : ''
+
+    // Generate appropriate parameter for input files
+    def input = ''
+    def rgid = ''
+    def rgsm = ''
+    def file_list = files_in.collect { it.toString() }
+    if (file_list[0].endsWith('.bam')) {
+        input = "-b ${files_in}"
+    } else {
+        input = meta.single_end ? "-1 ${files_in}" : "-1 ${files_in[0]} -2 ${files_in[1]}"
+        rgid = meta.rgid ? "--RGID ${meta.rgid}" : "--RGID ${meta.id}"
+        rgsm = meta.rgsm ? "--RGSM ${meta.rgsm}" : "--RGSM ${meta.id}"
+    }
+    """
+    /opt/edico/bin/dragen \\
+        $ref \\
+        --output-directory ./ \\
+        --output-file-prefix $prefix \\
+        --lic-server=\$DRAGEN_USERNAME:\$[email protected] \\
+        $input \\
+        $rgid \\
+        $rgsm \\
+        $args
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        dragen: \$(echo \$(/opt/edico/bin/dragen --version 2>&1) | sed -e "s/dragen Version //g")
+    END_VERSIONS
+    """
+}
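
The input and read-group flag selection in the `DRAGEN` process can be sketched as a standalone Groovy function. This is an illustration under stated assumptions, not the module's code: `dragenInputArgs` is a hypothetical helper, `meta` is a plain map, and `files` is a list of file names standing in for the staged `files_in` paths.

```groovy
// Hypothetical helper mirroring the branch logic of the DRAGEN process script.
def dragenInputArgs(Map meta, List<String> files) {
    if (files[0].endsWith('.bam')) {
        // BAM input: only -b is passed, no read-group flags are added.
        return ["-b ${files[0]}"]
    }
    // FASTQ input: one file for single-end reads, two for paired-end reads.
    def reads = meta.single_end ? "-1 ${files[0]}" : "-1 ${files[0]} -2 ${files[1]}"
    // Read-group ID and sample name fall back to the sample id when unset.
    def rgid = "--RGID ${meta.rgid ?: meta.id}"
    def rgsm = "--RGSM ${meta.rgsm ?: meta.id}"
    return [reads, rgid, rgsm]
}

assert dragenInputArgs([id: 's1', single_end: false], ['s1_R1.fastq.gz', 's1_R2.fastq.gz']).join(' ') ==
    '-1 s1_R1.fastq.gz -2 s1_R2.fastq.gz --RGID s1 --RGSM s1'
assert dragenInputArgs([id: 's2'], ['s2.bam']).join(' ') == '-b s2.bam'
```

The BAM case returns no read-group flags, which is why every variable interpolated into the command line needs an empty-string default before the branch.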
diff --git a/modules/local/dragen_buildhashtable.nf b/modules/local/dragen_buildhashtable.nf
@@ -0,0 +1,33 @@
+process DRAGEN_BUILDHASHTABLE {
+    tag "$fasta"
+    label 'dragen'
+
+    secret 'DRAGEN_USERNAME'
+    secret 'DRAGEN_PASSWORD'
+
+    input:
+    path fasta
+
+    output:
+    path "$prefix"     , emit: index
+    path "versions.yml", emit: versions
+
+    script:
+    def args = task.ext.args ?: ''
+    prefix = task.ext.prefix ?: 'dragen'
+    """
+    mkdir -p $prefix
+
+    /opt/edico/bin/dragen \\
+        --build-hash-table true \\
+        --output-directory $prefix \\
+        --ht-reference $fasta \\
+        --lic-server=\$DRAGEN_USERNAME:\$[email protected] \\
+        $args
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        dragen: \$(echo \$(/opt/edico/bin/dragen --version 2>&1) | sed -e "s/dragen Version //g")
+    END_VERSIONS
+    """
+}
diff --git a/nextflow.config b/nextflow.config
@@ -9,6 +9,10 @@
 // Global default params, used in configs
 params {
 
+    // Pipeline options
+    skip_fastqc                = false
+    skip_dragen                = false
+
     // TODO nf-core: Specify your pipeline's command line flags
     // Input options
     input                      = null
@@ -226,9 +230,9 @@ dag {
 
 manifest {
     name            = 'seqeralabs/nf-dragen'
-    author          = """Harshil Patel, Graham Wright"""
+    author          = 'Harshil Patel, Graham Wright'
     homePage        = 'https://github.com/seqeralabs/nf-dragen'
-    description     = """Nextflow pipeline to run Illumina DRAGEN software"""
+    description     = 'Nextflow pipeline to run Illumina DRAGEN software'
     mainScript      = 'main.nf'
     nextflowVersion = '!>=23.04.0'
     version         = '1.0dev'