nf-core/viralrecon: Changelog

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

[2.2] - 2021-07-29

Enhancements & fixes

  • Updated pipeline template to nf-core/tools 2.1
  • Removed the custom content used to render the Pangolin report in MultiQC, since Pangolin was officially added as a MultiQC module in v1.11
  • [#212] - Access to PYCOQC.out is undefined
  • [#229] - ARTIC Guppyplex settings for 1200bp ARTIC primers with Nanopore data

Software dependencies

Note: since the pipeline now uses Nextflow DSL2, each process runs in its own Biocontainer. This means the pipeline may occasionally use different versions of the same tool in different processes. The overall software dependency changes compared to the last release are listed below for reference.

Dependency    Old version   New version
multiqc       1.10.1        1.11
pangolin      3.0.5         3.1.7
samtools      1.10          1.12

NB: Dependency has been updated if both old and new version information is present.
NB: Dependency has been added if just the new version information is present.
NB: Dependency has been removed if new version information isn't present.
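The NB convention above amounts to a three-way rule on each table row. As a minimal sketch (the function name `classify_change` is hypothetical and not part of the pipeline), it could be written as:

```python
from typing import Optional


def classify_change(old: Optional[str], new: Optional[str]) -> str:
    """Hypothetical helper applying this changelog's NB convention.

    Both versions present -> updated; only the new one -> added;
    only the old one -> removed.
    """
    if old and new:
        return "updated"
    if new:
        return "added"
    if old:
        return "removed"
    raise ValueError("a row needs at least one version")


# Rows copied from the 2.2 dependency table.
rows = {
    "multiqc": ("1.10.1", "1.11"),
    "pangolin": ("3.0.5", "3.1.7"),
    "samtools": ("1.10", "1.12"),
}
for tool, (old, new) in rows.items():
    print(tool, classify_change(old, new))  # every 2.2 row is "updated"
```

The same rule applies to the parameter tables, with parameter names in place of versions.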

[2.1] - 2021-06-15

Enhancements & fixes

  • Removed workflow to download data from public databases in favour of using nf-core/fetchngs
  • Added Pangolin results to MultiQC report
  • Added warning to MultiQC report for samples that have no reads after adapter trimming
  • Added docs about structure of data required for running Nanopore data
  • Added docs about using other primer sets for Illumina data
  • Added docs about overwriting default container definitions to use latest versions e.g. Pangolin
  • Dashes and spaces in sample names will be converted to underscores to avoid issues when creating the summary metrics
  • [#196] - Add mosdepth heatmap to MultiQC report
  • [#197] - Output a .tsv comprising the Nextclade and Pangolin results for all samples processed
  • [#198] - ASCIIGenome failing during analysis
  • [#201] - Conditional include are not expected to work
  • [#204] - Memory errors for SNP_EFF step
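The sample-name change can be illustrated with a small sketch. This is not the pipeline's own samplesheet-checking code; `sanitise_sample_name` is a hypothetical name, and it only handles the two characters mentioned above (dashes and literal spaces):

```python
import re


def sanitise_sample_name(name: str) -> str:
    """Hypothetical helper: replace every dash or space with an underscore."""
    return re.sub(r"[- ]", "_", name)


print(sanitise_sample_name("patient-01 run 2"))  # patient_01_run_2
```

One consequence of this kind of mapping is that names differing only in these characters, such as `S-1` and `S 1`, end up identical after conversion.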

Parameters

Old parameter New parameter
--public_data_ids
--skip_sra_fastq_download

NB: Parameter has been updated if both old and new parameter information is present.
NB: Parameter has been added if just the new parameter information is present.
NB: Parameter has been removed if new parameter information isn't present.

Software dependencies

Note: since the pipeline now uses Nextflow DSL2, each process runs in its own Biocontainer. This means the pipeline may occasionally use different versions of the same tool in different processes. The overall software dependency changes compared to the last release are listed below for reference.

Dependency    Old version   New version
nextclade_js  0.14.2        0.14.4
pangolin      2.4.2         3.0.5

NB: Dependency has been updated if both old and new version information is present.
NB: Dependency has been added if just the new version information is present.
NB: Dependency has been removed if new version information isn't present.

[2.0] - 2021-05-13

⚠️ Major enhancements

  • Pipeline has been re-implemented in Nextflow DSL2
  • All software containers are now exclusively obtained from Biocontainers
  • Updated minimum Nextflow version to v21.04.0 (see nextflow#572)
  • BCFtools and iVar will be run by default for Illumina metagenomics and amplicon data, respectively. However, this behaviour can be customised with the --callers parameter.
  • Variant graph processes to call variants relative to the reference genome directly from de novo assemblies have been deprecated and removed
  • Variant calling with Varscan 2 has been deprecated and removed due to licensing restrictions
  • New tools:
    • Pangolin for lineage analysis
    • Nextclade for clade assignment, mutation calling and consensus sequence quality checks
    • ASCIIGenome for individual variant screenshots with annotation tracks
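The default-caller behaviour described above can be sketched as a simple lookup with an override. This is an illustration only: `select_callers`, the protocol labels `"amplicon"` and `"metagenomic"`, and the assumption that `--callers` takes a comma-separated string are not taken from this changelog.

```python
from typing import List, Optional

# Assumption: protocol labels and mapping mirror the 2.0 defaults described
# above (iVar for amplicon data, BCFtools for metagenomics data).
DEFAULT_CALLERS = {"amplicon": ["ivar"], "metagenomic": ["bcftools"]}


def select_callers(protocol: str, callers: Optional[str] = None) -> List[str]:
    """Hypothetical helper: pick the variant callers to run.

    `callers` stands in for a --callers value, assumed here to be a
    comma-separated string; when given, it overrides the protocol default.
    """
    if callers:
        return [c.strip().lower() for c in callers.split(",") if c.strip()]
    if protocol not in DEFAULT_CALLERS:
        raise ValueError(f"unknown protocol: {protocol!r}")
    return list(DEFAULT_CALLERS[protocol])


print(select_callers("amplicon"))                   # ['ivar']
print(select_callers("metagenomic"))                # ['bcftools']
print(select_callers("amplicon", "ivar,bcftools"))  # ['ivar', 'bcftools']
```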

Other enhancements & fixes

  • Illumina and Nanopore runs containing the same 48 samples sequenced on both platforms have been uploaded to the nf-core AWS account for full-sized tests on release
  • Initial implementation of a standardised samplesheet JSON schema to use with user interfaces and for validation
  • Default human --kraken2_db link has been changed from Zenodo to an AWS S3 bucket for more reliable downloads
  • Updated pipeline template to nf-core/tools 1.14
  • Optimised MultiQC configuration and input files for faster run-times with very large numbers of samples
  • [#122] - Single SPAdes command to rule them all
  • [#138] - Problem masking the consensus sequence
  • [#142] - Unknown method invocation toBytes on String type
  • [#169] - ggplot2 error when generating mosdepth amplicon plot with Swift v2 primers
  • [#170] - ivar trimming of Swift libraries new offset feature
  • [#175] - MultiQC report does not include all the metrics
  • [#188] - Add and fix EditorConfig linting in entire pipeline

Parameters

Old parameter                 New parameter
--amplicon_bed                --primer_bed
--amplicon_fasta              --primer_fasta
--amplicon_left_suffix        --primer_left_suffix
--amplicon_right_suffix       --primer_right_suffix
--filter_dups                 --filter_duplicates
--skip_adapter_trimming       --skip_fastp
--skip_amplicon_trimming      --skip_cutadapt
                              --artic_minion_aligner
                              --artic_minion_caller
                              --artic_minion_medaka_model
                              --asciigenome_read_depth
                              --asciigenome_window_size
                              --blast_db
                              --enable_conda
                              --fast5_dir
                              --fastq_dir
                              --ivar_trim_offset
                              --kraken2_assembly_host_filter
                              --kraken2_variants_host_filter
                              --min_barcode_reads
                              --min_guppyplex_reads
                              --multiqc_title
                              --platform
                              --primer_set
                              --primer_set_version
                              --public_data_ids
                              --save_trimmed_fail
                              --save_unaligned
                              --sequencing_summary
                              --singularity_pull_docker_container
                              --skip_asciigenome
                              --skip_bandage
                              --skip_consensus
                              --skip_ivar_trim
                              --skip_nanoplot
                              --skip_pangolin
                              --skip_pycoqc
                              --skip_nextclade
                              --skip_sra_fastq_download
                              --spades_hmm
                              --spades_mode
--cut_mean_quality
--filter_unmapped
--ivar_trim_min_len
--ivar_trim_min_qual
--ivar_trim_window_width
--kraken2_use_ftp
--max_allele_freq
--min_allele_freq
--min_base_qual
--min_coverage
--min_trim_length
--minia_kmer
--mpileup_depth
--name
--qualified_quality_phred
--save_align_intermeds
--save_kraken2_fastq
--save_sra_fastq
--skip_sra
--skip_vg
--unqualified_percent_limit
--varscan2_strand_filter

NB: Parameter has been updated if both old and new parameter information is present.
NB: Parameter has been added if just the new parameter information is present.
NB: Parameter has been removed if new parameter information isn't present.
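When porting a 1.x command line to 2.0, the renamed parameters in the table above can be applied mechanically. A minimal sketch: `migrate_args` is hypothetical and the BED filename in the example is made up. Parameters that were removed in 2.0 have no replacement and still need manual review.

```python
from typing import List

# Renamed parameters taken from the 2.0 table (old -> new).
RENAMED_IN_2_0 = {
    "--amplicon_bed": "--primer_bed",
    "--amplicon_fasta": "--primer_fasta",
    "--amplicon_left_suffix": "--primer_left_suffix",
    "--amplicon_right_suffix": "--primer_right_suffix",
    "--filter_dups": "--filter_duplicates",
    "--skip_adapter_trimming": "--skip_fastp",
    "--skip_amplicon_trimming": "--skip_cutadapt",
}


def migrate_args(args: List[str]) -> List[str]:
    """Hypothetical helper: rewrite renamed flags in a 1.x argument list.

    Only tokens that exactly match an old flag are rewritten; values and
    unknown flags are passed through unchanged.
    """
    return [RENAMED_IN_2_0.get(arg, arg) for arg in args]


old_args = ["--amplicon_bed", "nCoV-2019.bed", "--skip_amplicon_trimming"]
print(migrate_args(old_args))
# ['--primer_bed', 'nCoV-2019.bed', '--skip_cutadapt']
```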

Software dependencies

Note: since the pipeline now uses Nextflow DSL2, each process runs in its own Biocontainer. This means the pipeline may occasionally use different versions of the same tool in different processes. The overall software dependency changes compared to the last release are listed below for reference.

Dependency Old version New version
artic 1.2.1
asciigenome 1.16.0
bc 1.07.1
bcftools 1.9 1.11
bedtools 2.29.2 2.30.0
bioconductor-biostrings 2.54.0 2.58.0
bioconductor-complexheatmap 2.2.0 2.6.2
blast 2.9.0 2.10.1
bowtie2 2.4.1 2.4.2
cutadapt 2.10 3.2
ivar 1.2.2 1.3.1
kraken2 2.0.9beta 2.1.1
markdown 3.2.2
minimap2 2.17
mosdepth 0.2.6 0.3.1
multiqc 1.9 1.10.1
nanoplot 1.36.1
nextclade_js 0.14.2
pangolin 2.4.2
parallel-fastq-dump 0.6.6
picard 2.23.0 2.23.9
pigz 2.3.4
plasmidid 1.6.3 1.6.4
pycoqc 2.5.2
pygments 2.6.1
pymdown-extensions 7.1
python 3.6.10 3.8.3
r-base 3.6.2 4.0.3
r-ggplot2 3.3.1 3.3.3
r-tidyr 1.1.0
requests 2.24.0
samtools 1.9 1.10
seqwish 0.4.1
snpeff 4.5covid19 5.0
spades 3.14.0 3.15.2
sra-tools 2.10.7
tabix 0.2.6
unicycler 0.4.7 0.4.8
varscan 2.4.4
vg 1.24.0

NB: Dependency has been updated if both old and new version information is present.
NB: Dependency has been added if just the new version information is present.
NB: Dependency has been removed if new version information isn't present.

[1.1.0] - 2020-06-23

Added

  • #112 - Per-amplicon coverage plot
  • #124 - Intersect variants across callers
  • nf-core/tools#616 - Updated GitHub Actions to build Docker image and push to Docker Hub
  • Parameters:
    • --min_mapped_reads to circumvent failures for samples with a low number of mapped reads
    • --varscan2_strand_filter to toggle the default Varscan 2 strand filter
    • --skip_mosdepth - skip genome-wide and amplicon coverage plot generation from mosdepth output
    • --amplicon_left_suffix - to provide left primer suffix used in name field of --amplicon_bed
    • --amplicon_right_suffix - to provide right primer suffix used in name field of --amplicon_bed
    • Unify parameter specification with COG-UK pipeline:
      • --min_allele_freq - minimum allele frequency threshold for calling variants
      • --mpileup_depth - SAMtools mpileup maximum per-file depth
      • --ivar_exclude_reads renamed to --ivar_trim_noprimer
      • --ivar_trim_min_len - minimum length of read to retain after primer trimming
      • --ivar_trim_min_qual - minimum quality threshold for sliding window to pass
      • --ivar_trim_window_width - width of sliding window
  • [#118] Updated GitHub Actions AWS workflow for small and full size tests.
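The three iVar trimming parameters above (window width, minimum window quality, minimum retained length) describe a sliding-window quality filter. The sketch below shows generic sliding-window trimming under that reading; it is not iVar's implementation, `sliding_window_trim` is a hypothetical name, and iVar's exact cut-point rules may differ.

```python
from typing import List, Optional


def sliding_window_trim(quals: List[int], window: int, min_qual: float,
                        min_len: int) -> Optional[int]:
    """Return the retained read length, or None if the read is discarded.

    Slides a window of `window` bases from the 5' end; at the first window
    whose mean quality falls below `min_qual`, the read is cut at the start
    of that window. Reads shorter than `min_len` after trimming are dropped.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    keep = len(quals)
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_qual:
            keep = start
            break
    return keep if keep >= min_len else None


quals = [30] * 10 + [10] * 5
print(sliding_window_trim(quals, window=4, min_qual=20, min_len=5))   # 9
print(sliding_window_trim(quals, window=4, min_qual=20, min_len=10))  # None
```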

Removed

  • --skip_qc parameter

Dependencies

  • Add mosdepth 0.2.6
  • Add bioconductor-complexheatmap 2.2.0
  • Add bioconductor-biostrings 2.54.0
  • Add r-optparse 1.6.6
  • Add r-tidyr 1.1.0
  • Add r-tidyverse 1.3.0
  • Add r-ggplot2 3.3.1
  • Add r-reshape2 1.4.4
  • Add r-viridis 0.5.1
  • Update sra-tools 2.10.3 -> 2.10.7
  • Update bowtie2 2.3.5.1 -> 2.4.1
  • Update picard 2.22.8 -> 2.23.0
  • Update minia 3.2.3 -> 3.2.4
  • Update plasmidid 1.5.2 -> 1.6.3

[1.0.0] - 2020-06-01

Initial release of nf-core/viralrecon, created with the nf-core template.

This pipeline is a re-implementation of the SARS_Cov2_consensus-nf and SARS_Cov2_assembly-nf pipelines initially developed by Sarai Varona and Sara Monzon from BU-ISCIII. Porting both of these pipelines to nf-core was an international collaboration between numerous contributors and developers, led by Harshil Patel from The Bioinformatics & Biostatistics Group at The Francis Crick Institute, London. We appreciated the need to have a portable, reproducible and scalable pipeline for the analysis of COVID-19 sequencing samples and so the Avengers Assembled!

Pipeline summary

  1. Download samples via SRA, ENA or GEO ids (ENA FTP, parallel-fastq-dump; if required)
  2. Merge re-sequenced FastQ files (cat; if required)
  3. Read QC (FastQC)
  4. Adapter trimming (fastp)
  5. Variant calling
    1. Read alignment (Bowtie 2)
    2. Sort and index alignments (SAMtools)
    3. Primer sequence removal (iVar; amplicon data only)
    4. Duplicate read marking (picard; removal optional)
    5. Alignment-level QC (picard, SAMtools)
    6. Choice of multiple variant calling and consensus sequence generation routes (VarScan 2, BCFtools, BEDTools || iVar variants and consensus || BCFtools, BEDTools)
  6. De novo assembly
    1. Primer trimming (Cutadapt; amplicon data only)
    2. Removal of host reads (Kraken 2)
    3. Choice of multiple assembly tools (SPAdes || metaSPAdes || Unicycler || minia)
  7. Present QC and visualisation for raw read, alignment, assembly and variant calling results (MultiQC)
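Step 2 can use plain `cat` on compressed FastQ files because concatenated gzip files form a valid multi-member gzip stream. The Python sketch below shows the same byte-level merge; `merge_fastq_gz` and the file names are hypothetical and the reads are made up.

```python
import gzip
import shutil
import tempfile
from pathlib import Path
from typing import Iterable


def merge_fastq_gz(inputs: Iterable[Path], output: Path) -> None:
    """Hypothetical helper equivalent to `cat a.fastq.gz b.fastq.gz > out.fastq.gz`.

    Bytes are copied without decompressing; the result is a multi-member
    gzip file that gzip readers decode as one continuous stream.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as fh:
                shutil.copyfileobj(fh, out)


with tempfile.TemporaryDirectory() as tmpdir:
    tmp = Path(tmpdir)
    run1, run2 = tmp / "run1.fastq.gz", tmp / "run2.fastq.gz"
    run1.write_bytes(gzip.compress(b"@r1\nACGT\n+\nIIII\n"))
    run2.write_bytes(gzip.compress(b"@r2\nTTGA\n+\nIIII\n"))
    merged = tmp / "merged.fastq.gz"
    merge_fastq_gz([run1, run2], merged)
    # gzip.decompress handles multi-member data, so both records come back.
    merged_text = gzip.decompress(merged.read_bytes()).decode()

print(merged_text)
```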