
Installation

Colin Davenport edited this page Jun 19, 2024 · 24 revisions


  • We only recommend using Wochenende on Linux 64 bit (typical on servers). We have not tested on Macs due to lack of hardware, though to our knowledge most servers aren't running Mac OS.
  • We recommend using Bioconda for installation of the tools used by our pipeline.
  • Expect to use at least 40-50 GB RAM per sample FASTQ file being processed (mainly for bwa mem or minimap2 alignment).
  • We have used the tool on SLURM and LSF clusters using Ubuntu 18.04 and 20.04.
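The requirements above can be checked before installing anything. The following is a minimal sketch, assuming a Linux host that exposes /proc/meminfo; the variable names and the 40 GB threshold (taken from the lower end of the recommendation above) are illustrative and not part of Wochenende.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not shipped with Wochenende.
min_ram_gb=40   # lower bound of the 40-50 GB recommendation

arch="$(uname -m)"
kernel="$(uname -s)"
echo "Detected: ${kernel} ${arch}"

# MemTotal is reported in kB in /proc/meminfo (Linux only).
if [ -r /proc/meminfo ]; then
    total_gb="$(awk '/^MemTotal:/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)"
else
    total_gb=0
fi
echo "Total RAM: ${total_gb} GB"

if [ "${total_gb}" -lt "${min_ram_gb}" ]; then
    echo "Warning: less than ${min_ram_gb} GB RAM; alignment may fail for large samples."
fi
```

On a cluster, run this on a compute node rather than the login node, since the two often differ in memory.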

1. First steps - install miniconda

If not already installed, install miniconda 64bit from here

2. Install Mamba (much faster replacement for conda)

Installing Mamba should be as simple as conda install mamba -n base -c conda-forge

3. Get the Wochenende software from Github

Either use git clone:

git clone https://github.com/MHH-RCUG/nf_wochenende.git

or download the zip

wget https://github.com/MHH-RCUG/nf_wochenende/archive/refs/heads/main.zip

4. Prepare your Wochenende conda environment

conda activate base

mamba env create --file env.wochenende.minimal.yml

Now most software should be installed.

Test if a package can be found

conda activate wochenende

samtools
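To check several tools at once rather than one by one, a small loop over `command -v` can help. This is a sketch only: the tool list is an assumption based on the programs named in this guide, and `have_tool` is a hypothetical helper. Run it after activating the wochenende environment.

```shell
#!/usr/bin/env bash
# Return success if the named program is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Tools named in this guide; adjust the list to match your environment file.
missing=0
for tool in samtools bwa minimap2; do
    if have_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
        missing=$((missing + 1))
    fi
done
echo "$missing tool(s) missing"
```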

5. Edit paths in nextflow.config

Edit the paths to Wochenende (where you downloaded it on your system) and optionally haybaler in nextflow.config

6. Get a reference sequence containing human and many bacterial sequences

Download a reference sequence from our preferred site: https://drive.google.com/drive/folders/1q1btJCxtU15XXqfA-iCyNwgKgQq0SrG4?usp=sharing or create your own.

Note that if you work with clinical data, i.e. where the main contaminant is human, you should NOT remove the human sequence from this reference, since doing so will lead to false positive assignments of human reads to various bacteria. Reads are short, and without a human reference the aligner will try to assign the human reads to bacteria (with mismatches), which can cause problems. If you work with other contaminants, e.g. mouse-associated metagenomes, then you will need to remove the human genome and add the mouse genome to the supplied reference sequence (see the Wiki section on Building a reference sequence).

a) If you want to use bwa mem as aligner (recommended for short reads), you'll need to create an index of that fasta reference sequence as usual, eg. where x is the name of the reference fasta.

gunzip x.fa.gz && bwa index x.fa &

Minimap2 works with fasta directly without this step.
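Indexing a large reference takes a while, so it can be worth checking whether the index already exists before starting it again. The sketch below assumes the five files that `bwa index` writes next to the FASTA (.amb, .ann, .bwt, .pac, .sa); `needs_bwa_index` is a hypothetical helper and `x.fa` is the same placeholder name used above.

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) when at least one bwa index file for the given FASTA is missing or empty.
needs_bwa_index() {
    local ref="$1" ext
    for ext in amb ann bwt pac sa; do
        [ -s "${ref}.${ext}" ] || return 0
    done
    return 1
}

ref="x.fa"   # placeholder name, as in the text above
if needs_bwa_index "$ref"; then
    echo "Index incomplete for $ref; run: bwa index $ref"
else
    echo "Index already present for $ref"
fi
```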

7. Edit paths to reference directories, adapters and tools in config.yaml

Important! Edit the configuration section of config.yaml to set the paths to the tools, temp directory and reference sequences. Use a code editor to avoid breaking the yaml format.
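One common way to break the YAML format is tab indentation, which YAML does not allow. The following sketch flags such lines after editing; `yaml_has_tab_indent` is a hypothetical helper, and it assumes config.yaml is in the current directory. It is a quick sanity check, not a full YAML validator.

```shell
#!/usr/bin/env bash
# Print lines whose leading whitespace contains a tab; succeed if any are found.
yaml_has_tab_indent() {
    grep -n "^ *$(printf '\t')" "$1"
}

cfg="config.yaml"   # assumed to be in the current directory
if [ -f "$cfg" ] && yaml_has_tab_indent "$cfg"; then
    echo "Tab indentation found in $cfg; replace tabs with spaces."
else
    echo "No tab indentation found (or $cfg is not in this directory)."
fi
```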

8. Install haybaler and raspir

This step is optional but recommended.

Haybaler is a subprogram that aggregates Wochenende reporting outputs from multiple samples into one TSV per output type, so you can easily compare samples. Haybaler also generates heatmaps and figures based on these outputs. Raspir (https://github.com/mmpust/raspir), a mathematical approach for finding species which are present based on their read distributions, is supplied with Wochenende in a subfolder and uses the same conda environment as Haybaler. Therefore no extra installation is required beyond this step.

Leave the Wochenende folder before installing Haybaler:

# first clone haybaler
git clone https://github.com/MHH-RCUG/haybaler
cd haybaler
# Set up a dedicated environment for haybaler using mamba
mamba env create -f env.haybaler.yml
conda activate haybaler
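After creating both environments, you may want to confirm that conda lists them. This is a minimal sketch that parses the output of `conda env list`; `has_env` is a hypothetical helper, and the environment names wochenende and haybaler are taken from the activate commands in this guide.

```shell
#!/usr/bin/env bash
# Read `conda env list` output on stdin; succeed if an environment with the given name is listed.
has_env() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

if command -v conda >/dev/null 2>&1; then
    for env_name in wochenende haybaler; do
        if conda env list | has_env "$env_name"; then
            echo "environment present: $env_name"
        else
            echo "environment missing: $env_name"
        fi
    done
else
    echo "conda not found on PATH"
fi
```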

9. Prepare an R server if available for visualizing Haybaler output (optional)

This optional step needs a server running R with heatmap and metacoder packages installed. This is only for automated visualization, so if you prefer to do your own visualization or don't have an R server in your cluster you can skip this step.

See install instructions at https://github.com/MHH-RCUG/haybaler/tree/dev#installing-r-packages-for-heatmaps-and-heat-trees-eg-in-rstudio

x Install ABRA (optional, just a jar download)

This step is suspended - not required - at present in 2022 for security reasons. ABRA2 still contains the rather nasty log4j security issue at the time of writing (12-2021 to 06-2022).

Install the ABRA realigner (you can skip this for now if you don't want to do genomic realignment, which is not useful for metagenomes unless you intend to look for SNPs)

ABRA2