A set of Nextflow modules commonly used across pipelines.
Module for deleting intermediate files from disk as they're no longer needed by downstream processes. Symbolic links are followed to the actual file and both are deleted.
Tools used: GNU `rm` and `readlink`.
Inputs:
- `file_to_remove`: path to file to be deleted
- `ready_for_deletion_signal`: `val` to indicate that the file is no longer needed by any processes
Parameters:
- `output_dir`: directory for storing outputs
- `log_output_dir`: directory for storing log files
- `save_intermediate_files`: boolean indicating whether this process should run (disable when intermediate files need to be kept)
- `docker_image`: Docker image within which the process will run. The default is: `ghcr.io/uclahs-cds/pipeval:3.0.0`
- `process_label`: assign a Nextflow process label to the process to control resource allocation. For specific CPU and memory allocation, include static allocations in node-specific config files
How to use:
- Add this repository as a submodule in the pipeline of interest
- Include the `remove_intermediate_files` process from the module `main.nf` with a relative path
- Use the `addParams` directive when importing to specify any params
- Call the process with the inputs where needed
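As a sketch of the steps above — the relative submodule path and module directory names below are illustrative assumptions, while the process name and parameter names come from this documentation:

```nextflow
// Illustrative import; the relative path to the submodule's main.nf will differ per pipeline
include { remove_intermediate_files } from './modules/pipeline-Nextflow-module/modules/common/intermediate_file_removal/main.nf' addParams(
    output_dir: params.output_dir,
    log_output_dir: params.log_output_dir,
    save_intermediate_files: params.save_intermediate_files
    )

workflow {
    // Delete the intermediate BAM once the downstream process signals it no longer needs it
    remove_intermediate_files(
        intermediate_bam_ch,               // file_to_remove
        downstream_process.out.complete    // ready_for_deletion_signal
        )
}
```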
Module for extracting the genome intervals from a reference genome dictionary.
Tools used: GNU `grep`, `cut`, and `sed`.
Inputs:
- reference_dict: path to reference genome dictionary
Parameters:
- `output_dir`: directory for storing outputs
- `log_output_dir`: directory for storing log files
- `save_intermediate_files`: boolean indicating whether the extracted intervals should be copied to the output directory
- `docker_image`: Docker image within which the process will run. The default is: `ghcr.io/uclahs-cds/pipeval:3.0.0`
- `process_label`: assign a Nextflow process label to the process to control resource allocation. For specific CPU and memory allocation, include static allocations in node-specific config files
How to use:
- Add this repository as a submodule in the pipeline of interest
- Include the `extract_GenomeIntervals` process from the module `main.nf` with a relative path
- Use the `addParams` directive when importing to specify any params
- Call the process with the inputs where needed
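A minimal usage sketch; the submodule path and module directory layout are assumptions, and `params.reference_dict` stands in for whatever parameter the pipeline uses for its reference dictionary:

```nextflow
// Illustrative import; adjust the relative path to wherever the submodule lives
include { extract_GenomeIntervals } from './modules/pipeline-Nextflow-module/modules/common/extract_genome_intervals/main.nf' addParams(
    output_dir: params.output_dir,
    log_output_dir: params.log_output_dir,
    save_intermediate_files: false
    )

workflow {
    // reference_dict: the reference genome dictionary (*.dict)
    extract_GenomeIntervals(params.reference_dict)
}
```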
Module containing function to take components of a filename and combine them in a standardized format, returned as a string.
Tools used: Groovy functions
Inputs:
- `main_tool`: string containing the name and version of the main tool used for generating the file
- `dataset_id`: string identifying the dataset the file belongs to
- `sample_id`: string identifying the sample contained in the file
- `additional_args`: Map containing additional optional arguments. Available args:
  - `additional_tools`: list of strings identifying any additional tools to include in the filename
  - `additional_information`: string containing any additional information to be included at the end of the filename
Additional functions:
- `sanitize_string`
  - Pass input string to sanitize, keeping only alphanumeric, `-`, `/`, and `.` characters and replacing `_` with `-`
  - Inputs:
    - `raw`: string to sanitize
Outputs:
- String representing the standardized filename
How to use:
- Add this repository as a submodule in the pipeline of interest
- Include the `generate_standard_filename` and any additional necessary functions from the module `main.nf` with a relative path in any Nextflow file requiring use of the function
- Call the functions as needed with the appropriate inputs and use the returned value to set file names
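A sketch of calling the function; the submodule path is an illustrative assumption, and the example tool/sample values are placeholders:

```nextflow
// Illustrative import; adjust the relative path to the submodule's main.nf
include { generate_standard_filename; sanitize_string } from './modules/pipeline-Nextflow-module/modules/common/generate_standardized_filename/main.nf'

// Combines the components into a standardized filename string,
// e.g. main tool, dataset ID, sample ID, and trailing additional information
def output_filename = generate_standard_filename(
    'BWA-MEM2-2.2.1',                            // main_tool
    params.dataset_id,                           // dataset_id
    params.sample_id,                            // sample_id
    [additional_information: 'recalibrated']     // additional_args (optional Map)
    )
```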
Module for validating files and directories using PipeVal. There are two nearly-identical processes in this module: `run_validate_PipeVal` and `run_validate_PipeVal_with_metadata`.
Tools used: PipeVal.
Inputs:
- `run_validate_PipeVal`:
  - `file_to_validate`: path for file or directory to validate
- `run_validate_PipeVal_with_metadata`:
  - A tuple of:
    - `file_to_validate`: path for file or directory to validate
    - `metadata`: arbitrary `val` passed through to the output
Parameters:
- `log_output_dir`: directory for storing log files
- `docker_image_version`: PipeVal Docker image version within which the process will run. The default is: `4.0.0-rc.2`
- `process_label`: assign a Nextflow process label to the process to control resource allocation. For specific CPU and memory allocation, include static allocations in node-specific config files
- `main_process`: set output directory to the specified main process instead of `PipeVal-4.0.0-rc.2`
Outputs:
- `validation_result`: path of file with validation output text
- `validated_file`: `file_to_validate` or tuple of (`file_to_validate`, `metadata`)
How to use:
- Add this repository as a submodule in the pipeline of interest
- Include the `run_validate_PipeVal` or `run_validate_PipeVal_with_metadata` process from the module `main.nf` with a relative path
- Use the `addParams` directive when importing to specify any params
- Call the process with the inputs where needed
- Aggregate and save the output validation files as needed
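The steps above can be sketched as follows; the submodule path is an illustrative assumption, and `params.input_bam`/`params.reference_fasta` are placeholder pipeline parameters:

```nextflow
// Illustrative import; adjust the relative path to the submodule's main.nf
include { run_validate_PipeVal } from './modules/pipeline-Nextflow-module/modules/PipeVal/validate/main.nf' addParams(
    log_output_dir: params.log_output_dir
    )

workflow {
    files_to_validate_ch = Channel.fromPath([params.input_bam, params.reference_fasta])
    run_validate_PipeVal(files_to_validate_ch)

    // Aggregate the per-file validation text into a single saved file
    run_validate_PipeVal.out.validation_result
        .collectFile(name: 'input_validation.txt', storeDir: params.output_dir)
}
```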
Module for generating checksums for files using PipeVal.
Tools used: PipeVal.
Inputs:
- `input_file`: path for file to generate a checksum
Parameters:
- `output_dir`: directory for storing checksums
- `log_output_dir`: directory for storing log files
- `docker_image_version`: PipeVal Docker image version within which the process will run. The default is: `4.0.0-rc.2`
- `process_label`: assign a Nextflow process label to the process to control resource allocation. For specific CPU and memory allocation, include static allocations in node-specific config files
- `main_process`: set output directory to the specified main process instead of `PipeVal-4.0.0-rc.2`
- `checksum_alg`: type of checksum to generate. Choices: `sha512` (default), `md5`
How to use:
- Add this repository as a submodule in the pipeline of interest
- Include the `generate_checksum_PipeVal` process from the module `main.nf` with a relative path
- Use the `addParams` directive when importing to specify any params
- Call the process with the inputs where needed
Module for compressing and indexing VCF/GFF files. The input should be compressed or uncompressed `*.vcf` or `*.gff` files.
Tools used: `tabix` and `bgzip`.
Inputs:
- `id`: string identifying the `id` of the indexed VCF. For more than one VCF, the `id` should be unique for each sample.
- `file_to_index`: path for VCF file to compress and index.
Parameters:
- `output_dir`: directory to store compressed VCF and index files.
- `log_output_dir`: directory to store log files.
- `docker_image`: SAMtools Docker image version within which the process will run. The default is: `1.15.1`
- `process_label`: assign a Nextflow process label to the process to control resource allocation. For specific CPU and memory allocation, include static allocations in node-specific config files
- `is_output_file`: determines whether the output of this process should be saved to the `output` or `intermediate` folder. For an `intermediate` process, use `addParams` to specify `is_output_file: false`. The default is `true`.
- `save_intermediate_files`: whether the index files should be saved to the intermediate output directory.
- `unzip_and_rezip`: whether compressed files should be uncompressed and re-compressed using `bgzip`. The default is `false`.
How to use:
- Add this repository as a submodule in the pipeline of interest.
- Include the `compress_index_VCF` workflow from the module `main.nf` with a relative path.
- Use the `addParams` directive when importing to specify any params.
- Call the workflow with the input channel, a tuple with `id` and `file_to_index`.
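The steps above might look like the following; the submodule path, module directory name, and sample IDs/paths are illustrative assumptions:

```nextflow
// Illustrative import; adjust the relative path to the submodule's main.nf
include { compress_index_VCF } from './modules/pipeline-Nextflow-module/modules/common/index_VCF_tabix/main.nf' addParams(
    output_dir: params.output_dir,
    log_output_dir: params.log_output_dir,
    is_output_file: false    // treat results as intermediate files
    )

workflow {
    // One (id, file_to_index) tuple per sample; ids must be unique
    vcfs_ch = Channel.of(
        ['sampleA', file('/path/to/sampleA.vcf')],
        ['sampleB', file('/path/to/sampleB.vcf.gz')]
        )
    compress_index_VCF(vcfs_ch)
}
```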
Module containing a function that returns the expected path to the index file for a given input file. Note: this does not check that the index file exists.
Inputs:
- input_file: currently supports BAM or VCF
Output:
- The input file path with the expected index extension appended: currently `.bai` for BAM files and `.tbi` for VCF files
How to use:
- Add this repository as a submodule in the pipeline of interest.
- Include the `indexFile` function from the module `main.nf` with a relative path.
- Call the function as needed with the appropriate input and use the returned value as the index file name.
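A minimal sketch; the submodule path and `params.input_bam` are illustrative assumptions:

```nextflow
// Illustrative import; adjust the relative path to the submodule's main.nf
include { indexFile } from './modules/pipeline-Nextflow-module/modules/common/indexFile/main.nf'

// For a BAM input this is expected to append '.bai'; for a VCF, '.tbi'
def input_bam_index = indexFile(params.input_bam)
```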
Author: Yash Patel ([email protected])
pipeline-Nextflow-module is licensed under the GNU General Public License version 2. See the file LICENSE for the terms of the GNU GPL license.
pipeline-Nextflow-module comprises a set of commonly used Nextflow modules.
Copyright (C) 2021 University of California Los Angeles ("Boutros Lab") All rights reserved.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.