2. Rapid Identification of New Instances of High Interest Segments

The other two main programs in ConSequences are generateReferenceMSA.py and querySegmentInRawReads.py, which together enable quick prediction of whether a sample carries a segment of interest directly from short-read sequencing data.

Description of Method

If a conserved segment of interest is identified through delineateSegmentsOnReference.py-based analysis, the result files generated by that program can be provided as input to generateReferenceMSA.py to construct a reference-based multiple sequence alignment (MSA) for the segment.

Afterwards, querySegmentInRawReads.py can be used to predict whether the defining/core components of the MSA are present in the raw reads of a sample (provided as FASTQ files) using a sliding k-mer analysis of one or multiple segment MSAs. Because slight variations can exist between instances of a signature in the MSA, a sample only needs to possess one of the possible 31-mers (31 is the default k-mer length).
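The sliding k-mer idea described above can be illustrated with a minimal Python sketch. This is not the code used by querySegmentInRawReads.py (which relies on JellyFish for k-mer counting); the function names, the handling of gap characters, and the per-window scoring are all assumptions made for illustration. For brevity the sketch also ignores reverse-complement k-mers, which a real analysis of sequencing reads would need to account for.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across an iterable of read sequences (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def window_variants(msa_rows, start, k):
    """Collect the distinct k-mers seen at one alignment window.

    Assumption: rows containing a gap ('-') or 'N' in the window are skipped.
    """
    variants = set()
    for row in msa_rows:
        kmer = row[start:start + k].upper()
        if len(kmer) == k and "-" not in kmer and "N" not in kmer:
            variants.add(kmer)
    return variants


def fraction_windows_supported(msa_rows, read_counts, k=31, min_depth=1):
    """Fraction of alignment windows for which the sample has at least one
    of the possible k-mers at a depth of min_depth or more."""
    aln_len = len(msa_rows[0])
    supported = 0
    total = 0
    for start in range(aln_len - k + 1):
        variants = window_variants(msa_rows, start, k)
        if not variants:
            continue  # every row had a gap here; nothing to test
        total += 1
        if any(read_counts.get(v, 0) >= min_depth for v in variants):
            supported += 1
    return supported / total if total else 0.0


if __name__ == "__main__":
    # Toy alignment with one variable column, using k=3 to keep it readable.
    msa = ["ACGTA", "ACCTA"]
    counts = count_kmers(["ACGTA"], k=3)
    print(fraction_windows_supported(msa, counts, k=3))
```

Because any one variant per window is sufficient, a sample matching either row of the toy alignment is scored as fully supported, while raising `min_depth` filters out k-mers seen too rarely to be trusted.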

Usage for generateReferenceMSA.py

usage: generateReferenceMSA.py [-h] -r REF_FASTA -s START_COORD -e END_COORD
                               -m MAPPING_SCAFFS -w SLIDING_WINDOW_RESULTS -o
                               MSA_OUTPUT [-l LOG_FILE]

	Program: generateReferenceMSA.py
	Author: Rauf Salamzade
	The Broad Institute of MIT and Harvard
	Earl Lab / Bacterial Genomics Group

	This program will generate a . If facing difficulties, please raise 
        issues on the github page: https://github.com/broadinstitute/consequences	

optional arguments:
  -h, --help            show this help message and exit
  -r REF_FASTA, --ref_fasta REF_FASTA
                        FASTA for reference scaffold upon which 
                        segment lies.
  -s START_COORD, --start_coord START_COORD
                        Starting coordinate of segment.
  -e END_COORD, --end_coord END_COORD
                        Ending coordinate of segment.
  -m MAPPING_SCAFFS, --mapping_scaffs MAPPING_SCAFFS
                        List of scaffolds with segment. One per line.
  -w SLIDING_WINDOW_RESULTS, --sliding_window_results SLIDING_WINDOW_RESULTS
                        Sliding window results file which contains 
                        variant information.
  -o MSA_OUTPUT, --msa_output MSA_OUTPUT
                        Multiple-sequence-alignment to be used for rapid 
                        identification of signature sequences.
  -l LOG_FILE, --log_file LOG_FILE
                        Path to logging output file

Usage for querySegmentInRawReads.py

usage: querySegmentInRawReads.py [-h] -m REF_MSAS [REF_MSAS ...] -r REFERENCES
                                 [REFERENCES ...] -i ILLUMINA_READS
                                 [ILLUMINA_READS ...] -o OUTPUT_PREFIX
                                 [-d MIN_DEPTH] [-k KMER_LENGTH] [-c CORES]

	Program: generateReferenceMSA.py
	Author: Rauf Salamzade
	The Broad Institute of MIT and Harvard
	Earl Lab / Bacterial Genomics Group

	This program will generate a . If facing difficulties, please 
        raise issues on the github page: https://github.com/broadinstitute/consequences
	

optional arguments:
  -h, --help            show this help message and exit
  -m REF_MSAS [REF_MSAS ...], --ref_msas REF_MSAS [REF_MSAS ...]
                        Multi-FASTA reference-based multiple sequence alignment(s) 
                        for segment(s) of interest.
  -r REFERENCES [REFERENCES ...], --references REFERENCES [REFERENCES ...]
                        Reference sample. Should be provided in same respective 
                        order as --ref_msas.
  -i ILLUMINA_READS [ILLUMINA_READS ...], --illumina_reads ILLUMINA_READS [ILLUMINA_READS ...]
                        Illumina or any high-accuracy sequencing data in FASTQ format.
  -o OUTPUT_PREFIX, --output_prefix OUTPUT_PREFIX
                        Multiple-sequence-alignment to be used for rapid 
                        identification of signature sequences.
  -d MIN_DEPTH, --min_depth MIN_DEPTH
                        Minimum number of times k-mer has to occur in sample 
                        read's to avoid inclusion of sequencing errors.
  -k KMER_LENGTH, --kmer_length KMER_LENGTH
                        Size of k-mer to use for searching. Default is 31.
  -c CORES, --cores CORES
                        Number of cores to provide JellyFish. Default is 1.
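As a rough illustration of how the two programs chain together, the following Python sketch assembles command lines from the options listed in the usage messages above. All file names and the reference sample label are hypothetical placeholders, the scripts are assumed to be executable and on the PATH, and the exact form expected by --references is not specified on this page, so treat that value as a guess.

```python
import shlex


def build_msa_command(ref_fasta, start_coord, end_coord, mapping_scaffs,
                      sliding_window_results, msa_output, log_file=None):
    """Assemble a generateReferenceMSA.py call from its documented options."""
    cmd = [
        "generateReferenceMSA.py",
        "-r", ref_fasta,
        "-s", str(start_coord),
        "-e", str(end_coord),
        "-m", mapping_scaffs,
        "-w", sliding_window_results,
        "-o", msa_output,
    ]
    if log_file is not None:
        cmd += ["-l", log_file]
    return cmd


def build_query_command(ref_msas, references, illumina_reads, output_prefix,
                        min_depth=None, kmer_length=None, cores=None):
    """Assemble a querySegmentInRawReads.py call from its documented options."""
    # --references must follow the same order as --ref_msas.
    if len(ref_msas) != len(references):
        raise ValueError("ref_msas and references must have the same length")
    cmd = ["querySegmentInRawReads.py",
           "-m", *ref_msas,
           "-r", *references,
           "-i", *illumina_reads,
           "-o", output_prefix]
    for flag, value in (("-d", min_depth), ("-k", kmer_length), ("-c", cores)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    step1 = build_msa_command("reference.fasta", 1000, 9000,
                              "scaffolds_with_segment.txt",
                              "sliding_window_results.txt",
                              "segment.msa.fasta")
    step2 = build_query_command(["segment.msa.fasta"], ["reference_sample"],
                                ["sample_R1.fastq", "sample_R2.fastq"],
                                "sample_query", kmer_length=31, cores=4)
    print(shlex.join(step1))
    print(shlex.join(step2))
```

The resulting lists can be passed to `subprocess.run(cmd, check=True)` once the placeholder paths are replaced with real files.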
