7. Other Scripts and Programs

Rauf Salamzade edited this page Jan 8, 2022 · 3 revisions

Additional helper scripts and programs used in Salamzade et al. 2021 are provided in the subdirectory ConSequences/other/. This page documents their usage.

Usage for extractAllSegmentSequences.py

usage: extractAllSegmentSequences.py [-h] -i SCAFFOLDS_FASTA -s
                                     SEGMENTS_LISTING

	Extract sequence for segments from assembly.
	
optional arguments:
  -h, --help            show this help message and exit
  -i SCAFFOLDS_FASTA, --scaffolds_fasta SCAFFOLDS_FASTA
                        Multi-FASTA of all scaffolds in analysis.
  -s SEGMENTS_LISTING, --segments_listing SEGMENTS_LISTING
                        Concatenated Segment_Results.txt files from delineateSegmentsOnReference.py.
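
To illustrate the kind of operation this script performs, the following is a minimal sketch of pulling segment subsequences out of a multi-FASTA. It is not the script's actual code. The listing layout used here (tab-separated scaffold ID, start, end, with 1-based inclusive coordinates) is an assumption made for the example and may not match the real Segment_Results.txt columns; the function names `read_fasta` and `extract_segments` are hypothetical.

```python
import io

def read_fasta(handle):
    """Parse a multi-FASTA stream into {scaffold_id: sequence}."""
    sequences = {}
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences

def extract_segments(scaffolds, listing_handle):
    """Yield (label, subsequence) for each listed segment.

    ASSUMED listing layout (hypothetical, for illustration only):
    scaffold_id <TAB> start <TAB> end, with 1-based inclusive coordinates.
    """
    for line in listing_handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        scaffold_id, start, end = line.split("\t")[:3]
        start, end = int(start), int(end)
        # Convert 1-based inclusive coordinates to a Python slice.
        yield f"{scaffold_id}:{start}-{end}", scaffolds[scaffold_id][start - 1:end]

# Small in-memory demonstration with made-up data.
fasta = io.StringIO(">scaf1 demo\nACGTAC\nGTTT\n>scaf2\nGGGCCC\n")
listing = io.StringIO("scaf1\t2\t5\nscaf2\t4\t6\n")
segments = list(extract_segments(read_fasta(fasta), listing))
```

With real data, the two `io.StringIO` objects would be replaced by open file handles for the scaffolds FASTA and the segments listing.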

Usage for selectRepresentativeSegments.py

usage: selectRepresentativeSegments.py [-h] -s SEGMENTS_LISTING -c
                                       CDHIT_CLUSTERING

	Parse CD-HIT results and select representatives from clusters of analogous segments.
	

optional arguments:
  -h, --help            show this help message and exit
  -s SEGMENTS_LISTING, --segments_listing SEGMENTS_LISTING
                        Concatenated Segment_Results.txt files from delineateSegmentsOnReference.py.
  -c CDHIT_CLUSTERING, --cdhit_clustering CDHIT_CLUSTERING
                        CD-HIT based clustering.
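
As a rough sketch of what parsing CD-HIT results can look like, the example below reads the standard CD-HIT `.clstr` format and reports the member that CD-HIT itself marks with `*` as each cluster's representative. It assumes the CDHIT_CLUSTERING input is a `.clstr` file; the script may accept a different file and may apply its own selection criteria. The function name `parse_clstr` and the segment names are hypothetical.

```python
import re

# Matches the member name between '>' and '...', plus the trailing status text.
MEMBER = re.compile(r">(.+?)\.\.\.\s*(.*)$")

def parse_clstr(lines):
    """Parse CD-HIT .clstr lines into {cluster_name: {"representative", "members"}}."""
    clusters = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]
            clusters[current] = {"representative": None, "members": []}
            continue
        match = MEMBER.search(line)
        if match is None or current is None:
            continue
        name, status = match.groups()
        clusters[current]["members"].append(name)
        # CD-HIT flags the representative sequence of a cluster with '*'.
        if status == "*":
            clusters[current]["representative"] = name
    return clusters

# Made-up example in the .clstr layout.
example = """>Cluster 0
0\t2799nt, >seg_A... *
1\t2214nt, >seg_B... at +/99.50%
>Cluster 1
0\t1500nt, >seg_C... *
""".splitlines()

clusters = parse_clstr(example)
```

The representative names could then be matched against the concatenated segments listing to keep one row per cluster.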

Usage for gatherGeographyInfo.py

usage: gatherGeographyInfo.py [-h] -i NT_HITS_FILE -m MAIL

	Use Entrez functionalities in Biopython and geopy to gather geographic origins of hits to NCBI's nt database.

optional arguments:
  -h, --help            show this help message and exit
  -i NT_HITS_FILE, --nt_hits_file NT_HITS_FILE
                        File listing Nucleotide sample IDs (hits from BLASTing to nt).
  -m MAIL, --mail MAIL  email to use for Entrez.
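
The general approach of looking up a geographic origin through Entrez can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: it assumes the location is read from the `source` feature of each GenBank record (qualifier `geo_loc_name`, or the older `country`), and the function names `split_location` and `fetch_location` are hypothetical. `fetch_location` needs Biopython and network access.

```python
def split_location(qualifier):
    """Split a qualifier such as 'USA: Boston, MA' into (country, locality)."""
    country, _, locality = qualifier.partition(":")
    return country.strip(), locality.strip() or None

def fetch_location(accession, email):
    """Fetch one GenBank record and return (country, locality), or None."""
    # Imported lazily so split_location works without Biopython installed.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address on Entrez requests
    handle = Entrez.efetch(db="nucleotide", id=accession,
                           rettype="gb", retmode="text")
    try:
        record = SeqIO.read(handle, "genbank")
    finally:
        handle.close()
    for feature in record.features:
        if feature.type == "source":
            # ASSUMPTION: location is stored under one of these qualifiers.
            for key in ("geo_loc_name", "country"):
                values = feature.qualifiers.get(key)
                if values:
                    return split_location(values[0])
    return None

# Offline demonstration of the pure helper.
parsed = split_location("USA: Boston, MA")
```

The resulting country and locality strings could then be passed to a geopy geocoder to obtain coordinates; which geocoder the script uses is not stated on this page.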