Merge pull request #107 from phac-nml/development
Release 0.7.0
Takadonet authored Nov 21, 2019
2 parents 8d4d5e7 + bd75404 commit 276074b
Showing 37 changed files with 1,386 additions and 106 deletions.
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,14 @@
# Version 0.7.0

* Added a quality module that adds a pass/fail column and detailed feedback to summary.tsv
* Added the following new optional arguments for Search.py:
- --genome-size-lower-bound
- --genome-size-upper-bound
- --minimum-N50-value
- --minimum-contig-length
- --unacceptable-number-contigs
* Added a DNA sequence column to the ResFinder report
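
The new options can be pictured with a small `argparse` sketch. This is not the parser from `Search.py`: only the option names come from the list above, while the parser name, the `type` settings, and the default values are assumptions chosen for illustration.

```python
import argparse

# Hypothetical parser: only the option names are taken from the changelog.
# The defaults below are placeholders, not necessarily staramr's defaults.
parser = argparse.ArgumentParser(prog="search-sketch")
parser.add_argument("--genome-size-lower-bound", type=int, default=4000000)
parser.add_argument("--genome-size-upper-bound", type=int, default=6000000)
parser.add_argument("--minimum-N50-value", type=int, default=10000)
parser.add_argument("--minimum-contig-length", type=int, default=300)
parser.add_argument("--unacceptable-number-contigs", type=int, default=1000)

# Parse an explicit argument list instead of sys.argv so the sketch is repeatable.
args = parser.parse_args(["--minimum-N50-value", "20000", "--minimum-contig-length", "500"])

# argparse replaces dashes with underscores when it builds attribute names.
print(args.minimum_N50_value, args.minimum_contig_length, args.genome_size_lower_bound)
```

Options that are not given on the command line fall back to their defaults, so existing invocations keep working when optional arguments like these are added.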

# Version 0.6.1

* Added --output-mlst in Search.py
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -4,5 +4,7 @@ include README.md
include MANIFEST.in
include staramr/databases/exclude/data/genes_to_exclude.tsv
include staramr/tests/integration/data/*.fsa
include staramr/tests/integration/data/gene-drug-tables/*.tsv
include staramr/tests/unit/data/*.fasta
recursive-include staramr/databases/data/dist/ *
include staramr/databases/resistance/data/*
46 changes: 27 additions & 19 deletions README.md
@@ -16,10 +16,10 @@ staramr search -o out --pointfinder-organism salmonella *.fasta

**out/summary.tsv**:

| Isolate ID | Genotype | Predicted Phenotype | Plasmid | Scheme | Sequence Type |
|------------|-----------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-------------------------------------|-----------|---------------|
| SRR1952908 | aadA1, aadA2, blaTEM-57, cmlA1, gyrA (S83Y), sul3, tet(A) | streptomycin, ampicillin, chloramphenicol, ciprofloxacin I/R, nalidixic acid, sulfisoxazole, tetracycline | ColpVC, IncFIB(S), IncFII(S), IncI1 | senterica | 11 |
| SRR1952926 | blaTEM-57, gyrA (S83Y), tet(A) | ampicillin, ciprofloxacin I/R, nalidixic acid, tetracycline | ColpVC, IncFIB(S), IncFII(S), IncI1 | senterica | 11 |
| Isolate ID | Quality Module | Genotype | Predicted Phenotype | Plasmid | Scheme | Sequence Type | Genome Length | N50 value | Number of Contigs Greater Than Or Equal To 300 bp | Quality Module Feedback |
|------------|----------------|-----------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-------------------------------------|-----------|---------------|---------------|-----------|---------------------------------------------------|-------------------------|
| SRR1952908 | Passed | aadA1, aadA2, blaTEM-57, cmlA1, gyrA (S83Y), sul3, tet(A) | streptomycin, ampicillin, chloramphenicol, ciprofloxacin I/R, nalidixic acid, sulfisoxazole, tetracycline | ColpVC, IncFIB(S), IncFII(S), IncI1 | senterica | 11 | 4796082 | 225419 | 59 | |
| SRR1952926 | Passed | blaTEM-57, gyrA (S83Y), tet(A) | ampicillin, ciprofloxacin I/R, nalidixic acid, tetracycline | ColpVC, IncFIB(S), IncFII(S), IncI1 | senterica | 11 | 4794071 | 225380 | 50 | |

**out/detailed_summary.tsv**:

@@ -31,10 +31,10 @@ staramr search -o out --pointfinder-organism salmonella *.fasta

**out/resfinder.tsv**:

| Isolate ID | Gene | Predicted Phenotype | %Identity | %Overlap | HSP Length/Total Length | Contig | Start | End | Accession |
|------------|--------|---------------------|-----------|----------|-------------------------|-------------|-------|------|-----------|
| SRR1952908 | sul3 | sulfisoxazole | 100 | 100 | 792/792 | contig00030 | 2091 | 2882 | AJ459418 |
| SRR1952908 | tet(A) | tetracycline | 99.92 | 97.8 | 1247/1275 | contig00032 | 1476 | 2722 | AF534183 |
| Isolate ID | Gene | Predicted Phenotype | %Identity | %Overlap | HSP Length/Total Length | Contig | Start | End | Accession | Sequence|
|------------|--------|---------------------|-----------|----------|-------------------------|-------------|-------|------|-----------|---------|
| SRR1952908 | sul3 | sulfisoxazole | 100 | 100 | 792/792 | contig00030 | 2091 | 2882 | AJ459418 | ATGA |
| SRR1952908 | tet(A) | tetracycline | 99.92 | 97.8 | 1247/1275 | contig00032 | 1476 | 2722 | AF534183 | ATGT |

**out/pointfinder.tsv**:

@@ -250,7 +250,7 @@ Please make sure to include `#gene_id` in the first line. The default exclusion

There are 8 different output files produced by `staramr`:

1. `summary.tsv`: A summary of all detected AMR genes/mutations/plasmids/sequence type in each genome, one genome per line.
1. `summary.tsv`: A summary of all detected AMR genes/mutations/plasmids/sequence type in each genome, one genome per line. Descriptive statistics are also provided for each genome, along with a pass/fail result for several quality metrics and, for genomes that fail, feedback on why.
2. `detailed_summary.tsv`: A detailed summary of all detected AMR genes/mutations/plasmids/sequence type in each genome, one gene per line.
3. `resfinder.tsv`: A tabular file of each AMR gene and additional BLAST information from the **ResFinder** database, one gene per line.
4. `pointfinder.tsv`: A tabular file of each AMR point mutation and additional BLAST information from the **PointFinder** database, one gene per line.
@@ -266,18 +266,24 @@ In addition, the directory `hits/` stores fasta files of the specific blast hits
The **summary.tsv** output file generated by `staramr` contains the following columns:

* __Isolate ID__: The id of the isolate/genome file(s) passed to `staramr`.
* __Quality Module__: The pass/fail result of the quality metrics for the isolate/genome file(s).
* __Genotype__: The AMR genotype of the isolate.
* __Predicted Phenotype__: The predicted AMR phenotype (drug resistances) for the isolate.
* __Plasmid__: Plasmid types that were found for the isolate.
* __Scheme__: The MLST scheme used.
* __Sequence Type__: The sequence type assigned when combining all allele types.
* __Genome Length__: The genome length of the isolate/genome file(s).
* __N50 value__: The N50 value of the isolate/genome file(s).
* __Number of Contigs Greater Than Or Equal To 300 bp__: The number of contigs of at least 300 base pairs in the isolate/genome file(s).
* __Quality Module Feedback__: Detailed feedback on the quality metrics for the isolate/genome file(s).
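
As a minimal sketch of how these columns might be used downstream, the snippet below loads a summary table with pandas and lists the isolates that did not pass. The column names come from the list above. The in-memory table stands in for `out/summary.tsv`; its second row, including the `Failed` value and the feedback text, is invented for illustration and is not real `staramr` output.

```python
import io

import pandas as pd

# Stand-in for out/summary.tsv with a subset of the columns.
# The second row is invented for illustration only.
summary_tsv = (
    "Isolate ID\tQuality Module\tGenome Length\tN50 value\tQuality Module Feedback\n"
    "SRR1952908\tPassed\t4796082\t225419\t\n"
    "example-isolate\tFailed\t3100000\t8000\tplaceholder feedback text\n"
)

summary = pd.read_csv(io.StringIO(summary_tsv), sep="\t")

# Keep every isolate whose Quality Module value is anything other than "Passed".
not_passed = summary[summary["Quality Module"] != "Passed"]
flagged_ids = not_passed["Isolate ID"].tolist()
print(flagged_ids)
```

Filtering on "not `Passed`" rather than on a specific failure label avoids depending on the exact wording of the failing value, which the example table in this README does not show.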

### Example

| Isolate ID | Genotype | Predicted Phenotype | Plasmid | Scheme | Sequence Type |
|------------|-----------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-------------------------------------|-----------|---------------|
| SRR1952908 | aadA1, aadA2, blaTEM-57, cmlA1, gyrA (S83Y), sul3, tet(A) | streptomycin, ampicillin, chloramphenicol, ciprofloxacin I/R, nalidixic acid, sulfisoxazole, tetracycline | ColpVC, IncFIB(S), IncFII(S), IncI1 | senterica | 11 |
| SRR1952926 | blaTEM-57, gyrA (S83Y), tet(A) | ampicillin, ciprofloxacin I/R, nalidixic acid, tetracycline | ColpVC, IncFIB(S), IncFII(S), IncI1 | senterica | 11 |
| Isolate ID | Quality Module | Genotype | Predicted Phenotype | Plasmid | Scheme | Sequence Type | Genome Length | N50 value | Number of Contigs Greater Than Or Equal To 300 bp | Quality Module Feedback |
|------------|----------------|-----------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-------------------------------------|-----------|---------------|---------------|-----------|---------------------------------------------------|-------------------------|
| SRR1952908 | Passed | aadA1, aadA2, blaTEM-57, cmlA1, gyrA (S83Y), sul3, tet(A) | streptomycin, ampicillin, chloramphenicol, ciprofloxacin I/R, nalidixic acid, sulfisoxazole, tetracycline | ColpVC, IncFIB(S), IncFII(S), IncI1 | senterica | 11 | 4796082 | 225419 | 59 | |
| SRR1952926 | Passed | blaTEM-57, gyrA (S83Y), tet(A) | ampicillin, ciprofloxacin I/R, nalidixic acid, tetracycline | ColpVC, IncFIB(S), IncFII(S), IncI1 | senterica | 11 | 4794071 | 225380 | 50 | |


## detailed_summary.tsv

@@ -316,13 +322,14 @@ The **resfinder.tsv** output file generated by `staramr` contains the following
* __Start__: The start of the AMR gene (will be greater than __End__ if on minus strand).
* __End__: The end of the AMR gene.
* __Accession__: The accession of the AMR gene in the ResFinder database.
* __Sequence__: The AMR gene sequence.

### Example

| Isolate ID | Gene | Predicted Phenotype | %Identity | %Overlap | HSP Length/Total Length | Contig | Start | End | Accession |
|------------|------------|----------------------|------------|-----------|--------------------------|--------------|--------|-------|-----------|
| SRR1952908 | sul3 | sulfisoxazole | 100.00 | 100.00 | 792/792 | contig00030 | 2091 | 2882 | AJ459418 |
| SRR1952908 | tet(A) | tetracycline | 99.92 | 100.00 | 1200/1200 | contig00032 | 1551 | 2750 | AJ517790 |
| Isolate ID | Gene | Predicted Phenotype | %Identity | %Overlap | HSP Length/Total Length | Contig | Start | End | Accession | Sequence|
|------------|--------|---------------------|-----------|----------|-------------------------|-------------|-------|------|-----------|---------|
| SRR1952908 | sul3 | sulfisoxazole | 100 | 100 | 792/792 | contig00030 | 2091 | 2882 | AJ459418 | ATGA |
| SRR1952908 | tet(A) | tetracycline | 99.92 | 97.8 | 1247/1275 | contig00032 | 1476 | 2722 | AF534183 | ATGT |

## pointfinder.tsv

@@ -468,7 +475,7 @@ This software is still a work-in-progress. In particular, not all organisms sto

# Acknowledgements

Some ideas for the software were derived from the [ResFinder][resfinder-git], [PointFinder][pointfinder-git], and [PlasmidFinder][plasmidfinder-git] command-line software, as well as from [ABRicate][abricate].
Some ideas for the software were derived from the [ResFinder][resfinder-git], [PointFinder][pointfinder-git], and [PlasmidFinder][plasmidfinder-git] command-line software, as well as from [ABRicate][abricate] and from the [SISTR (Salmonella In Silico Typing Resource) command-line tool][sistr_cmd].

Phenotype/drug resistance predictions are provided with support from the NARMS/CIPARS Molecular Working Group.

@@ -517,6 +524,7 @@ specific language governing permissions and limitations under the License.
[pointfinder-git]: https://bitbucket.org/genomicepidemiology/pointfinder-3.0
[plasmidfinder-git]: https://bitbucket.org/genomicepidemiology/plasmidfinder
[abricate]: https://github.com/tseemann/abricate
[sistr_cmd]: https://github.com/phac-nml/sistr_cmd
[shovill]: https://github.com/tseemann/shovill
[ariba]: https://github.com/sanger-pathogens/ariba
[rgi]: https://github.com/arpcard/rgi
@@ -525,4 +533,4 @@ specific language governing permissions and limitations under the License.
[card-web]: https://card.mcmaster.ca/
[tutorial]: doc/tutorial/staramr-tutorial.ipynb
[genes_to_exclude.tsv]: staramr/databases/exclude/data/genes_to_exclude.tsv
[MLST]: https://github.com/tseemann/mlst
[MLST]: https://github.com/tseemann/mlst
Binary file modified images/search_command.png
2 changes: 1 addition & 1 deletion staramr/__init__.py
@@ -1 +1 @@
__version__ = '0.6.1'
__version__ = '0.7.0'
34 changes: 25 additions & 9 deletions staramr/detection/AMRDetection.py
@@ -18,6 +18,7 @@
from staramr.blast.results.pointfinder.BlastResultsParserPointfinder import BlastResultsParserPointfinder
from staramr.blast.results.resfinder.BlastResultsParserResfinder import BlastResultsParserResfinder
from staramr.results.AMRDetectionSummary import AMRDetectionSummary
from staramr.results.QualityModule import QualityModule

logger = logging.getLogger("AMRDetection")

@@ -57,17 +58,17 @@ def __init__(self, resfinder_database: ResfinderBlastDatabase, amr_detection_han

self._genes_to_exclude = genes_to_exclude

def _create_amr_summary(self, files: List[str], resfinder_dataframe: DataFrame,
def _create_amr_summary(self, files: List[str], resfinder_dataframe: DataFrame, quality_module_dataframe: DataFrame,
pointfinder_dataframe: Optional[BlastResultsParserPointfinder],
plasmidfinder_dataframe: DataFrame, mlst_dataframe: DataFrame) -> DataFrame:
amr_detection_summary = AMRDetectionSummary(files, resfinder_dataframe,
amr_detection_summary = AMRDetectionSummary(files, resfinder_dataframe, quality_module_dataframe,
pointfinder_dataframe, plasmidfinder_dataframe, mlst_dataframe)
return amr_detection_summary.create_summary(self._include_negative_results)

def _create_detailed_amr_summary(self, files: List[str], resfinder_dataframe: DataFrame,
def _create_detailed_amr_summary(self, files: List[str], resfinder_dataframe: DataFrame, quality_module_dataframe: DataFrame,
pointfinder_dataframe: Optional[BlastResultsParserPointfinder],
plasmidfinder_dataframe: DataFrame, mlst_dataframe: DataFrame) -> DataFrame:
amr_detection_summary = AMRDetectionSummary(files, resfinder_dataframe,
amr_detection_summary = AMRDetectionSummary(files, resfinder_dataframe, quality_module_dataframe,
pointfinder_dataframe, plasmidfinder_dataframe, mlst_dataframe)
return amr_detection_summary.create_detailed_summary(self._include_negative_results)

@@ -96,6 +97,13 @@ def _create_plasmidfinder_dataframe(self, plasmidfinder_blast_map: Dict[str, Bla
genes_to_exclude=self._genes_to_exclude)
return plasmidfinder_parser.parse_results()

def create_quality_module_dataframe(self, files, genome_size_lower_bound, genome_size_upper_bound, minimum_N50_value,
minimum_contig_length, unacceptable_num_contigs) -> DataFrame:
quality_module = QualityModule(files, genome_size_lower_bound, genome_size_upper_bound, minimum_N50_value,
minimum_contig_length, unacceptable_num_contigs)

return quality_module.create_quality_module_dataframe()

def _generate_empty_columns(self, row: list, max_cols: int, cur_cols: int) -> list:
if(cur_cols < max_cols):
for i in range(max_cols-cur_cols):
@@ -139,15 +147,22 @@ def _create_mlst_dataframe(self, mlst_data: str) -> DataFrame:

return mlst_dataframe

def run_amr_detection(self, files, pid_threshold, plength_threshold_resfinder, plength_threshold_pointfinder,
plength_threshold_plasmidfinder, report_all=False, ignore_invalid_files=False, mlst_scheme=None) -> None:
def run_amr_detection(self, files, pid_threshold, plength_threshold_resfinder, plength_threshold_pointfinder,
plength_threshold_plasmidfinder, genome_size_lower_bound, genome_size_upper_bound,
minimum_N50_value, minimum_contig_length, unacceptable_num_contigs,
report_all=False, ignore_invalid_files=False, mlst_scheme=None) -> None:
"""
Scans the passed files for AMR genes.
:param files: The files to scan.
:param pid_threshold: The percent identity threshold for BLAST results.
:param plength_threshold_resfinder: The percent length overlap for BLAST results (resfinder).
:param plength_threshold_pointfinder: The percent length overlap for BLAST results (pointfinder).
:param plength_threshold_plasmidfinder: The percent length overlap for BLAST results (plasmidfinder).
:param genome_size_lower_bound: The lower bound for the genome size as defined by the user for quality metrics
:param genome_size_upper_bound: The upper bound for the genome size as defined by the user for quality metrics
:param minimum_N50_value: The minimum N50 value as defined by the user for quality metrics
:param minimum_contig_length: The minimum contig length as defined by the user for quality metrics
:param unacceptable_num_contigs: The contig count (counting only contigs at or above the minimum contig length) at which a flag is raised, as defined by the user for quality metrics
:param report_all: Whether or not to report all blast hits.
:param ignore_invalid_files: Skips the invalid input files if set.
:param mlst_scheme: Specifies the scheme name MLST uses, if set.
@@ -157,6 +172,8 @@ def run_amr_detection(self, files, pid_threshold, plength_threshold_resfinder, p
files_copy = copy.deepcopy(files)
files = self._validate_files(files_copy, ignore_invalid_files)

self._quality_module_dataframe = self.create_quality_module_dataframe(files, genome_size_lower_bound, genome_size_upper_bound,
minimum_N50_value, minimum_contig_length, unacceptable_num_contigs)

self._amr_detection_handler.run_blasts_mlst(files, mlst_scheme)

resfinder_blast_map = self._amr_detection_handler.get_resfinder_outputs()
@@ -177,10 +194,10 @@ def run_amr_detection(self, files, pid_threshold, plength_threshold_resfinder, p
self._pointfinder_dataframe = self._create_pointfinder_dataframe(pointfinder_blast_map, pid_threshold,
plength_threshold_pointfinder, report_all)

self._summary_dataframe = self._create_amr_summary(files, self._resfinder_dataframe,
self._summary_dataframe = self._create_amr_summary(files, self._resfinder_dataframe, self._quality_module_dataframe,
self._pointfinder_dataframe, self._plasmidfinder_dataframe, self._mlst_dataframe)

self._detailed_summary_dataframe = self._create_detailed_amr_summary(files, self._resfinder_dataframe,
self._detailed_summary_dataframe = self._create_detailed_amr_summary(files, self._resfinder_dataframe, self._quality_module_dataframe,
self._pointfinder_dataframe,
self._plasmidfinder_dataframe,
self._mlst_dataframe)
@@ -269,7 +286,6 @@ def get_plasmidfinder_results(self):
Gets a pd.DataFrame for the PlasmidFinder results.
:return: A pd.DataFrame for the PlasmidFinder results.
"""

self._plasmidfinder_dataframe = self._plasmidfinder_dataframe.rename({'Gene':'Plasmid'}, axis=1)
return self._plasmidfinder_dataframe
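
The quality-metric parameters documented in `run_amr_detection` can be illustrated with a minimal sketch. The `QualityModule` source is not part of this excerpt, so everything below is an assumption rather than the project's implementation: the helper names `n50` and `assess_quality` are hypothetical, genome length is taken as the sum of all contig lengths, bounds are treated as inclusive, and a file is flagged once the count of sufficiently long contigs reaches `unacceptable_num_contigs`.

```python
from typing import List, Tuple


def n50(contig_lengths: List[int]) -> int:
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of the total length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input: no contigs at all


def assess_quality(contig_lengths: List[int],
                   genome_size_lower_bound: int,
                   genome_size_upper_bound: int,
                   minimum_N50_value: int,
                   minimum_contig_length: int,
                   unacceptable_num_contigs: int) -> Tuple[bool, List[str]]:
    """Hypothetical check: returns (passed, feedback messages)."""
    feedback = []

    # Assumption: genome length is the sum of all contig lengths.
    genome_length = sum(contig_lengths)
    if not genome_size_lower_bound <= genome_length <= genome_size_upper_bound:
        feedback.append("genome length is outside the accepted range")

    if n50(contig_lengths) < minimum_N50_value:
        feedback.append("N50 value is below the minimum")

    # Count only contigs at or above the minimum contig length.
    long_contigs = sum(1 for length in contig_lengths if length >= minimum_contig_length)
    if long_contigs >= unacceptable_num_contigs:
        feedback.append("too many contigs at or above the minimum contig length")

    return (not feedback, feedback)


# Illustrative thresholds and contig lengths, not values taken from staramr.
passed, feedback = assess_quality([3000000, 1500000, 500000, 200],
                                  4000000, 6000000, 10000, 300, 1000)
print(passed, feedback)
```

Collecting one message per failed metric mirrors the split between a pass/fail column and a separate feedback column in `summary.tsv`, although the actual feedback wording produced by `staramr` may differ.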
