Sereina Rutschmann edited this page Jul 30, 2016 · 20 revisions

What is DiscoMark?

DiscoMark is a bioinformatics program designed to make it easy to develop phylogenetic markers from orthologous DNA sequences. One of the more tedious tasks in phylogenetics is picking the right markers and designing PCR primers to amplify them in the samples of interest. DiscoMark supports researchers in this process and scales it up to the genome level by automating the steps from multiple sequence alignment to PCR primer design.

Read more about how to fine-tune the marker discovery process:

Input data

Any kind of orthologous nucleotide sequence data can be used as input for marker design. DiscoMark expects the input data to be structured as follows:

  • 1 folder per taxon (e.g. species)
  • 1 file per orthologous group (e.g. gene), with the filename starting with the ortholog id. Note: sequences of all input species will be matched according to their ortholog id (which is the filename up to the first '.')

Example for 2 species and 3 genes:

├── Baetis
│   ├── 413058.cds.fa
│   ├── 413088.cds.fa
│   └── 413294.cds.fa
└── Cloeon
    ├── 413058.cds.fa
    ├── 413088.cds.fa
    └── 413294.cds.fa
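
The matching rule described above (one folder per taxon, ortholog ID taken as the filename up to the first '.') can be checked before a run with a short script. The following is only a minimal sketch and not part of DiscoMark; the function names `ortholog_id`, `group_by_ortholog` and `shared_orthologs` are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from pathlib import Path


def ortholog_id(filename):
    """Return the filename up to the first '.', e.g. '413058.cds.fa' -> '413058'."""
    return filename.split(".", 1)[0]


def group_by_ortholog(input_dir):
    """Map each ortholog ID to the sorted list of taxon folders containing it.

    Assumes the layout described above: one sub-folder per taxon,
    one sequence file per orthologous group.
    """
    groups = defaultdict(set)
    for taxon_dir in Path(input_dir).iterdir():
        if not taxon_dir.is_dir():
            continue
        for seq_file in taxon_dir.iterdir():
            # Skip sub-folders and hidden files such as '.DS_Store'.
            if not seq_file.is_file() or seq_file.name.startswith("."):
                continue
            groups[ortholog_id(seq_file.name)].add(taxon_dir.name)
    return {oid: sorted(taxa) for oid, taxa in groups.items()}


def shared_orthologs(input_dir):
    """Return the sorted ortholog IDs that are present in every taxon folder."""
    n_taxa = sum(1 for p in Path(input_dir).iterdir() if p.is_dir())
    groups = group_by_ortholog(input_dir)
    return sorted(oid for oid, taxa in groups.items() if len(taxa) == n_taxa)
```

For the example tree above, `shared_orthologs` would return the three IDs 413058, 413088 and 413294, since each is present in both the Baetis and the Cloeon folder.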

So far, DiscoMark is confirmed to work with output from the ortholog prediction programs HaMStR (Ebersberger et al. 2009) and Orthograph (https://github.com/mptrsen/Orthograph).

Note: Here is a short guide on how to install and run HaMStR

Important: With default settings, HaMStR will output the CDS (with introns removed) as well as the protein sequence for each gene:

ls hamstr_out/species_A
411847.cds.fa
411847.fa
411851.cds.fa
411851.fa
411858.cds.fa
411858.fa
[...]
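
To see what such a folder contains, the CDS and protein files can be told apart by their suffix. This is a minimal sketch that assumes the naming shown in the listing above (`<id>.cds.fa` for CDS, `<id>.fa` for protein); the function name `split_hamstr_output` is hypothetical and not part of HaMStR or DiscoMark.

```python
from pathlib import Path


def split_hamstr_output(folder):
    """Return (cds_files, protein_files) as sorted lists of filenames.

    Assumes CDS files end in '.cds.fa' and protein files in '.fa',
    as in the example listing; other files are ignored.
    """
    cds, protein = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        # Check the longer suffix first, since '.cds.fa' also ends in '.fa'.
        if path.name.endswith(".cds.fa"):
            cds.append(path.name)
        elif path.name.endswith(".fa"):
            protein.append(path.name)
    return cds, protein
```

Calling `split_hamstr_output("hamstr_out/species_A")` on the folder listed above would place `411847.cds.fa` in the first list and `411847.fa` in the second.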

The current version of DiscoMark can take the output folders from the ortholog prediction directly as input.

Output data

The main output of DiscoMark is a set of PCR primer pairs that amplify the markers contained in the input data. The graphical, interactive HTML output, which includes the primer list and an alignment viewer showing the primer locations, helps users select the primer pairs best suited to their research question.

How to cite DiscoMark

Rutschmann S‡, Detering H‡, Simon S, Fredslund J, Monaghan MT (2016) DiscoMark: Nuclear marker discovery from orthologous sequences using draft genome data. Molecular Ecology Resources, DOI: 10.1111/1755-0998.12576. ‡, co-first authors

When using DiscoMark please also cite:
Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution, 30, 772-780.

If you use TrimAl please also cite:
Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T (2009) trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics, 25, 1972-1973.

If you use BlastN please also cite:
Altschul SF, Madden TL, Schaffer AA et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25, 3389-3402.

Camacho C, Coulouris G, Avagyan V et al. (2009) BLAST+: architecture and applications. BMC Bioinformatics, 10, 421.

References

Ebersberger I, Strauss S, von Haeseler A (2009) HaMStR: profile hidden markov model based search for orthologs in ESTs. BMC Evolutionary Biology, 9, 157.
