PhyloEuk: Phylogenomics for Eukaryotes

Warning: This pipeline is a work in progress and for now only "works" for Ochrophytes, but there is no real reason why it should not work for other taxa with a few modifications to the scripts.

Goal

Heavily inspired by GTDB-Tk and BUSCO, the goal of this software is to determine the taxonomy of your eukaryotic strain(s) of interest based on the presence of single-copy genes.

To Do

Add options to broaden the use case.

Dependencies

See dependencies.bib for the citations of these dependencies.

Installation

conda install -c bioconda -c conda-forge mamba
mamba create -n phyloeuk -c bioconda -c conda-forge trimal mamba mafft busco=5 iqtree perl-bioperl perl-file-slurp bioawk epa-ng

Run

Caveats

These scripts are hardcoded to select single-copy genes that are present in at least 30 reference genomes. At the moment, you cannot restart an interrupted run, but opening the scripts and copying and pasting the individual commands will work. The more reference genomes you use, the longer the run takes. The last step, tree generation, is the most time-consuming part.
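To make the 30-genome threshold concrete, here is a minimal sketch of that kind of filter. It is not taken from the PhyloEuk scripts, which may select genes differently. It assumes one BUSCO full_table.tsv per reference genome, with the BUSCO id in column 1 and the status in column 2; the function name select_shared_buscos is hypothetical.

```shell
# Hypothetical helper, not part of PhyloEuk: print the BUSCO ids that are
# "Complete" (single-copy) in at least MIN of the given tables.
# Assumes each file is one genome's BUSCO full_table.tsv (tab-separated,
# column 1 = BUSCO id, column 2 = status).
select_shared_buscos() {
  min_genomes="$1"
  shift
  awk -F'\t' -v min="$min_genomes" '
    /^#/ { next }                      # skip BUSCO header/comment lines
    $2 == "Complete" { n[$1]++ }       # one Complete row per genome at most
    END { for (id in n) if (n[id] >= min) print id }
  ' "$@" | sort
}

# Example call (paths are placeholders):
# select_shared_buscos 30 path/to/busco_runs/*/full_table.tsv
```

Rows marked Duplicated, Fragmented, or Missing are ignored, so a gene counts toward the threshold only in genomes where it is found exactly once.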

Pipeline

git clone https://github.com/michoug/PhyloEuk.git

Put all reference and MAG proteomes (.faa files) in folders called reference and MAGs, respectively.

Run the runReference.sh script, then run the runMAGs.sh script.
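Because a run cannot be restarted, it can help to check the input folders before launching it. The sketch below is not part of the repository. It assumes the reference and MAGs folders sit in the current directory and that proteomes use the .faa extension; count_faa and check_inputs are hypothetical names.

```shell
# Count the .faa files directly inside a folder (prints 0 if the folder is missing).
count_faa() {
  find "$1" -maxdepth 1 -type f -name '*.faa' 2>/dev/null | wc -l | tr -d ' '
}

# Hypothetical pre-flight check: fail early if either input folder has no proteomes.
check_inputs() {
  for dir in reference MAGs; do
    n=$(count_faa "$dir")
    if [ "$n" -eq 0 ]; then
      echo "No .faa files found in $dir/" >&2
      return 1
    fi
    echo "$dir: $n proteome(s)"
  done
}

# Possible usage, assuming the scripts are run with bash from the folder
# that contains reference/ and MAGs/:
# check_inputs && bash runReference.sh && bash runMAGs.sh
```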
