Cnidaria #181

Open
pulichandramoulireddy opened this issue Feb 21, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

pulichandramoulireddy commented Feb 21, 2023

Hi Lin,

In the profiles folder, MT_database contains "Cnidaria_CDS_protein.fa". However, there is no corresponding file under CDS_HMM, and Cnidaria is not available as an option for "--clade".

Owner

linzhi2013 commented Feb 22, 2023

Hey,

Thanks for pointing out the problem!

I will add the information in the next release.

The HMM files are used in the initial step to screen for candidate mitochondrial sequences, and the Cnidaria_CDS_protein.fa file is used to annotate the final mitochondrial genome. The HMM models are generally robust, so an HMM model from a different clade should also work for your target clade.

One workaround for now:

  1. Create a custom profile directory:
    See https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ's-database
$ mkdir ~/mitoz_custom_db
$ cp -a  /home/guanliang/soft/miniconda3/envs/mitozEnv/lib/python3.7/site-packages/mitoz/profiles ~/mitoz_custom_db

$ ls -lhrt ~/mitoz_custom_db/profiles/
total 16K
-rw-rw-r-- 1 guanliang guanliang    0 May 12 06:47 __init__.py
drwxrwxr-x 2 guanliang guanliang 4.0K May 24 16:06 CDS_HMM
drwxrwxr-x 2 guanliang guanliang 4.0K May 24 16:06 rRNA_CM
drwxrwxr-x 2 guanliang guanliang 4.0K May 24 16:06 __pycache__
drwxrwxr-x 2 guanliang guanliang 4.0K May 24 17:36 MT_database

Now,

$ cd ~/mitoz_custom_db/profiles/MT_database
# rename the file
$ mv Arthropoda_CDS_protein.fa bak.Arthropoda_CDS_protein.fa

# create a soft-link (a "faked" Arthropoda_CDS_protein.fa file)
$ ln -s Cnidaria_CDS_protein.fa Arthropoda_CDS_protein.fa
  2. Use the following options when you run MitoZ:
--profiles_dir ~/mitoz_custom_db/profiles --clade Arthropoda --genetic_code 4
# assuming the mitochondrial genetic code of your target group is 4

By using --clade Arthropoda, MitoZ will use the CDS_HMM/Arthropoda_CDS.hmm file to search for candidate mitochondrial sequences.

Because Cnidaria_CDS_protein.fa is now linked as Arthropoda_CDS_protein.fa, MitoZ will actually use the Cnidaria_CDS_protein.fa file for protein annotation.

If your target clade is a different group, you can apply the same trick to make MitoZ work.
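The copy-and-link steps above can be generalized into two small shell functions. This is a sketch under a few assumptions: `copy_mitoz_profiles` and `swap_clade_proteins` are hypothetical names, not MitoZ commands; protein files follow the `<Clade>_CDS_protein.fa` pattern seen in MT_database; and, when no source path is passed, `python3` can import the installed `mitoz` package and its `profiles` directory sits next to `mitoz/__init__.py`, as in the conda path shown earlier. This avoids hard-coding one particular installation path.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of MitoZ) generalizing the workaround above.

# Copy the installed profiles directory into a custom root, e.g. ~/mitoz_custom_db.
# If no source is given, ask Python where the mitoz package lives
# (assumption: profiles/ sits next to mitoz/__init__.py).
copy_mitoz_profiles() {
  local dest_root="$1" src="${2:-}"
  if [ -z "$src" ]; then
    src=$(python3 -c 'import os, mitoz; print(os.path.join(os.path.dirname(mitoz.__file__), "profiles"))') || return 1
  fi
  if [ ! -d "$src" ]; then
    echo "profiles directory not found: $src" >&2
    return 1
  fi
  mkdir -p "$dest_root"
  # -a preserves permissions and timestamps; result is $dest_root/profiles
  cp -a "$src" "$dest_root/"
}

# Make MitoZ read <real_clade>_CDS_protein.fa whenever it looks for
# <hmm_clade>_CDS_protein.fa (e.g. real=Cnidaria, hmm=Arthropoda).
swap_clade_proteins() {
  local db_dir="$1" real_clade="$2" hmm_clade="$3"
  local src="${real_clade}_CDS_protein.fa"
  local link="${hmm_clade}_CDS_protein.fa"
  if [ "$real_clade" = "$hmm_clade" ]; then
    echo "real and HMM clade are the same; nothing to do" >&2
    return 1
  fi
  if [ ! -f "$db_dir/$src" ]; then
    echo "missing $db_dir/$src" >&2
    return 1
  fi
  # Back up the original once; on a re-run the file is already a link, so skip
  if [ -f "$db_dir/$link" ] && [ ! -L "$db_dir/$link" ]; then
    mv "$db_dir/$link" "$db_dir/bak.$link"
  fi
  # Relative target, resolved inside db_dir (like running ln -s from MT_database)
  ln -sfn "$src" "$db_dir/$link"
}
```

For the Cnidaria case in this issue, usage would look like:
$ copy_mitoz_profiles ~/mitoz_custom_db
$ swap_clade_proteins ~/mitoz_custom_db/profiles/MT_database Cnidaria Arthropoda
The backup is only made while the target is still a regular file, so re-running the function does not overwrite bak.Arthropoda_CDS_protein.fa with a link.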

  3. If necessary, add more homologous proteins to the Cnidaria_CDS_protein.fa file, especially when some protein-coding genes (PCGs) are missing from the annotation result.

Please refer to https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ's-database#3-but-what-protein-sequences-are-to-be-used
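Adding proteins can also be done without an editor by appending a FASTA file of your own. A minimal sketch, assuming your extra sequences are already in protein FASTA format; `append_proteins` and `extra_proteins.fa` are hypothetical names. It counts `>` header lines before and after so you can confirm every record was added, and it first appends a newline if the database file does not end with one, so the first new header is not glued onto the last sequence line. If you append to the soft-linked Arthropoda_CDS_protein.fa instead, the records end up in Cnidaria_CDS_protein.fa, because `>>` writes through the link.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MitoZ): append extra protein FASTA
# records to a MitoZ protein database and report the record counts.
count_fasta_records() {
  # prints the number of header lines (grep -c prints 0 when none match)
  grep -c '^>' "$1"
}

append_proteins() {
  local db="$1" extra="$2"
  local before added after

  [ -f "$db" ] && [ -f "$extra" ] || { echo "file not found" >&2; return 1; }

  before=$(count_fasta_records "$db")
  added=$(count_fasta_records "$extra")

  # Make sure the existing file ends with a newline before appending
  if [ -s "$db" ] && [ -n "$(tail -c 1 "$db")" ]; then
    printf '\n' >> "$db"
  fi
  cat "$extra" >> "$db"

  after=$(count_fasta_records "$db")
  echo "records: $before + $added = $after"
  # non-zero exit status if the totals do not add up
  [ "$after" -eq $((before + added)) ]
}
```

Example:
$ append_proteins ~/mitoz_custom_db/profiles/MT_database/Cnidaria_CDS_protein.fa extra_proteins.fa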


Tips:
If you do not want to link Cnidaria_CDS_protein.fa as Arthropoda_CDS_protein.fa, you have two options:

  1. Keep using the original Arthropoda_CDS_protein.fa file.
    Then run MitoZ with the following parameters:
--clade Arthropoda --genetic_code 4
# choose the correct genetic code for your target group here

MitoZ will then simply use the Arthropoda_CDS_protein.fa file for protein annotation. If your target clade, or some of its genes, is too divergent from arthropods, some proteins may be missing from the annotation result.

  2. Add the protein sequences of the 13 protein-coding genes of your target clade to this Arthropoda_CDS_protein.fa file (please refer to https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ's-database#3-but-what-protein-sequences-are-to-be-used).
$ mkdir ~/mitoz_custom_db
$ cp -a  /home/guanliang/soft/miniconda3/envs/mitozEnv/lib/python3.7/site-packages/mitoz/profiles ~/mitoz_custom_db

$ cd ~/mitoz_custom_db/profiles/MT_database/
# edit the '~/mitoz_custom_db/profiles/MT_database/Arthropoda_CDS_protein.fa' file with a text editor such as vim or Sublime Text

Then run MitoZ with the following parameters:

--profiles_dir ~/mitoz_custom_db/profiles  --clade Arthropoda  --genetic_code 4
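Whichever route you take, this thread describes MitoZ selecting two files by the `--clade` name: `CDS_HMM/<Clade>_CDS.hmm` for the candidate search and `MT_database/<Clade>_CDS_protein.fa` for annotation. A small pre-flight check, sketched below with the hypothetical function `check_mitoz_profiles`, can catch a typo or a dangling soft-link before a long run. It assumes that naming pattern and only checks these two files; MitoZ may need others (for example, under rRNA_CM).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of MitoZ).
# Assumes the naming pattern CDS_HMM/<Clade>_CDS.hmm and
# MT_database/<Clade>_CDS_protein.fa described in this thread.
check_mitoz_profiles() {
  local profiles_dir="$1" clade="$2"
  local status=0 f

  for f in "CDS_HMM/${clade}_CDS.hmm" "MT_database/${clade}_CDS_protein.fa"; do
    # -s follows soft-links, so a dangling link or an empty file fails here
    if [ -s "$profiles_dir/$f" ]; then
      echo "ok      $f"
    else
      echo "MISSING $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Example:
$ check_mitoz_profiles ~/mitoz_custom_db/profiles Arthropoda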
