Support synonyms or equivalent names when calling taxid_from_name #5

Open
ShannonDaddy opened this issue Dec 17, 2021 · 4 comments
Labels
enhancement New feature or request

ShannonDaddy commented Dec 17, 2021

Hi, when I call the function taxid_from_name to get a taxid, I get a warning and an empty result.

My code:
```python
import taxopy

ncbi_taxdb_dir = "database/ncbi_taxonomy"
taxdb = taxopy.TaxDb(nodes_dmp=f"{ncbi_taxdb_dir}/nodes.dmp",
                     names_dmp=f"{ncbi_taxdb_dir}/names.dmp",
                     merged_dmp=f"{ncbi_taxdb_dir}/merged.dmp",
                     keep_files=True)
taxid_list = taxopy.taxid_from_name('Lactobacillus fermentum', taxdb)
print(taxid_list)
```

The console output:

```
[]
C:\Users\AppData\Local\Programs\Python\Python38\lib\site-packages\taxopy\utilities.py:54: Warning: The input name was not found in the taxonomy database.
  warnings.warn("The input name was not found in the taxonomy database.", Warning)
```

Then I checked names.dmp and found that 'Lactobacillus fermentum' is listed as a synonym; the scientific name is 'Limosilactobacillus fermentum'. When I use the scientific name in the code, the lookup works as expected.
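For reference, names.dmp stores one row per name, and its fourth field, the name class, is what separates a scientific name from a synonym. Below is a minimal parsing sketch, assuming the standard NCBI taxdump layout (fields separated by `\t|\t`, each row ending in `\t|`). `parse_names_dmp_line` is a hypothetical helper, not part of taxopy, and the two sample rows use a placeholder taxid (99999), not the real one:

```python
from typing import NamedTuple


class NameRecord(NamedTuple):
    taxid: int
    name: str
    unique_name: str
    name_class: str


def parse_names_dmp_line(line: str) -> NameRecord:
    """Split one names.dmp row into its four fields (hypothetical helper)."""
    line = line.rstrip("\n")
    # Each row ends with a trailing "\t|"; drop it before splitting.
    if line.endswith("\t|"):
        line = line[:-2]
    taxid, name, unique_name, name_class = line.split("\t|\t")
    return NameRecord(int(taxid), name, unique_name, name_class)


# Placeholder rows in the names.dmp layout (taxid 99999 is NOT the real taxid).
SAMPLE_ROWS = [
    "99999\t|\tLimosilactobacillus fermentum\t|\t\t|\tscientific name\t|\n",
    "99999\t|\tLactobacillus fermentum\t|\t\t|\tsynonym\t|\n",
]

for row in SAMPLE_ROWS:
    record = parse_names_dmp_line(row)
    print(record.name, "->", record.name_class)
# Limosilactobacillus fermentum -> scientific name
# Lactobacillus fermentum -> synonym
```

Both rows share one taxid, so resolving a synonym only requires indexing the extra name classes alongside the scientific names.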

Would it be possible for taxid_from_name to also match synonyms and equivalent names, as the Python package ete3 does?

Thanks a lot!

@apcamargo
Owner

Hi @ShannonDaddy. This is something I've been considering since I implemented taxid_from_name, but I was worried that loading every synonym would significantly increase memory usage. As far as I know, ETE3 stores the taxonomy in an SQLite database, so memory is not really a problem for them.

That said, I can add a load_synonyms parameter (disabled by default) that would allow synonyms and equivalent names to be added to the database. Is this feature urgent for you?
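A rough sketch of what such an opt-in flag could look like. `build_name_index` and its `load_synonyms` argument are hypothetical illustrations, not taxopy's actual API, and the sample rows use a placeholder taxid (99999). With the flag off, only scientific names are indexed, which is why the synonym lookup comes back empty:

```python
from typing import Dict, Iterable, List

# Extra name classes indexed only when load_synonyms=True (hypothetical flag).
EXTRA_NAME_CLASSES = {"synonym", "equivalent name"}


def build_name_index(rows: Iterable[str], load_synonyms: bool = False) -> Dict[str, List[int]]:
    """Map each name in names.dmp rows to the list of taxids that carry it.

    Scientific names are always indexed. Synonyms and equivalent names are
    added only when load_synonyms is True, so the default keeps the smaller
    scientific-name-only index.
    """
    index: Dict[str, List[int]] = {}
    for row in rows:
        row = row.rstrip("\n")
        if row.endswith("\t|"):
            row = row[:-2]  # drop the trailing "\t|" row terminator
        taxid_str, name, _unique_name, name_class = row.split("\t|\t")
        wanted = name_class == "scientific name" or (
            load_synonyms and name_class in EXTRA_NAME_CLASSES
        )
        if not wanted:
            continue
        taxids = index.setdefault(name, [])
        taxid = int(taxid_str)
        if taxid not in taxids:  # avoid duplicates if a name repeats for one taxid
            taxids.append(taxid)
    return index


# Placeholder rows in the names.dmp layout (taxid 99999 is NOT the real taxid).
SAMPLE_ROWS = [
    "99999\t|\tLimosilactobacillus fermentum\t|\t\t|\tscientific name\t|\n",
    "99999\t|\tLactobacillus fermentum\t|\t\t|\tsynonym\t|\n",
]

default_index = build_name_index(SAMPLE_ROWS)
synonym_index = build_name_index(SAMPLE_ROWS, load_synonyms=True)
print(default_index.get("Lactobacillus fermentum", []))  # []
print(synonym_index.get("Lactobacillus fermentum", []))  # [99999]
```

Names map to lists rather than single taxids because the same name can belong to more than one taxon. Enabling the flag adds one dictionary entry per extra distinct name, which is exactly the memory cost being weighed here.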

@ShannonDaddy
Author

It's not urgent for me. For now, I just create the Taxon object directly from the taxid. Take your time adding the new feature, and thanks for the quick response.

apcamargo added the enhancement (New feature or request) label on Dec 22, 2021

pooranis commented Feb 2, 2022

I would like this feature as well! I have used ete3, but I prefer this library for its LCA functions and because it works in situations where memory is available but persistent disk space is not.

@apcamargo
Owner

Thanks for the feedback. This looks like a feature that many people would find useful (I also ended up needing it recently). I'll think about how to implement it without a large increase in memory usage as soon as I have some free time.
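One memory-light alternative, sketched here as an assumption rather than anything taxopy does: keep only scientific names in memory and, when a lookup misses, stream names.dmp once looking for a matching synonym or equivalent name. `scan_for_synonym` is a hypothetical helper, and the in-memory sample stands in for the real file with a placeholder taxid (99999):

```python
import io
from typing import Iterable, List

# Name classes checked by the fallback scan.
FALLBACK_NAME_CLASSES = {"synonym", "equivalent name"}


def scan_for_synonym(rows: Iterable[str], query: str) -> List[int]:
    """Return taxids whose synonym or equivalent name equals query.

    Rows are read one at a time, so memory use stays flat; the cost is a full
    pass over names.dmp for every name that misses the in-memory index.
    """
    matches: List[int] = []
    for row in rows:
        row = row.rstrip("\n")
        if row.endswith("\t|"):
            row = row[:-2]  # drop the trailing "\t|" row terminator
        taxid_str, name, _unique_name, name_class = row.split("\t|\t")
        if name == query and name_class in FALLBACK_NAME_CLASSES:
            taxid = int(taxid_str)
            if taxid not in matches:
                matches.append(taxid)
    return matches


# In real use this would be open(f"{ncbi_taxdb_dir}/names.dmp", encoding="utf-8");
# an in-memory file with placeholder rows is used here instead.
sample_file = io.StringIO(
    "99999\t|\tLimosilactobacillus fermentum\t|\t\t|\tscientific name\t|\n"
    "99999\t|\tLactobacillus fermentum\t|\t\t|\tsynonym\t|\n"
)
print(scan_for_synonym(sample_file, "Lactobacillus fermentum"))  # [99999]
```

A taxid found this way can then be passed to `taxopy.Taxon(taxid, taxdb)`, the workaround ShannonDaddy describes. The trade-off is the reverse of an in-memory synonym index: little extra memory, but names.dmp must remain on disk and every miss costs a full file scan, so it would not help the disk-constrained setup pooranis mentions.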

3 participants