add list of all names to Taxon #18
base: master
Reviewer's Guide by Sourcery
This pull request adds a new feature to the Taxon class in the taxopy library, introducing a 'names' property that provides a list of all names associated with a taxon, including scientific names, common names, and other variants. The implementation involves modifications to the TaxDb class to store and manage multiple names per taxon, and updates to the Taxon class to expose this new information.
Hey @mkuhn - I've reviewed your changes - here's some feedback:
Overall Comments:
- Please update the documentation to reflect the new names property and its usage. This will help users understand and utilize this new feature effectively.
- Consider the performance implications of storing all names for every taxon. You might want to implement lazy loading of names or provide an option to disable this feature for users who don't need it, to minimize memory usage.
Here's what I looked at during the review
- 🟡 General issues: 2 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
taxopy/core.py
@@ -136,6 +136,10 @@ def __init__(
    def taxid2name(self) -> Dict[int, str]:
        return self._taxid2name

    @property
    def taxid2names(self) -> Dict[int, str]:
suggestion: Add a docstring for the taxid2names property
To maintain consistency with other properties in the class and improve documentation, consider adding a docstring to the taxid2names property. For example:
    @property
    def taxid2names(self) -> Dict[int, List[Tuple[str, str]]]:
        """Returns a dictionary mapping taxon IDs to lists of (kind, name) tuples."""
        return self._taxid2names
Also, note that the return type hint should be updated to reflect the actual structure of the returned dictionary.
@@ -302,6 +317,7 @@ def __init__(self, taxid: int, taxdb: TaxDb):
                "The input integer is not a valid NCBI taxonomic identifier."
            )
        self._name = taxdb.taxid2name[self.taxid]
        self._names = taxdb.taxid2names[self.taxid]
suggestion: Improve robustness of names attribute assignment
To handle cases where a taxid might not be present in the taxid2names dictionary, consider using the dict.get() method with a default value. This would make the code more robust:
self._names = taxdb.taxid2names.get(self.taxid, [])
This approach ensures that even if a taxid is not found, an empty list is assigned instead of raising a KeyError.
- self._names = taxdb.taxid2names[self.taxid]
+ self._names = taxdb.taxid2names.get(self.taxid, [])
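The difference between the two lookups can be sketched outside taxopy. This is a minimal illustration, not the library's code: the module-level `taxid2names` dictionary and the helpers `names_strict` and `names_lenient` are hypothetical stand-ins, and the sample names are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for TaxDb.taxid2names:
# taxid -> list of (kind, name) tuples. Sample values are illustrative.
taxid2names: Dict[int, List[Tuple[str, str]]] = {
    9823: [("scientific name", "Sus scrofa"), ("common name", "swine")],
}


def names_strict(taxid: int) -> List[Tuple[str, str]]:
    """Index directly: raises KeyError when the taxid has no entry."""
    return taxid2names[taxid]


def names_lenient(taxid: int) -> List[Tuple[str, str]]:
    """Use dict.get: falls back to an empty list when the taxid has no entry."""
    return taxid2names.get(taxid, [])
```

Whether the lenient form is preferable depends on whether a missing entry should be treated as an error; the strict form surfaces an inconsistent database immediately.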
        if self._merged_dmp:
            for oldtaxid, newtaxid in self._oldtaxid2newtaxid.items():
                taxid2name[oldtaxid] = taxid2name[newtaxid]
-       return taxid2name
+       return taxid2name, taxid2names
Maybe here we could convert taxid2names to a dict just so both taxid2names and taxid2name are of the same class
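A minimal sketch of that conversion, assuming the names are accumulated in a `collections.defaultdict(list)` during parsing (the thread does not show the container type, so this is an assumption; `collected` is a hypothetical name):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: names are accumulated in a defaultdict(list) while parsing.
collected: defaultdict = defaultdict(list)
collected[9823].append(("scientific name", "Sus scrofa"))
collected[9823].append(("common name", "swine"))

# Converting once at the end gives callers an ordinary dict, so a lookup of
# an unknown taxid raises KeyError instead of silently creating an entry.
taxid2names: Dict[int, List[Tuple[str, str]]] = dict(collected)
```

Note that `dict(collected)` is a shallow copy: the inner lists are shared with the original container.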
Hi @mkuhn, Sorry for the delay. I’ve been on post-congress vacation and will be back to work on Thursday. Thanks for the PR! Let's merge it into the main branch :) A few comments:
No worries! I could go ahead with my own fork for now.
Hmm, it's probably cleaner to have one internal dictionary, taxid2names, which contains a tuple of names for each kind of name. E.g. for Sus scrofa, these are the names from
So there are two common names. To initialize the name of a species, one could then get it with
Here are the counts of name types: 25 genbank acronym
So it's a limited list.
Just a rough estimate,
Please let me know if I should go ahead and make the internal
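One way to picture the single-dictionary layout described in this comment is sketched below. It is only an illustration under assumptions: the nested shape (taxid, then kind, then tuple of names), the helpers `primary_name` and `names_of_kind`, and the sample values are hypothetical and not taken from the PR.

```python
from typing import Dict, Tuple

# Hypothetical layout: taxid -> {kind of name -> tuple of names}.
NamesByKind = Dict[str, Tuple[str, ...]]

# Illustrative values only; not copied from the NCBI names file.
taxid2names: Dict[int, NamesByKind] = {
    9823: {
        "scientific name": ("Sus scrofa",),
        "common name": ("swine", "wild boar"),
    },
}


def primary_name(taxid: int) -> str:
    """Return the first scientific name recorded for a taxid."""
    return taxid2names[taxid]["scientific name"][0]


def names_of_kind(taxid: int, kind: str) -> Tuple[str, ...]:
    """Return all names of one kind, or an empty tuple if there are none."""
    return taxid2names[taxid].get(kind, ())
```

With this shape the single-name lookup and the full list come from the same structure, so a separate taxid2name dictionary would not be strictly necessary.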
So, would
A potential solution is to convert
>>> taxid2name[9823]
"Sus scrofa"
>>> taxid2name.common_name[9823]
("pigs <Sus scrofa>", "swine", "wild boar")
>>> taxid2name.acronym[9823]
None # or KeyError
Converting
Ah, I thought that
I find the code example a bit confusing. As a Python coder, I wouldn't expect a dictionary(-like object) to have an attribute that is then also a dictionary. Which organization level makes more sense to you,
How about:
With
Good point. I like your suggestion! Do you think you could update the PR with this API?
Hi Antonio, not sure if you will find this useful, but I needed a more complete list of names for a taxon in addition to the scientific name. I didn't update the docs yet. If you'd consider merging this, I can also add something to the documentation.
Summary by Sourcery
Introduce a new feature to the Taxon class that allows retrieval of a comprehensive list of names for a taxon, enhancing the existing functionality by including various name types such as scientific, common, and authority names.
New Features:
names to the Taxon class that provides a list of all names associated with a taxon, including scientific, common, and authority names.
Enhancements:
_import_names method to populate a new dictionary taxid2names that stores all names associated with each taxon ID.