Missing information on the classification provided in dataset 9543 #237

Open
DianRHR opened this issue May 9, 2023 · 22 comments

Comments

@DianRHR

DianRHR commented May 9, 2023

I was trying to use the information of the article:

Bezděk, J., & Regalin, R. (2022). Identity of species-group taxa of the Western Palaearctic Clytrini (Coleoptera: Chrysomelidae) described by Maurice Pic and Louis Kocher (Version 1657327952032). Plazi.org taxonomic treatments database. https://doi.org/10.5281/zenodo.4272771

available in ChecklistBank and found that even though the title of the article (and its focus) is the tribe Clytrini, this taxon and rank are not included in the dataset.
All the genera are placed directly under Chrysomelidae.

Could you consider including in the datasets all the taxonomic ranks mentioned in the article?

The textree file looks like this:
image

@jugiora

jugiora commented May 9, 2023

Dear Dian
We have added the tribe information to taxa attributes as mentioned.
This information was missing because of the usual parsing applied to the extracted taxonomic data, but it can be added in specific cases such as this one.
Cheers.
Julia

@mdoering

Thanks @jugiora. The DwC archive does not yet contain the tribe. Does a regeneration need manual triggering?

Note also that the genus Stephenympha is wrongly given as a plant.

@jugiora

jugiora commented May 10, 2023

Dear Dian.
The taxonomic attributes have all been fixed. The information should also be updated in the DwC archive in a few hours.
All the best.
Julia

@mdoering

The dwca still contains plants right now:

03EC879FFFB1FFCEA875AF4BFDF21428.taxon Plantae Tracheophyta Liliopsida Poales Poaceae Stephenympha genus Stephenympha Stephenympha Stephenympha https://treatment.plazi.org/id/03EC879FFFB1FFCEA875AF4BFDF21428

@gsautter does it take longer to update?

@myrmoteras
Contributor

@mdoering no, this had still been a plant, but it is fixed now.
Maybe there is a way to filter all taxonomic names in the nomenclature section to check that they are all leps.
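
A minimal sketch of such a check, assuming the taxon file is tab-separated and that its columns follow the hypothetical `COLUMNS` list below (the real layout is defined by the archive's meta.xml). The first sample row is shortened from the record quoted earlier in this thread; the second row is made up for illustration and is not taken from the dataset.

```python
import csv
import io

# Hypothetical column layout; the actual order comes from the archive's meta.xml.
COLUMNS = ["taxonID", "kingdom", "phylum", "class", "order", "family", "genus", "taxonRank"]


def unexpected_kingdom_rows(tsv_text, expected_kingdom="Animalia"):
    """Return the rows whose kingdom differs from the expected one."""
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=COLUMNS, delimiter="\t")
    return [row for row in reader if row["kingdom"] != expected_kingdom]


sample = "\n".join([
    # Shortened from the record quoted in this thread.
    "\t".join(["03EC879FFFB1FFCEA875AF4BFDF21428.taxon", "Plantae", "Tracheophyta",
               "Liliopsida", "Poales", "Poaceae", "Stephenympha", "genus"]),
    # Made-up row, for illustration only.
    "\t".join(["EXAMPLE-ID.taxon", "Animalia", "Arthropoda",
               "Insecta", "Coleoptera", "Chrysomelidae", "Clytra", "genus"]),
])

flagged = unexpected_kingdom_rows(sample)
for row in flagged:
    print(row["taxonID"], row["kingdom"], row["genus"])
```

The same filter could be keyed on `order` or `family` instead of `kingdom` if a narrower check is wanted.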

We might also want to set this article aside, since each treatment is at genus level but in fact includes a list of species, often with new combinations (such as in Modica) as well as synonyms, which might be relevant for ChecklistBank / COL.

see also https://github.com/plazi/Plazi-Communications/issues/1269

@DianRHR
Author

DianRHR commented May 12, 2023

@jugiora thanks for your quick answer. However, I downloaded the DwCA again from ChecklistBank
https://www.dev.checklistbank.org/dataset/9543/download
and the tribe is not yet included.
My question is whether these kinds of issues are addressed manually.

@flsimoes

> @jugiora thanks for your quick answer. However, I downloaded the DwCA again from ChecklistBank https://www.dev.checklistbank.org/dataset/9543/download and the tribe is not yet included. My question is whether these kinds of issues are addressed manually.

Perhaps ChecklistBank hasn't yet gotten the most updated version.

What sort of issues exactly do you mean? Fixing the taxonomy? Then yes, it is fixed manually, as @jugiora did this time.
If you are talking about the update to the DwCA, it should be automatic once we fix things on our end (I think checklistbank only imports the datasets once a day though)... @myrmoteras anything to add?

@gsautter

Judging from https://www.gbif.org/occurrence/search?offset=0&limit=500&dataset_key=bfb878f3-8a74-46d3-a104-36485c32aaba , the dataset is updated in GBIF by now ... hard to tell how long an update takes to get to CLB from there at this point ...
@mdoering is there a synchronization schedule in place, or some sort of notification based system? Would be great to have an approximate time it usually takes such updates to go through, so we know at what point we should start to worry or investigate.

@mdoering

If nothing is triggered, the system checks for an update weekly by default. You could trigger a CLB import from your end each time an archive is rebuilt to make sure there is no latency. It's a simple POST call to the API; we would just need to arrange appropriate credentials.
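
A minimal sketch of what such a trigger could look like. The base URL, the `/importer` path, the `datasetKey` JSON body and the use of HTTP Basic authentication are all assumptions made for illustration; none of them is confirmed in this thread, so the real endpoint and credential scheme would need to be taken from the ChecklistBank API documentation. The request is only built here, not sent.

```python
import base64
import json
import urllib.request


def build_import_request(dataset_key, user, password,
                         base_url="https://clb.example.org/api"):
    """Build (but do not send) a POST request asking for a dataset import.

    base_url, the '/importer' path, the payload shape and Basic auth
    are placeholders, not the confirmed ChecklistBank API.
    """
    payload = json.dumps({"datasetKey": dataset_key}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/importer",
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# 9543 is the dataset key discussed in this issue; the credentials are dummies.
request = build_import_request(9543, "plazi-bot", "secret")
print(request.get_method(), request.full_url)

# Sending is left out on purpose; with real credentials it would be:
# urllib.request.urlopen(request)
```

Calling this right after an archive rebuild would remove the dependency on the weekly polling interval.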

@gsautter

> If nothing is triggered, the system checks for an update weekly by default. You could trigger a CLB import from your end each time an archive is rebuilt to make sure there is no latency. It's a simple POST call to the API; we would just need to arrange appropriate credentials.

Easy enough to send a similar poke request to CLB as we send to the GBIF API when a DwCA gets updated ... however, GBIF might pull the updated DwCA with some latency, so there would be a non-negligible risk of CLB fetching the old version of the data from GBIF before GBIF fetches the new version from TB ... needs some thought.

@mdoering

CLB does not fetch anything from GBIF. We poll your files directly

@gsautter

I feel like this issue is related, as both concern uplink and sending notifications to CLB: plazi/treatmentBank#90

@DianRHR
Author

DianRHR commented May 15, 2023

> ... so there would be a non-negligible risk of CLB fetching the old version

@flsimoes I mean both.
fixing taxonomy: include a name and taxon rank that are only mentioned in the title, but are an important part of the classification (the tribe in this case) and are not considered in DwC.
updating the DwCA: the discussion in the previous comments.
My question is mainly about how to proceed once we find missing information in a dataset.

@gsautter I'm afraid we are talking about two different datasets:
I mentioned https://www.dev.checklistbank.org/dataset/9543/about which is also https://www.gbif.org/dataset/77c874cd-4f85-4746-8466-3ca09e2c2b8d and I just checked both: the tribe is not yet included.
The one you mentioned is a different dataset:
https://www.gbif.org/occurrence/search?offset=0&limit=500&dataset_key=bfb878f3-8a74-46d3-a104-36485c32aaba

@gsautter

> @gsautter I'm afraid we are talking about two different datasets:
> I mentioned https://www.dev.checklistbank.org/dataset/9543/about which is also https://www.gbif.org/dataset/77c874cd-4f85-4746-8466-3ca09e2c2b8d and I just checked both: the tribe is not yet included.
> The one you mentioned is a different dataset:
> https://www.gbif.org/occurrence/search?offset=0&limit=500&dataset_key=bfb878f3-8a74-46d3-a104-36485c32aaba

At the dataset level, sure, but at the level of figuring out how to get updates into CLB more quickly and how to get CLB dataset keys into TreatmentBank, they are both about communication between the two systems, and that is something we might well and most likely should discuss in conjunction, as it boils down to adding a CLB communication component to the TreatmentBank back-end server.
Never meant to say the specific dataset issues don't need to be solved individually.

@mdoering

mdoering commented May 16, 2023

As far as I can see, the DwCA from Plazi still does not contain the Clytrini tribe.
@gsautter I think I now know why. The classification is not provided via parentNameUsageID, but only as flat, major Linnaean ranks. And tribe is not included in there:

<field index="3" term="http://rs.tdwg.org/dwc/terms/parentNameUsageID"/> <!-- blank -->
<field index="4" term="http://rs.tdwg.org/dwc/terms/originalNameUsageID"/> <!-- blank -->
<field index="5" term="http://rs.tdwg.org/dwc/terms/kingdom"/> <!-- taxon@kingdom -->
<field index="6" term="http://rs.tdwg.org/dwc/terms/phylum"/> <!-- taxon@phylum -->
<field index="7" term="http://rs.tdwg.org/dwc/terms/class"/> <!-- taxon@class -->
<field index="8" term="http://rs.tdwg.org/dwc/terms/order"/> <!-- taxon@order -->
<field index="9" term="http://rs.tdwg.org/dwc/terms/family"/> <!-- taxon@family -->
<field index="10" term="http://rs.tdwg.org/dwc/terms/genus"/> <!-- taxon@genus -->
<field index="11" term="http://rs.tdwg.org/dwc/terms/taxonRank"/> <!-- taxon@rank -->
<field index="12" term="http://rs.tdwg.org/dwc/terms/scientificName"/> <!-- reconciled taxon name with reconciled authority, with parentheses and all -->

Ideally we would use parentNameUsageID only - at least if you have a parent-child relationship in your model.
Otherwise, there are new DwC classification terms on the way that we can use, including tribe, subtribe and superfamily, to get at least somewhat richer trees:

tdwg/dwc#45
tdwg/dwc#46
tdwg/dwc#65
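
To illustrate the point about flat ranks, here is a minimal sketch, assuming records are plain dictionaries keyed by rank name. The `parent_links` helper and the `Clytra` example record are hypothetical and not taken from the Plazi export. It derives parent-child links from flat rank columns and shows that a genus falls directly under its family unless a tribe value is supplied.

```python
# Rank order used to derive parents; "tribe" is only usable if the source supplies it.
RANKS = ["kingdom", "phylum", "class", "order", "family", "tribe", "genus"]


def parent_links(records, ranks=RANKS):
    """Map each (rank, name) node to its nearest filled higher-rank node."""
    parents = {}
    for record in records:
        previous = None
        for rank in ranks:
            name = record.get(rank)
            if not name:
                continue  # rank not supplied for this record, so skip it
            node = (rank, name)
            parents.setdefault(node, previous)
            previous = node
    return parents


# Illustrative record only, not copied from the dataset.
flat = {"kingdom": "Animalia", "phylum": "Arthropoda", "class": "Insecta",
        "order": "Coleoptera", "family": "Chrysomelidae", "genus": "Clytra"}
with_tribe = dict(flat, tribe="Clytrini")

print(parent_links([flat])[("genus", "Clytra")])        # ('family', 'Chrysomelidae')
print(parent_links([with_tribe])[("genus", "Clytra")])  # ('tribe', 'Clytrini')
```

With explicit parentNameUsageID values the same tree would come straight from the data, without relying on a fixed list of rank columns.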

@gsautter

> @gsautter I think I now know why. The classification is not provided via parentNameUsageID, but only as flat, major Linnaean ranks. And tribe is not included in there:

That's correct ... a tribe will only be there if the taxon actually is of rank tribe ... we don't generally store the intermediate ranks internally, either, as there are simply too many of them, and for a long time DwC didn't really support them, either.

The question that still remains open is the handling of updates.

@mdoering

So that means the dwca is up to date and adding the tribe did not change anything, correct?

@gsautter

Regarding the tribe, I think so ... but there also was that "Plantae" vs. "Animalia" cleanup, if I remember correctly ... has the latter come through?

@gsautter

The authorship Hübner, 1818 is in the treatment though: https://treatment.plazi.org/id/03EC879FFF89FFC5A875AEDBFCAA162E

I tend to think adding authorityName and authorityYear as well should do the trick ... authority normally is the verbatim authority, as given in the annotated taxon name, with any interpretation (e.g. expansion of abbreviations, or adding the document author(s) and year in case of original descriptions) going to the aforementioned two detail attributes.

@gsautter

> The authorship Hübner, 1818 is in the treatment though: https://treatment.plazi.org/id/03EC879FFF89FFC5A875AEDBFCAA162E
>
> I tend to think adding authorityName and authorityYear as well should do the trick ... authority normally is the verbatim authority, as given in the annotated taxon name, with any interpretation (e.g. expansion of abbreviations, or adding the document author(s) and year in case of original descriptions) going to the aforementioned two detail attributes.

Turns out adding the two detail attributes did do the trick.
