Missing information on the classification provided in dataset 9543 #237

DianRHR · 2023-05-09T17:18:30Z

I was trying to use the information of the article:

Bezděk, J., & Regalin, R. (2022). Identity of species-group taxa of the Western Palaearctic Clytrini (Coleoptera: Chrysomelidae) described by Maurice Pic and Louis Kocher (Version 1657327952032). Plazi.org taxonomic treatments database. https://doi.org/10.5281/zenodo.4272771

available in ChecklistBank, and found that even though the title of the article (and its focus) is the tribe Clytrini, this taxon and rank are not included in the dataset.
All the genera are placed directly under Chrysomelidae.

Could you consider including in the datasets all the taxonomic ranks mentioned in the article?

The textree file looks like this:

jugiora · 2023-05-09T20:00:25Z

Dear Dian
We have added the tribe information to taxa attributes as mentioned.
This information was missing because of the usual parsing applied to the extracted taxonomic data, but it can be added in specific cases such as this one.
Cheers.
Julia

mdoering · 2023-05-10T06:34:16Z

Thanks @jugiora. The DwC archive does not yet contain the tribe. Does a regeneration need manual triggering?

Note also that the genus Stephenympha is wrongly given as a plant.

jugiora · 2023-05-10T18:22:04Z

Dear Dian.
The taxonomic attributes have all been fixed. The information should also be updated in the DwC archive in a few hours.
All the best.
Julia

mdoering · 2023-05-11T08:37:00Z

The dwca still contains plants right now:

03EC879FFFB1FFCEA875AF4BFDF21428.taxon Plantae Tracheophyta Liliopsida Poales Poaceae Stephenympha genus Stephenympha Stephenympha Stephenympha https://treatment.plazi.org/id/03EC879FFFB1FFCEA875AF4BFDF21428

@gsautter does it take longer to update?

myrmoteras · 2023-05-11T09:30:26Z

@mdoering no, this was still a plant, but it is fixed now.
Maybe there is a way to filter out all taxonomic names in the nomenclature section to check that they are all leps.
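A minimal sketch of such a check, run against the taxon rows of the DwC archive rather than the nomenclature sections themselves. The column order, the `find_outliers` helper and the second sample row are illustrative assumptions; a real archive declares its columns in meta.xml, and the expected values (kingdom here, but order or family would work the same way) are up to whoever runs the check.

```python
import csv
import io

# Hypothetical column order for this sketch; a real archive declares it in meta.xml.
COLUMNS = ["taxonID", "kingdom", "phylum", "class", "order",
           "family", "genus", "taxonRank", "scientificName"]


def find_outliers(tsv_text, expected):
    """Return (taxonID, scientificName, mismatches) for every row whose
    higher classification differs from the expected rank values."""
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=COLUMNS, delimiter="\t")
    outliers = []
    for row in reader:
        mismatched = {rank: row[rank] for rank, value in expected.items()
                      if row[rank] != value}
        if mismatched:
            outliers.append((row["taxonID"], row["scientificName"], mismatched))
    return outliers


# First row: the classification reported in this thread. Second row: made up for illustration.
SAMPLE = "\n".join([
    "\t".join(["03EC879FFFB1FFCEA875AF4BFDF21428.taxon", "Plantae", "Tracheophyta",
               "Liliopsida", "Poales", "Poaceae", "Stephenympha", "genus", "Stephenympha"]),
    "\t".join(["example-id.taxon", "Animalia", "Arthropoda", "Insecta", "ExampleOrder",
               "ExampleFamily", "Examplegenus", "genus", "Examplegenus"]),
])

OUTLIERS = find_outliers(SAMPLE, {"kingdom": "Animalia"})
for taxon_id, name, mismatched in OUTLIERS:
    print(taxon_id, name, mismatched)
```

Only the row whose kingdom differs from the expected value is reported, so a stray plant in an otherwise animal dataset would stand out.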

We might also want to set this article aside: each treatment is at genus level but in fact includes a list of species, often with new combinations (such as in Modica) as well as synonyms, which might be relevant for ChecklistBank / COL.

see also https://github.com/plazi/Plazi-Communications/issues/1269

DianRHR · 2023-05-12T18:19:11Z

@jugiora thanks for your quick answer. However, I downloaded the dwca again from ChecklistBank
https://www.dev.checklistbank.org/dataset/9543/download
and the tribe is not yet included.
My question is whether these kinds of issues are addressed manually.

flsimoes · 2023-05-12T18:25:41Z

> @jugiora thanks for your quick answer. However, I downloaded the dwca again from ChecklistBank https://www.dev.checklistbank.org/dataset/9543/download and the tribe is not yet included. My question is whether these kinds of issues are addressed manually.

Perhaps ChecklistBank hasn't yet gotten the most updated version.

What sort of issues exactly do you mean? Fixing the taxonomy? Then yes, it is fixed manually, as @jugiora did this time.
If you are talking about the update to the DwCA, it should be automatic once we fix things on our end (I think ChecklistBank only imports the datasets once a day, though)... @myrmoteras anything to add?

gsautter · 2023-05-12T18:57:25Z

Judging from https://www.gbif.org/occurrence/search?offset=0&limit=500&dataset_key=bfb878f3-8a74-46d3-a104-36485c32aaba , the dataset is updated in GBIF by now ... hard to tell how long an update takes to get to CLB from there at this point ...
@mdoering is there a synchronization schedule in place, or some sort of notification-based system? It would be great to have an approximate time such updates usually take to go through, so we know at what point we should start to worry or investigate.

mdoering · 2023-05-14T18:52:33Z

If nothing is triggered, the system checks weekly by default for an update. You could trigger a CLB import from your end each time an archive is rebuilt to make sure there is no latency. It's a simple POST call to the API; we would just need to arrange appropriate credentials.
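A rough sketch of what such a trigger could look like from the archive-building side. Everything specific here is an assumption for illustration and not the confirmed CLB API: the base URL, the `/importer` path, the `datasetKey` JSON body and the use of HTTP Basic auth would all need to be checked against the CLB API documentation and the credentials arrangement mentioned above.

```python
import base64
import json
import urllib.request


def build_import_request(api_base, dataset_key, user, password):
    """Build (but do not send) a POST request asking for a dataset import.

    ASSUMPTIONS: the '/importer' path, the JSON body and Basic auth are
    placeholders for this sketch, not confirmed details of the CLB API.
    """
    body = json.dumps({"datasetKey": dataset_key}).encode("utf-8")
    request = urllib.request.Request(f"{api_base}/importer", data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


def trigger_import(request, timeout=30):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Building the request is side-effect free; only trigger_import() touches the network.
REQUEST = build_import_request("https://clb.example.org", 9543, "user", "secret")
print(REQUEST.get_method(), REQUEST.full_url)
```

Keeping the request construction separate from the actual call makes it easy to hook `trigger_import` into whatever step finishes rebuilding an archive, and to test the rest without network access.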

gsautter · 2023-05-14T22:24:30Z

> If nothing is triggered, the system checks weekly by default for an update. You could trigger a CLB import from your end each time an archive is rebuilt to make sure there is no latency. It's a simple POST call to the API; we would just need to arrange appropriate credentials.

Easy enough to send a similar poke request to CLB as we send to the GBIF API when a DwCA gets updated ... however, GBIF might pull the updated DwCA with some latency, so there would be a non-negligible risk of CLB fetching the old version of the data from GBIF before GBIF fetches the new version from TB ... needs some thought.

mdoering · 2023-05-15T07:17:58Z

CLB does not fetch anything from GBIF. We poll your files directly.

gsautter · 2023-05-15T14:18:32Z

I feel like this issue is related, as both concern uplink and sending notifications to CLB: plazi/treatmentBank#90

DianRHR · 2023-05-15T16:45:34Z

> so there would be a non-negligible risk of CLB fetching the old version

@flsimoes I mean both.
Fixing taxonomy: including a name and taxon rank that are only mentioned in the title but are an important part of the classification (tribe in this case) and that are not considered in DwC.
Updating the DwCA: the discussion in the previous comments.
My question is mainly about how to proceed once we find missing information in a dataset.

@gsautter I'm afraid we are talking about two different datasets:
I mentioned https://www.dev.checklistbank.org/dataset/9543/about which is also https://www.gbif.org/dataset/77c874cd-4f85-4746-8466-3ca09e2c2b8d ; I just checked both, and the tribe is not yet included.
The one you mentioned is a different dataset:
https://www.gbif.org/occurrence/search?offset=0&limit=500&dataset_key=bfb878f3-8a74-46d3-a104-36485c32aaba

gsautter · 2023-05-15T16:53:10Z

> @gsautter I'm afraid we are talking about two different datasets:
> I mentioned https://www.dev.checklistbank.org/dataset/9543/about which is also https://www.gbif.org/dataset/77c874cd-4f85-4746-8466-3ca09e2c2b8d ; I just checked both, and the tribe is not yet included.
> The one you mentioned is a different dataset:
> https://www.gbif.org/occurrence/search?offset=0&limit=500&dataset_key=bfb878f3-8a74-46d3-a104-36485c32aaba

At the dataset level, sure, but at the level of figuring out how to get updates into CLB more quickly and how to get CLB dataset keys into TreatmentBank, they are both about communication between the two systems, and that is something we might well and most likely should discuss in conjunction, as it boils down to adding a CLB communication component to the TreatmentBank back-end server.
Never meant to say the specific dataset issues don't need to be solved individually.

mdoering · 2023-05-16T14:24:45Z

As far as I can see, the dwca from Plazi still does not contain the Clytrini tribe.
@gsautter I think I now know why. The classification is not provided via parentNameUsageID, but only as flat, major Linnaean ranks, and tribe is not among them:

<field index="3" term="http://rs.tdwg.org/dwc/terms/parentNameUsageID"/> <!-- blank -->
<field index="4" term="http://rs.tdwg.org/dwc/terms/originalNameUsageID"/> <!-- blank -->
<field index="5" term="http://rs.tdwg.org/dwc/terms/kingdom"/> <!-- taxon@kingdom -->
<field index="6" term="http://rs.tdwg.org/dwc/terms/phylum"/> <!-- taxon@phylum -->
<field index="7" term="http://rs.tdwg.org/dwc/terms/class"/> <!-- taxon@class -->
<field index="8" term="http://rs.tdwg.org/dwc/terms/order"/> <!-- taxon@order -->
<field index="9" term="http://rs.tdwg.org/dwc/terms/family"/> <!-- taxon@family -->
<field index="10" term="http://rs.tdwg.org/dwc/terms/genus"/> <!-- taxon@genus -->
<field index="11" term="http://rs.tdwg.org/dwc/terms/taxonRank"/> <!-- taxon@rank -->
<field index="12" term="http://rs.tdwg.org/dwc/terms/scientificName"/> <!-- reconciled taxon name with reconciled authority, with parentheses and all -->

Ideally we would use parentNameUsageID only, at least if you have a parent-child relationship in your model.
Otherwise, there are new dwc classification terms on the way that we could use, including tribe, subtribe and superfamily, to get at least somewhat richer trees:

tdwg/dwc#45
tdwg/dwc#46
tdwg/dwc#65
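To illustrate the difference between the two approaches, here is a small sketch that turns flat rank columns into rows linked by parentNameUsageID, with tribe as one more level. The `rank:name` identifiers, the `RANKS` list and the `to_parent_child` helper are assumptions made up for this example; they are not how TreatmentBank or CLB actually assign IDs.

```python
# Rank order used for this sketch; intermediate ranks are simply extra entries.
RANKS = ["kingdom", "phylum", "class", "order", "family", "tribe", "genus"]


def to_parent_child(records):
    """Turn flat classifications into rows linked by parentNameUsageID.

    Each distinct (rank, name) pair becomes one row; the synthetic
    'rank:name' identifiers are hypothetical and only serve the example.
    """
    rows = {}
    for record in records:
        parent_id = None
        for rank in RANKS:
            name = record.get(rank)
            if not name:
                continue  # rank not given for this record, keep the previous parent
            taxon_id = f"{rank}:{name}"
            rows.setdefault(taxon_id, {
                "taxonID": taxon_id,
                "parentNameUsageID": parent_id,
                "taxonRank": rank,
                "scientificName": name,
            })
            parent_id = taxon_id
    return list(rows.values())


FLAT = [
    {"kingdom": "Animalia", "phylum": "Arthropoda", "class": "Insecta",
     "order": "Coleoptera", "family": "Chrysomelidae", "tribe": "Clytrini",
     "genus": "Clytra"},
    {"kingdom": "Animalia", "phylum": "Arthropoda", "class": "Insecta",
     "order": "Coleoptera", "family": "Chrysomelidae", "tribe": "Clytrini",
     "genus": "Labidostomis"},
]

TREE = to_parent_child(FLAT)
for row in TREE:
    print(row["taxonID"], "<-", row["parentNameUsageID"])
```

With a parent-child tree, any rank can be represented without a dedicated column. Without the tribe value, the genera would attach directly to Chrysomelidae, which is what the dataset currently shows.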

gsautter · 2023-05-16T15:00:54Z

> @gsautter I think I now know why. The classification is not provided via parentNameUsageID, but only as flat, major Linnaean ranks, and tribe is not among them:

That's correct ... a tribe will only be there if the taxon actually is of rank tribe ... we don't generally store the intermediate ranks internally, either, as there are simply too many of them, and for a long time DwC didn't really support them, either.

The question that still remains open is the handling of updates.

mdoering · 2023-05-16T15:05:47Z

So that means the dwca is up to date and adding the tribe did not change anything, correct?

gsautter · 2023-05-16T15:18:55Z

Regarding the tribe, I think so ... but there also was that "Plantae" vs. "Animalia" cleanup, if I remember correctly ... has the latter come through?

mdoering · 2023-05-19T09:51:54Z

Yes, it is fixed in TB: https://treatment.plazi.org/id/03EC879FFFB1FFCEA875AF4BFDF21428
and also CLB: https://www.checklistbank.org/dataset/58039/taxon/03EC879FFFB1FFCEA875AF4BFDF21428.taxon

mdoering · 2023-05-19T09:54:08Z

Most genera in that dataset have an authorship, but a few don't: https://www.checklistbank.org/dataset/58039/names?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&limit=50&offset=0&rank=genus&sortBy=taxonomic

The authorship Hübner, 1818 is in the treatment though: https://treatment.plazi.org/id/03EC879FFF89FFC5A875AEDBFCAA162E

gsautter · 2023-05-24T20:34:27Z

> The authorship Hübner, 1818 is in the treatment though: https://treatment.plazi.org/id/03EC879FFF89FFC5A875AEDBFCAA162E

I tend to think adding authorityName and authorityYear as well should do the trick ... authority normally is the verbatim authority, as given in the annotated taxon name, with any interpretation (e.g. expansion of abbreviations, or adding the document author(s) and year in case of original descriptions) going to the aforementioned two detail attributes.

gsautter · 2023-05-24T21:28:53Z

> > The authorship Hübner, 1818 is in the treatment though: https://treatment.plazi.org/id/03EC879FFF89FFC5A875AEDBFCAA162E
>
> I tend to think adding authorityName and authorityYear as well should do the trick ... authority normally is the verbatim authority, as given in the annotated taxon name, with any interpretation (e.g. expansion of abbreviations, or adding the document author(s) and year in case of original descriptions) going to the aforementioned two detail attributes.

Turns out adding the two detail attributes did do the trick.

myrmoteras mentioned this issue May 11, 2023

reference group should be synonymic list? #238

Closed

gsautter mentioned this issue May 15, 2023

Add link to Checklistbank and SIBiLS UI, as well from BLR plazi/treatmentBank#90

Open
