Record Basis enumeration does not match ABCDEFG implementation #15

Open
samleeflang opened this issue Oct 7, 2022 · 2 comments

Comments

@samleeflang

While working on the ingestion of ABCDEFG data we had some issues ingesting data coming from several BioCase instances.
In ABCD 2.06, which most instances use, RecordBasis is a controlled vocabulary consisting of these elements:
"PreservedSpecimen","LivingSpecimen","FossileSpecimen","FossilSpecimen","OtherSpecimen","HumanObservation","MachineObservation","DrawingOrPhotograph","MultimediaObject","AbsenceObservation"
However, the EFG XML uses other types of RecordBasis, resulting in XML which does not validate against the ABCD schema.
This gives validation errors such as:
cvc-enumeration-valid: Value 'MineralSpecimen' is not facet-valid with respect to enumeration '[PreservedSpecimen, LivingSpecimen, FossileSpecimen, OtherSpecimen, HumanObservation, MachineObservation, DrawingOrPhotograph, MultimediaObject]'. It must be a value from the enumeration.
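
To find the offending values without running a full schema validation, a check along the following lines could be used. This is only a minimal sketch: it assumes RecordBasis elements live in the namespace given by the schema URL quoted in this issue, it uses the enumeration exactly as printed in the validator message, and the sample document and the function name `find_invalid_record_basis` are made up for illustration. It is not a replacement for validating against the XSD.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace taken from the schema URL quoted in this issue.
ABCD_NS = "http://www.tdwg.org/schemas/abcd/2.06"

# Enumeration exactly as reported in the validator message above.
ALLOWED_RECORD_BASIS = {
    "PreservedSpecimen", "LivingSpecimen", "FossileSpecimen", "OtherSpecimen",
    "HumanObservation", "MachineObservation", "DrawingOrPhotograph",
    "MultimediaObject",
}


def find_invalid_record_basis(xml_text, allowed=ALLOWED_RECORD_BASIS):
    """Return the RecordBasis values in the document that are not in `allowed`."""
    root = ET.fromstring(xml_text.strip())
    invalid = []
    # iter() walks the tree in document order, so results follow the XML.
    for element in root.iter(f"{{{ABCD_NS}}}RecordBasis"):
        value = (element.text or "").strip()
        if value not in allowed:
            invalid.append(value)
    return invalid


# Hypothetical, heavily trimmed document; real ABCD responses have more structure.
SAMPLE = f"""
<abcd:DataSets xmlns:abcd="{ABCD_NS}">
  <abcd:Unit><abcd:RecordBasis>PreservedSpecimen</abcd:RecordBasis></abcd:Unit>
  <abcd:Unit><abcd:RecordBasis>MineralSpecimen</abcd:RecordBasis></abcd:Unit>
  <abcd:Unit><abcd:RecordBasis>RockSpecimen</abcd:RecordBasis></abcd:Unit>
</abcd:DataSets>
"""

print(find_invalid_record_basis(SAMPLE))  # ['MineralSpecimen', 'RockSpecimen']
```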

BioCase instances producing invalid data is something that should be avoided. However, there are already applications that depend on these (invalid) types being in the RecordBasis. For example, GeoCase uses this to populate the Specimen Type.

Looking at version 3.0 of ABCD, there is now some room to include geological specimens in the standard through the new RecordBasis type MineralSpecimen. However, among the BioCase instances providing EFG data we already noticed a de facto standard for the types, which are also used in GeoCase. We have seen the following types used:
"Unspecified", "RockSpecimen", "MineralSpecimen", "MeteoriteSpecimen"

As these types are already semi-standardized and actively used within both the BioCase EFG instances and the GeoCase portal, I would propose to include the other types in the ABCD standard as well.
Additionally, I would like to propose that data is validated before it is exchanged, so we are sure that all ABCD(EFG) data communicated complies with the data standard.
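
As an illustration of both proposals together, the sketch below shows what a pre-exchange gate could look like with the strict vocabulary and with the vocabulary extended by the observed EFG types. Everything here is an assumption made for illustration: the function `check_before_exchange`, the exception class, and the unit identifier `hypothetical-1` do not exist in BioCase or GeoCase. The strict set is the enumeration from the validator message quoted in this issue.

```python
# Enumeration as printed in the ABCD 2.06 validator message in this issue.
ABCD_206_VALUES = frozenset({
    "PreservedSpecimen", "LivingSpecimen", "FossileSpecimen", "OtherSpecimen",
    "HumanObservation", "MachineObservation", "DrawingOrPhotograph",
    "MultimediaObject",
})

# Additional types seen in BioCase EFG instances and GeoCase (from this issue).
OBSERVED_EFG_VALUES = frozenset({
    "Unspecified", "RockSpecimen", "MineralSpecimen", "MeteoriteSpecimen",
})


class InvalidRecordBasisError(ValueError):
    """Hypothetical error raised when a unit would fail the vocabulary check."""


def check_before_exchange(record_basis_by_unit, vocabulary):
    """Return the number of units checked, or raise if any value is outside `vocabulary`."""
    rejected = {
        unit_id: value
        for unit_id, value in record_basis_by_unit.items()
        if value not in vocabulary
    }
    if rejected:
        raise InvalidRecordBasisError(f"recordBasis not in vocabulary: {rejected}")
    return len(record_basis_by_unit)


# "374-5" is the TalTech example from this issue; "hypothetical-1" is made up.
units = {"374-5": "RockSpecimen", "hypothetical-1": "PreservedSpecimen"}

try:
    check_before_exchange(units, ABCD_206_VALUES)
except InvalidRecordBasisError as error:
    print("strict vocabulary:", error)

print("extended vocabulary:", check_before_exchange(units, ABCD_206_VALUES | OBSERVED_EFG_VALUES))
```

With the strict vocabulary the TalTech unit is rejected; with the extended vocabulary both units pass.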

Interested in what others think regarding this subject.

For an example of an ABCDEFG record from TalTech which uses RockSpecimen see:
https://bc.geocollections.info/querytool/raw.cgi?dsa=sarv&filter=(inst=Department%20of%20Geology,%20TalTech)AND(col=GIT)AND(cat=374-5)&schema=http://www.tdwg.org/schemas/abcd/2.06&wrapper_url=https://bc.geocollections.info/pywrapper.cgi?dsa=sarv

@samleeflang
Author

This issue is broader than just the recordBasis. The variety of values used for recordBasis is also broader than initially thought. This might mean that it is better to remove the vocabularies for several fields completely and make these fields free-text fields, which would better reflect the current situation. More discussion is needed before we move forward with the vocabularies in ABCD 3.0.

@samleeflang
Author

On 7-11-2022 this issue was discussed in the ABCD TDWG meeting.
The conclusion was that it would not be an issue to add additional recordBasis types to the ABCD 3.0 standard.
