TG2-ISSUE_COORDINATES_CENTEROFCOUNTRY #287

Open
ArthurChapman opened this issue Feb 9, 2024 · 17 comments
Labels
Conformance CORE TG2 CORE tests Issue A potential issue Parameterized Test requires a parameter SPACE Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT TG2

Comments

@ArthurChapman
Collaborator

ArthurChapman commented Feb 9, 2024

TestField Value
GUID 256e51b3-1e08-4349-bb7e-5186631c3f8e
Label ISSUE_COORDINATES_CENTEROFCOUNTRY
Description Are the supplied geographic coordinates within a defined buffer of the center of the country?
TestType Issue
Darwin Core Class dcterms:Location
Information Elements ActedUpon dwc:countryCode
dwc:decimalLatitude
dwc:decimalLongitude
Information Elements Consulted dwc:coordinateUncertaintyInMeters
Expected Response EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are bdq:Empty; POTENTIAL_ISSUE if (1) the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:countryCode as represented in the bdq:sourceAuthority (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) and (2) the dwc:coordinateUncertaintyInMeters is bdq:Empty or less than half the square root of the area of the country; otherwise NOT_ISSUE.
Data Quality Dimension Conformance
Term-Actions COORDINATES_CENTEROFCOUNTRY
Parameter(s) bdq:spatialBufferInMeters
bdq:sourceAuthority
Source Authority bdq:spatialBufferInMeters default = "5000"
bdq:sourceAuthority default = "GBIF Catalogue of Country Centroides" {[https://raw.githubusercontent.com/jhnwllr/catalogue-of-centroids/master/PCLI.tsv]}
Specification Last Updated 2024-08-28
Examples [dwc:decimalLatitude="-35.38804", dwc:decimalLongitude="-65.154964", dwc:countryCode="AR": Response.status=RUN_HAS_RESULT, Response.result=POTENTIAL_ISSUE, Response.comment="coordinates fall within buffered distance in the bdq:sourceAuthority for dwc:countryCode"]
[dwc:decimalLatitude="-34.184199", dwc:decimalLongitude="-65.509403", dwc:countryCode="AR": Response.status=RUN_HAS_RESULT, Response.result=NOT_ISSUE, Response.comment="coordinates fall outside buffered distance in the bdq:sourceAuthority for dwc:countryCode"]
Source GBIF
References
  • Waller JT (2023) Processing Country Centroids at the Global Biodiversity Information Facility. Biodiversity Information Science and Standards 7: e110728. https://doi.org/10.3897/biss.7.110728
Example Implementations (Mechanisms)
Link to Specification Source Code
Notes We have increased the buffer to 5000 meters to cater for differences that may have arisen due to the difference in geodetic datums
@ArthurChapman ArthurChapman added TG2 Issue A potential issue SPACE Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT NEEDS WORK Supplementary Tests supplementary to the core test suite. These are tests that the team regarded as not CORE. Conformance Parameterized Test requires a parameter labels Feb 9, 2024
@ArthurChapman
Collaborator Author

@jhnwllr Could you check this TEST please? Is there an API that we can link to?

@chicoreus
Collaborator

Is the spatial buffer dependent on the size of the country? Is the spatial buffer dependent on a combination of the size of the country and the resolution of the country shape spatial data?

@ArthurChapman
Collaborator Author

The spatial buffer is set as a default - under Parameterized - so people can supply a different value if they wish. 3000 meters was thought to be a good value given work carried out by John Waller.

@jhnwllr replied separately as follows:

"I have now separated out PCLI and ADM1 types into separate files.

I use PCLI as a politically neutral name for "countries".

So see this file I just generated for "countries".
https://raw.githubusercontent.com/jhnwllr/catalogue-of-centroids/master/PCLI.tsv

There isn't yet an API endpoint which just lists the centroids GBIF is using, but you can use occurrence search to get a "list of the centroids with occurrences" so to speak.
https://www.gbif.org/occurrence/search?advanced=1&occurrence_status=present&distance_from_centroid_in_meters=0,0
https://www.gbif.org/api/occurrence/search?advanced=1&occurrence_status=present&distance_from_centroid_in_meters=0,0"
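
As a rough sketch of the occurrence-search workaround quoted above, the query URL can be assembled programmatically. The base URL and parameter names are copied from the links above; the helper name `centroid_occurrence_url` and the reading of `0,0` as a minimum,maximum distance range are assumptions, not documented behaviour.

```python
from urllib.parse import urlencode

# Base URL taken verbatim from the link quoted above.
BASE_URL = "https://www.gbif.org/api/occurrence/search"


def centroid_occurrence_url(min_m: int = 0, max_m: int = 0) -> str:
    """Build the occurrence-search URL for records at a given distance from a centroid.

    Assumption: the two comma-separated numbers are a min,max range in meters.
    """
    params = {
        "advanced": 1,
        "occurrence_status": "present",
        "distance_from_centroid_in_meters": f"{min_m},{max_m}",
    }
    # safe="," keeps the comma unescaped, matching the URL as quoted.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


url = centroid_occurrence_url()
```

This only builds the URL; fetching it and interpreting the response are left out because the response format is not described in this thread.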

@ArthurChapman
Collaborator Author

Source Authority and Notes updated following advice from @jhnwllr above.

@Tasilee
Collaborator

Tasilee commented Feb 18, 2024

Is this now an Immature/Incomplete or something else? If the former, we need to start adding relevant Notes.

@ArthurChapman
Collaborator Author

I think this is Supplementary - given that we do have a good SourceAuthority. Although there is no API at the moment, the link that @jhnwllr provided is an alternative that should work.

@chicoreus
Collaborator

Should be straightforward to implement without an API, given https://raw.githubusercontent.com/jhnwllr/catalogue-of-centroids/master/PCLI.tsv: ask whether the coordinate is near one of the points given for the country code in that file.
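
A minimal sketch of that lookup, assuming the TSV has already been downloaded as text. The column names (`iso2`, `centroid_lat`, `centroid_lon`) are hypothetical placeholders, not confirmed against the real file, and the sample rows are made up for illustration. Distance uses a spherical haversine approximation rather than a geodesic on the ellipsoid.

```python
import csv
import io
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def load_centers(tsv_text, code_col="iso2", lat_col="centroid_lat", lon_col="centroid_lon"):
    """Group centers by country code; a code may have more than one center.

    The default column names are assumptions for this sketch.
    """
    centers = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        point = (float(row[lat_col]), float(row[lon_col]))
        centers.setdefault(row[code_col], []).append(point)
    return centers


def distance_to_nearest_center_m(lat, lon, country_code, centers):
    """Distance to the closest center for the code, or None if the code is absent."""
    points = centers.get(country_code)
    if not points:
        return None
    return min(haversine_m(lat, lon, clat, clon) for clat, clon in points)


# Made-up rows for a fictional code "XX" with two centers.
SAMPLE_TSV = (
    "iso2\tcentroid_lat\tcentroid_lon\n"
    "XX\t10.0\t20.0\n"
    "XX\t12.0\t22.0\n"
)
centers = load_centers(SAMPLE_TSV)
near = distance_to_nearest_center_m(10.0, 20.01, "XX", centers)
is_near_center = near is not None and near <= 5000  # buffer in meters
```

Taking the minimum over all centers for a code is one way to read "one of the points given for the country code".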

Propose changing from:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:country, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:country as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.

to:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:country, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY or if dwc:geodeticDatum is not EPSG:4326; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) of the supplied dwc:countryCode as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.

Remove country as an information element, just use dwc:countryCode as consulted.

Alternately:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:country, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) of the supplied dwc:countryCode as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.

with a slightly larger spatial buffer to add in uncertainty from potential differences in the datum.

@chicoreus chicoreus added CORE TG2 CORE tests and removed Supplementary Tests supplementary to the core test suite. These are tests that the team regarded as not CORE. labels Aug 20, 2024
@ArthurChapman
Collaborator Author

Expected Response modified to cater for the possibility of more than one centroid, Specification Last Updated added, and Notes modified. Test made CORE rather than Supplementary, as we don't need an API and can use the file prepared by @jhnwllr.

@chicoreus
Collaborator

Per @tucotuco and @ymgan, we need to incorporate coordinateUncertaintyInMeters, as a point at the centroid with a coordinate uncertainty equivalent to the size of the country is reasonable and doesn't need to be identified as a potential issue.

chicoreus added a commit that referenced this issue Aug 20, 2024
…f-xml from the test specifications as of 2024-08-20 (AM) following discussions of issues in TG2 working meeting in Seattle. Adding #287 as core test.  Regenerating human readable markdown lists of tests.
@chicoreus
Collaborator

Expected response doesn't quite read right in the bits about multiple possible centers.

Also needs to allow points centered on the country with a coordinateUncertaintyInMeters approximating the country. Perhaps change from:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center (or one of the centers), of the bdq:sourceAuthority provides more than one per country code of the supplied dwc:countryCode as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.

To:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if (1) the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:countryCode as represented in the bdq:sourceAuthority (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) and (2) the dwc:coordinateUncertaintyInMeters is less than half the square root of the area of the country; otherwise NOT_ISSUE.

Adding coordinateUncertaintyInMeters as an information element consulted.

We could be more general about the coordinateUncertaintyInMeters being large, e.g. "large relative to the size of the country", and put the half the square root of the area in the notes. The square root of the area of the country is available in the default source authority, so this wouldn't force us to add a spatial source authority for country boundaries. (We could do that, and phrase the condition as a coordinate uncertainty in meters that is less than the radius of a circle that the country fits into, which could be precalculated from country shape data.) Half the square root of the area is a simple, pragmatic way to estimate a large uncertainty relative to the size of the country; it would make the behavior of the test consistent across implementations, and it is provided in the default source authority.

chicoreus added a commit to FilteredPush/geo_ref_qc that referenced this issue Aug 27, 2024
…uery Getty TGN, throwing source authority exceptions, more cleanup of handling source authorities with exceptions. Adding a stub for tdwg/bdq#287.
chicoreus added a commit to FilteredPush/geo_ref_qc that referenced this issue Aug 27, 2024
…troids PCLI file converted to a shape file. This was passing all but one row in @Tasilee's test validation data, but have also added the proposed test for large coordinate uncertainty relative to country size proposed as a change to the specification.
@Tasilee
Collaborator

Tasilee commented Aug 28, 2024

I don't think that works @chicoreus. If dwc:coordinateUncertaintyInMeters is EMPTY then NOT_ISSUE as you have (1) and (2) for POTENTIAL_ISSUE?

@chicoreus
Collaborator

@Tasilee good catch, needs explicit handling of empty for coordinateUncertaintyInMeters. How about:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if (1) the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:countryCode as represented in the bdq:sourceAuthority (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) and (2) the dwc:coordinateUncertaintyInMeters is EMPTY or less than half the square root of the area of the country; otherwise NOT_ISSUE.
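
A minimal sketch of how that wording might translate into code, mainly to show that an EMPTY dwc:coordinateUncertaintyInMeters now leads to POTENTIAL_ISSUE. The `lookup` callable is a hypothetical stand-in for the bdq:sourceAuthority (passing `None` models it being unavailable), `is_empty` is a simplified reading of EMPTY, and non-numeric values or country codes missing from the source authority are not handled here because the wording does not cover them.

```python
import math
from typing import Callable, Optional, Tuple

# Hypothetical source-authority interface: (country code, lat, lon) ->
# (distance in meters to the nearest center, country area in km^2).
Lookup = Callable[[str, float, float], Tuple[float, float]]


def is_empty(value: Optional[str]) -> bool:
    """Simplified reading of EMPTY: None or whitespace only."""
    return value is None or value.strip() == ""


def center_of_country(
    country_code: Optional[str],
    decimal_latitude: Optional[str],
    decimal_longitude: Optional[str],
    coordinate_uncertainty_in_meters: Optional[str],
    lookup: Optional[Lookup],
    spatial_buffer_m: float = 5000.0,
) -> Tuple[str, Optional[str]]:
    """Return (Response.status, Response.result) following the proposed wording."""
    if lookup is None:
        return ("EXTERNAL_PREREQUISITES_NOT_MET", None)
    if any(is_empty(v) for v in (country_code, decimal_latitude, decimal_longitude)):
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)

    distance_m, area_km2 = lookup(
        country_code.strip(), float(decimal_latitude), float(decimal_longitude)
    )
    # Condition (1): within the buffer of a center.
    near_center = distance_m <= spatial_buffer_m
    # Condition (2): uncertainty is EMPTY or below half the square root of the area.
    large_uncertainty_m = 0.5 * math.sqrt(area_km2) * 1000.0
    small_or_empty = (
        is_empty(coordinate_uncertainty_in_meters)
        or float(coordinate_uncertainty_in_meters) < large_uncertainty_m
    )
    result = "POTENTIAL_ISSUE" if near_center and small_or_empty else "NOT_ISSUE"
    return ("RUN_HAS_RESULT", result)


# Stub lookups for illustration: a point exactly on a center, and one 100 km away.
def at_center(code, lat, lon):
    return (0.0, 736593.0)


def far_away(code, lat, lon):
    return (100_000.0, 736593.0)
```

With the stubs above, `center_of_country("CL", "-30", "-71", "", at_center)` exercises the EMPTY-uncertainty branch raised by @Tasilee.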

@ArthurChapman
Collaborator Author

I think that works @chicoreus - but then I have just flown half way around the world and may have brain fog! Just thinking of the cases where one has a country (e.g. Australia or Nova Hollandia - quite common) and a center of the country is given. In that case, using half the square root of the area of the country: the square root of the area of Australia is ~2,782 km, which is greater than the distance from the center to any part of mainland Australia, so it works. For Chile, I'm not so sure, though, being long and thin!

@chicoreus
Collaborator

@ArthurChapman in the PCLI country centroid data set, Chile has an area of about 736593 km². This would give a radius of 429 km, and the conclusion that a coordinate uncertainty in meters larger than 429000 would be large relative to the country. That isn't being precise and asserting what coordinate uncertainty in meters would produce a circle that entirely encloses the country (for Chile, much of the country would be outside that circle), but it does feel like a good pragmatic estimator of uncertainties that are relatively large in comparison to the country. The alternative is to include another source authority for country shapes and obtain from it the radius of a circle that would contain the entire country, but then there would be uncertainty about how people who represented uncertainties containing a country did so. Using half the square root of the area seems like a reasonable conservative estimator for large uncertainty relative to country size, which is, in essence, what we are trying to exclude from being flagged as potentially problematic here.
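
The Chile arithmetic can be checked with a short sketch. The function name `large_uncertainty_threshold_m` is made up for illustration; the area value is the one quoted in the comment above.

```python
import math


def large_uncertainty_threshold_m(country_area_km2: float) -> float:
    """Half the square root of the country area, converted from km to meters."""
    return 0.5 * math.sqrt(country_area_km2) * 1000.0


CHILE_AREA_KM2 = 736593.0  # area quoted in the comment above
threshold_m = large_uncertainty_threshold_m(CHILE_AREA_KM2)
print(f"{threshold_m / 1000.0:.0f} km")  # about 429 km
```

An uncertainty at or above this value would count as large relative to the country, so a point at the center would not be flagged.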

@Tasilee
Collaborator

Tasilee commented Aug 28, 2024

It seems reasonable to improve the Expected Response from

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center (or one of the centers), of the bdq:sourceAuthority provides more than one per country code of the supplied dwc:countryCode as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.

to

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if (1) the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:countryCode as represented in the bdq:sourceAuthority (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) and (2) the dwc:coordinateUncertaintyInMeters is EMPTY or less than half the square root of the area of the country; otherwise NOT_ISSUE.

NEEDS WORK??

@ArthurChapman
Collaborator Author

I am happy with that. It will be interesting to see how it works in practice. Perhaps another, more complicated, way is to look at dwc:locality if it only contains a country name, but that would be difficult to work in practice. For example if the dwc:locality only said "Australia" or "Chile", but then you'd need to find all the synonyms "Nova Hollandia", etc. and country names at the time of the event and then use the centroid of those historical countries over time and that we don't have. It may be possible, but I think extremely difficult to do well.

I am happy to use the @Tasilee suggestion and see what feedback one gets over time.

@chicoreus
Collaborator

I've added dwc:coordinateUncertaintyInMeters as an information element consulted for the new specification. I think we can take the needs work off.
