add support for sorting query results #157

Open
tomkralidis opened this issue Aug 30, 2018 · 14 comments
Labels
OGC API: Features Issue related to feature resources (see #190) Part 8: Sorting

Comments

@tomkralidis
Contributor

Overview

Allow for sorting of query results against items endpoints.

In pygeoapi we are considering adopting the OGC CSW 2.0.2 implementation of sortby.

From Table 65:

List of Character String, comma separated.

Ordered list of names of metadata elements to use for sorting the response

Format of each list item is metadata_element_name:A indicating an ascending sort or metadata_element_name:D indicating descending sort

From 10.8.4.12:

The result set may be sorted by specifying one or more metadata record elements upon which to sort.
In KVP encoding, the SORTBY parameter is used to specify the list of sort elements. The value for the SORTBY parameter is a comma-separated list of metadata record element names upon which to sort the result set. The format for each element in the list shall be either element name:A indicating that the element values should be sorted in ascending order or element name:D indicating that the element values should be sorted in descending order.

Examples:

  • sort by property title, ascending: sortby=title or sortby=title:A
  • sort by property country, descending: sortby=country:D
  • sort by property title, ascending, then property country, descending: sortby=title,country:D
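The three examples can be checked against a small parser. This is a minimal sketch, assuming a hypothetical function name `parse_csw_sortby` (not part of pygeoapi or any spec) and that a bare name means ascending, as in `sortby=title`. It splits each item at its first colon, so it does not handle namespaced property names.

```python
def parse_csw_sortby(value):
    """Parse a CSW 2.0.2-style sortby value into (name, ascending) pairs."""
    spec = []
    for item in value.split(","):
        # Each item is "name", "name:A" or "name:D"; a bare name sorts ascending.
        name, _, order = item.strip().partition(":")
        order = order or "A"
        if order not in ("A", "D"):
            raise ValueError(f"invalid sort order {order!r} in {item!r}")
        spec.append((name, order == "A"))
    return spec


print(parse_csw_sortby("title,country:D"))
# [('title', True), ('country', False)]
```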
@cportele cportele added the Future work support in an additional part of OGC API Features label Aug 30, 2018
@pvretano
Contributor

pvretano commented Aug 30, 2018

@tomkralidis You should probably also review the sorting clause in the Filter encoding specification (http://docs.opengeospatial.org/is/09-026r2/09-026r2.html and https://portal.opengeospatial.org/files/?artifact_id=66226). I wrote both the CSW and FES sections on sorting so they should be fairly similar.

@tomkralidis
Contributor Author

Thanks @pvretano. Looks like the main difference in the HTTP GET context is that CSW's sort uses a colon to separate the sort property from the sort order, whereas WFS uses a space.

I'd vote for the colon (or anything not a space).
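To make the difference concrete, here is a sketch that rewrites the space-separated form into the colon form. It assumes the FES KVP items look like `name ASC` or `name DESC`, with ascending when the keyword is omitted; the function name `fes_to_colon_sortby` is hypothetical. In a real URL the space has to travel as `%20` or `+`, which is part of why a non-space separator is attractive.

```python
def fes_to_colon_sortby(value):
    """Rewrite "title ASC,country DESC" into "title:A,country:D"."""
    suffix = {"ASC": "A", "DESC": "D"}
    items = []
    for item in value.split(","):
        parts = item.split()
        if not parts:
            raise ValueError(f"empty sort item in {value!r}")
        name = parts[0]
        # Treat a missing keyword as ascending, mirroring the CSW default.
        keyword = parts[1].upper() if len(parts) > 1 else "ASC"
        if len(parts) > 2 or keyword not in suffix:
            raise ValueError(f"cannot parse sort item {item!r}")
        items.append(f"{name}:{suffix[keyword]}")
    return ",".join(items)


print(fes_to_colon_sortby("title ASC,country DESC"))
# title:A,country:D
```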

@pvretano
Contributor

@tomkralidis sure ... sounds OK to me.

@aaime
Contributor

aaime commented Aug 31, 2018

Say one implements an INSPIRE app-schema or CityGML and wants to sort on an attribute. That would require an XPath, which uses colons. This is a problem IMHO: either consider a character-escaping approach or quoting. This is needed whatever character is used for separation, but it may be best to use one that is very unlikely to appear in attribute references.
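One way to see the problem is a heuristic that splits only at the *last* colon, and only when the text after it is a recognised order flag. This is a sketch with a hypothetical helper name. It keeps `gml:name` intact, but it is not a real escaping rule: a namespaced property literally named `D` (for example `ns:D`) is still misread as a descending sort on `ns`, which is exactly why escaping or quoting would be needed.

```python
SORT_SUFFIXES = {"A": True, "D": False}


def parse_sortby_item(item):
    """Split a sort item at its last colon, but only when the text after it is
    an order flag; otherwise treat the whole item as the property reference."""
    head, sep, tail = item.rpartition(":")
    if sep and tail in SORT_SUFFIXES:
        return head, SORT_SUFFIXES[tail]
    return item, True  # no order flag: ascending by default


print(parse_sortby_item("gml:name"))    # ('gml:name', True)
print(parse_sortby_item("gml:name:D"))  # ('gml:name', False)
print(parse_sortby_item("ns:D"))        # ('ns', False), the ambiguous case
```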

@rcoup

rcoup commented Aug 31, 2018

I presume there's a reason, but why comma-separate attributes when repeating query-string/POST parameters is perfectly fine under HTTP?

Another convention I've seen in a few places (Django uses it):

  • &sortby=attr => ORDER BY attr
  • &sortby=-attr => ORDER BY attr DESC
  • &sortby=-attr&sortby=other&sortby=ns:afield => ORDER BY attr DESC, other, afield

Attributes are pretty unlikely to start with -, especially given GML uses them as element names, and XML names can't start with -.
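A minimal sketch of reading that convention with the standard library, assuming a hypothetical helper `parse_prefixed_sortby` that returns (name, ascending) pairs rather than building SQL directly (property names from a query string should be checked against the collection's queryables before reaching a database). `parse_qs` keeps repeated values in the order they appear, which is what gives the repeated-parameter form its meaning.

```python
from urllib.parse import parse_qs


def parse_prefixed_sortby(query):
    """Parse repeated sortby=[-]name parameters into (name, ascending) pairs."""
    spec = []
    # parse_qs returns repeated values as a list in query-string order.
    for value in parse_qs(query).get("sortby", []):
        if value.startswith("-"):
            spec.append((value[1:], False))  # leading "-" means descending
        else:
            spec.append((value, True))
    return spec


print(parse_prefixed_sortby("sortby=-attr&sortby=other&sortby=ns:afield"))
# [('attr', False), ('other', True), ('ns:afield', True)]
```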

@nmtoken

nmtoken commented Aug 31, 2018

Just a nitpick, but in KVP, the & closes the pair, not opens it, so:

  • sortby=attr => ORDER BY attr&
  • sortby=-attr => ORDER BY attr DESC&
  • sortby=-attr&sortby=other&sortby=ns:afield => ORDER BY attr DESC, other, afield&

@jampukka
Contributor

jampukka commented Aug 31, 2018

@rcoup just have to be extra careful with repeating parameters (explode: true in OpenAPI terms) if the order is meaningful.
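The two OpenAPI serialisations for a `form`-style array parameter can be compared with the standard library. This sketch reuses the sort keys from the example above. The order survives on the client side either way; the risk is a receiver that folds repeated pairs into a plain mapping, which silently keeps only one value.

```python
from urllib.parse import parse_qsl, urlencode

sort_keys = ["-attr", "other", "ns:afield"]

# explode: true -> one sortby pair per value, in list order.
exploded = urlencode([("sortby", key) for key in sort_keys], safe=":")
# explode: false -> a single comma-joined value.
joined = urlencode({"sortby": ",".join(sort_keys)}, safe=":,")

print(exploded)  # sortby=-attr&sortby=other&sortby=ns:afield
print(joined)    # sortby=-attr,other,ns:afield

# Folding the pairs into a dict keeps only the last value for a repeated key.
print(dict(parse_qsl(exploded)))  # {'sortby': 'ns:afield'}
# Reading the pairs as an ordered list keeps every key in request order.
print([v for k, v in parse_qsl(exploded) if k == "sortby"])
```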

@cholmes
Member

cholmes commented Aug 31, 2018

cc @hgs-truthe01

In STAC we just added sorting - currently on dev for 0.6.0, and I believe Tim has an implementation of it.
It's defined at:
https://github.com/radiantearth/stac-spec/blob/dev/api-spec/extensions/sorting.fragment.yaml

We'd be happy to align with WFS3, but are going to ship this first version pretty soon.

@cportele cportele added the OGC API: Features Issue related to feature resources (see #190) label Mar 5, 2019
@cportele
Member

cportele commented May 8, 2019

The following is transferred from #23.


In WFS3 Core there's no way to specify the sorting order. Therefore paging is only really useful for "streaming" through the response in count-sized chunks. Access to previous page(s) might be easier to implement in the UI application (you already had the information, as there's no way to skip pages with sequential forward-only next links).

When a sorting extension is added, paging becomes much more meaningful. Then you can access the last page by flipping the sorting order and accessing the first page, so I'd still vote no for a last link.
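The "flip the order" idea can be sketched on an in-memory list, assuming hypothetical offset-based paging and a total sort order (ties broken deterministically, otherwise the flipped query need not be an exact mirror). Note that reversing the first descending page yields the last `limit` items, which differs from the last page of the ascending walk when the total is not a multiple of `limit`.

```python
def page(items, limit, offset=0):
    """Return one page of an already-sorted list (hypothetical offset paging)."""
    return items[offset:offset + limit]


ids = [1, 2, 3, 4, 5, 6, 7]  # stand-in for a collection sorted ascending
limit = 3

first_page_desc = page(sorted(ids, reverse=True), limit)
print(first_page_desc)                  # [7, 6, 5]
print(list(reversed(first_page_desc)))  # [5, 6, 7], the last `limit` items

# The last page of the ascending walk, by contrast, holds only the remainder.
last_page_asc = page(ids, limit, offset=(len(ids) - 1) // limit * limit)
print(last_page_asc)  # [7]
```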

Originally posted by @jampukka in #23 (comment)


Without explicit sorting, does pagination have much meaning? Pagination of an unsorted (or at least, not-explicitly-sorted) result seems like a useful way to break up a request that would otherwise be too large for client or server, but random page access to unsorted records doesn't seem like a use case for anything apart from achieving these small sequential payloads. A client cannot sort without access to the full collection.

Given that sorting is not mentioned in the Core (I think; I've only just read the specification), is there a contract that a particular page has coherent content with its linked adjacent rel pages? (For the moment assuming no race-condition changes to the underlying dataset — which itself makes pagination problematic due to the potential presence of duplicates, etc.) Hypothetically (ignoring a caching layer), if you could retrieve an entire collection in a single request, must the result be identically sorted each time?

Sorting features of a collection by time (if temporal) and then secondarily by primary key seems like a suitable implementation for some servers and datasets, but obviously primary keys are not necessarily in any kind of semantic and/or alpha-numeric order, so they should ideally function only as tie-breakers when sorting on some other property. Is time a default sort condition? And if so, what of features without temporal information: is it determined by their name? What of features with durations?
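A sketch of such a default order, assuming GeoJSON-like features with a hypothetical `datetime` property (ISO 8601 strings or `None`) and an `id` used only as a tie-breaker. Placing untimed features last is just one possible answer to the question above, and instants only are handled; it does not settle how features with durations should sort.

```python
from datetime import datetime

features = [
    {"id": "b", "properties": {"datetime": "2018-08-31T00:00:00"}},
    {"id": "a", "properties": {"datetime": "2018-08-31T00:00:00"}},
    {"id": "c", "properties": {"datetime": None}},
    {"id": "d", "properties": {"datetime": "2018-08-30T12:00:00"}},
]


def default_sort_key(feature):
    """Time first, primary key as tie-breaker; untimed features go last."""
    value = feature["properties"].get("datetime")
    has_time = value is not None
    when = datetime.fromisoformat(value) if has_time else datetime.min
    # False sorts before True, so `not has_time` pushes untimed features to the end.
    return (not has_time, when, feature["id"])


print([f["id"] for f in sorted(features, key=default_sort_key)])
# ['d', 'a', 'b', 'c']
```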

Being able to bypass a server's maximum limit #152 would be one workaround to the problem of pages omitting/duplicating features on different pages: a result is at least internally consistent at time of request. But it also has its own issues.

Another solution might be the ability to return a list of all pages upfront, thereby giving a client the opportunity to request pages in parallel rather than sequentially. This has advantages for the client beyond reducing the probability of data in a collection changing while paginating, including lower client-side latency and UI advantages when aiming to render a First Previous 2 3 Current 5 6 Next Last pagination UI.

Hypothetically, pages could include some information about their range (e.g. temporal extent if relevant), which would provide additional benefits to both human and computer interaction. For large collections this is still probably not feasible, since a server may not be able to efficiently compute where page boundaries lie in a sorted collection beyond previous/next relations. However, given that a collection's feature count is known at request time, as is the page limit, a simple list of all pages seems feasible, though it might fall into the realm of a client optimisation if pages can be reliably constructed using standard query parameters. (Which the existing spec says is not mandated.)
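A sketch of that client optimisation, assuming the server accepts hypothetical `limit` and `offset` query parameters (Core does not mandate a way to construct pages, so this only works where a server happens to support it) and that the total count is known up front. The base URL is a placeholder.

```python
from math import ceil
from urllib.parse import urlencode


def page_links(base_url, number_matched, limit):
    """Build one link per page from a known total and page size."""
    pages = ceil(number_matched / limit) if number_matched else 0
    return [
        f"{base_url}?{urlencode({'limit': limit, 'offset': i * limit})}"
        for i in range(pages)
    ]


for link in page_links("https://example.org/collections/demo/items", 25, 10):
    print(link)  # offsets 0, 10 and 20; all three could be fetched in parallel
```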

Pages themselves could even take on an explicit spatial property, perhaps if pages are organised (features are sorted) into some kind of tessellated grid, like DGGS—at this point perhaps a client would prefer vector tiles.

These are more rambling and obvious thoughts than coherent contributions. My point is really the same as @jampukka's:

  • Without explicit sorting, pagination is equivalent to chunking a large dataset.

To this I'd add:

  • In an unsorted collection, the client knows that it needs to request all pages before being able to perform many consequent tasks.
  • In a sorted collection, where the sort condition is known, a client may discover that it can stop making requests when its implicit search condition is met.
  • "Random" page access is useful with a sorted collection.
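The second point, stopping early on a sorted collection, can be sketched as follows. Pages are modelled as a lazy iterable of lists sorted ascending by the key of interest; `find_before` and the page data are hypothetical. Because later pages can only hold larger values, the client can stop at the first value past its cutoff without requesting the remaining pages.

```python
def find_before(pages, cutoff):
    """Collect values below `cutoff` from ascending pages, stopping early."""
    found, requested = [], 0
    for page in pages:  # each iteration stands for one `next` request
        requested += 1
        for value in page:
            if value >= cutoff:
                return found, requested  # every later page is larger still
            found.append(value)
    return found, requested


def fake_pages():
    """Stand-in for sequential page requests (hypothetical data)."""
    for chunk in ([1, 3, 5], [7, 9, 11], [13, 15, 17]):
        yield chunk


print(find_before(fake_pages(), 8))  # ([1, 3, 5, 7], 2): the third page is never fetched
```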

Originally posted by @alpha-beta-soup in #23 (comment)

@tomkralidis
Contributor Author

Note relevant work in STAC: radiantearth/stac-spec#513 which could be of use.

@cportele
Member

Sorting is currently specified by Records, see http://docs.opengeospatial.org/DRAFTS/20-004.html#clause-sorting.

This should eventually be moved to Common. Features can simply specify support for sorting by adding a requirements class that binds the sortby parameter to the Features resource.
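Whatever the final parameter syntax, an items endpoint bound to `sortby` ends up applying an ordered list of keys with per-key directions. This is a sketch assuming the value has already been parsed into hypothetical (property, ascending) pairs and that every feature carries each property. Python's sort is stable (including with `reverse=True`), so sorting by the least significant key first and the most significant key last gives the combined order.

```python
def sort_features(features, spec):
    """Order GeoJSON-like features by a list of (property, ascending) pairs."""
    result = list(features)
    # Least significant key first; each stable pass preserves the earlier order.
    for prop, ascending in reversed(spec):
        result.sort(key=lambda f: f["properties"][prop], reverse=not ascending)
    return result


features = [
    {"properties": {"title": "b", "country": "CA"}},
    {"properties": {"title": "a", "country": "CA"}},
    {"properties": {"title": "a", "country": "FI"}},
]

# Equivalent of sortby=title,country:D in the CSW-style syntax.
for f in sort_features(features, [("title", True), ("country", False)]):
    print(f["properties"]["title"], f["properties"]["country"])
# a FI
# a CA
# b CA
```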

@apfelnymous

Any news on this? Trying to sort data in the GML file doesn't seem appropriate.

@pvretano
Copy link
Contributor

@apfelnymous what specifically are you asking about?

If your question is about when sorting will be specified in the Features specifications then ...

Sorting is still on the roadmap to be moved from Records to Features (and eventually Common) but right now we are concentrating on finishing CQL2 and the other active Parts of the Features suite of specifications. For the time being, as @cportele mentioned above, if you have a sorting requirement then simply implement the sortBy query parameter at the /items endpoint as described in Records.

@apfelnymous
Copy link

apfelnymous commented Jul 19, 2023

@pvretano
A colleague asked me about sorting my features in my files to have them represented correctly in the collections view. As that approach seemed weird to me I was looking for a way to do that at service interface level.

@cportele cportele added Part 8: Sorting and removed Future work support in an additional part of OGC API Features labels Apr 22, 2024
Projects
None yet
Development

No branches or pull requests

9 participants