Slow requests when trying to access a large volume of data #8

Open
salvafern opened this issue Jun 15, 2020 · 1 comment

salvafern commented Jun 15, 2020

I am currently trying to use the eurobis R library to download all records of benthic animals from the North Sea. Before, I used the robis package to download all data from a bounding box. Now, I have defined the North Sea as the area and asked for all data with traits=benthos. I copied the URL from the interface and used that in the eurobis package.

What I notice is that the response is very slow. In one hour, I got approximately 80,000 records (of 1 million), so this may take a lot more time. Is that due to heavy traffic on the server, or does it have to do with the eurobis package?

This is a large amount of data, so it is reasonable that the download takes a long time. We should, however, check how to improve the way this package downloads data.


Edit: here is the request performed

library("eurobis")
quer_ben_ns<-"http://geo.vliz.be/geoserver/wfs/ows?service=WFS&version=1.1.0&request=GetFeature&typeName=Dataportal%3Aeurobis-obisenv&resultType=results&viewParams=where%3A%28%28up.geoobjectsids+%26%26+ARRAY%5B2350%5D%29%29+AND+measurement_type_group_ids+%26%26+ARRAY%5B3%5C%2C14%5C%2C1%5C%2C28%5C%2C27%5D+AND+aphiaid+IN+%28+SELECT+aphiaid+FROM+eurobis.taxa_attributes+WHERE+selectid+IN+%28%27Benthos%27%29%29%3Bcontext%3A0100&propertyName=datasetid%2Cdatecollected%2Cdecimallatitude%2Cdecimallongitude%2Ccoordinateuncertaintyinmeters%2Cscientificname%2Caphiaid%2Cscientificnameaccepted%2Cmodified%2Cinstitutioncode%2Ccollectioncode%2Cyearcollected%2Cstartyearcollected%2Cendyearcollected%2Cmonthcollected%2Cstartmonthcollected%2Cendmonthcollected%2Cdaycollected%2Cstartdaycollected%2Cenddaycollected%2Cseasoncollected%2Ctimeofday%2Cstarttimeofday%2Cendtimeofday%2Ctimezone%2Cwaterbody%2Ccountry%2Cstateprovince%2Ccounty%2Crecordnumber%2Cfieldnumber%2Cstartdecimallongitude%2Cenddecimallongitude%2Cstartdecimallatitude%2Cenddecimallatitude%2Cgeoreferenceprotocol%2Cminimumdepthinmeters%2Cmaximumdepthinmeters%2Coccurrenceid%2Cscientificnameauthorship%2Cscientificnameid%2Ctaxonrank%2Ckingdom%2Cphylum%2Cclass%2Corder%2Cfamily%2Cgenus%2Csubgenus%2Cspecificepithet%2Cinfraspecificepithet%2Caphiaidaccepted%2Coccurrenceremarks%2Cbasisofrecord%2Ctypestatus%2Ccatalognumber%2Creferences%2Crecordedby%2Cidentifiedby%2Cyearidentified%2Cmonthidentified%2Cdayidentified%2Cpreparations%2Csamplingeffort%2Csamplingprotocol%2Cqc%2Ceventid%2Cparameter%2Cparameter_value%2Cparameter_group_id%2Cparameter_measurementtypeid%2Cparameter_bodcterm%2Cparameter_bodcterm_definition%2Cparameter_standardunit%2Cparameter_standardunitid%2Cparameter_imisdasid%2Cparameter_ipturl%2Cparameter_original_measurement_type%2Cparameter_original_measurement_unit%2Cparameter_conversion_factor_to_standard_unit%2Cevent%2Cevent_type%2Cevent_type_id&outputFormat=csv"
tt<-getEurobisData(geourl = quer_ben_ns)

salvafern commented Mar 24, 2022

Potential solutions are (non-exclusive):

  1. Cache: Download the data to disk instead of reading it into memory, e.g. use httr::GET as in osmextract: https://github.com/ropensci/osmextract/blob/master/R/download.R#L134 or explore httr2::req_cache() in detail.
  2. Pagination: WFS supports pagination, so the data could be downloaded in batches and written to disk.
  3. Compressed data: We can look into all the WFS output formats for vector data to get the data as compressed as possible and then decompress it locally. There are GeoServer extensions that provide other formats, which might be more compact or better suited to requesting and aggregating large volumes of data: https://docs.geoserver.org/latest/en/user/extensions/index.html
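A rough sketch of how options 1 and 2 could be combined, not the package's implementation. It assumes the GeoServer instance honours the `maxFeatures` and `startIndex` paging parameters for WFS 1.1.0 requests, and that the total number of records is known beforehand (e.g. from the portal). `paged_urls()` and `download_pages()` are hypothetical helper names, and the page size is arbitrary.

```r
# Hypothetical helper: build one URL per page by appending paging parameters.
# Assumption: the server accepts `maxFeatures` and `startIndex` on this request.
paged_urls <- function(base_url, total, page_size = 10000) {
  starts <- seq(0, total - 1, by = page_size)
  # format() avoids scientific notation such as "1e+05" in the query string
  paste0(
    base_url,
    "&maxFeatures=", format(page_size, scientific = FALSE, trim = TRUE),
    "&startIndex=", format(starts, scientific = FALSE, trim = TRUE)
  )
}

# Hypothetical helper: stream each page straight to a CSV file on disk,
# skipping pages that are already present so an interrupted run can resume.
download_pages <- function(base_url, total, dir = tempdir(), page_size = 10000) {
  urls <- paged_urls(base_url, total, page_size)
  files <- file.path(dir, sprintf("page_%04d.csv", seq_along(urls)))
  for (i in seq_along(urls)) {
    if (file.exists(files[i])) next
    resp <- httr::GET(urls[i], httr::write_disk(files[i], overwrite = TRUE))
    if (httr::http_error(resp)) {
      unlink(files[i])  # do not keep an error body as if it were a data page
      stop("Request failed for page ", i, " (HTTP ", httr::status_code(resp), ")")
    }
  }
  files
}
```

With the request above this might be called as `files <- download_pages(quer_ben_ns, total = 1000000)`, and the resulting files read back one by one. Paging is only reliable if the server returns features in a stable order, so a `sortBy` parameter may also be needed; that has not been checked against this server.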
