

cloud-based version of MERRA data #79

Open
cole-brokamp opened this issue Oct 16, 2024 · 2 comments
Comments

@cole-brokamp (Member)

A cloud-based version of the MERRA data would eliminate the need to host processed data on GitHub for any of the geomarker assessment functions.

OPeNDAP? COG on S3?

@cole-brokamp (Member Author)
COG proof of concept

I put a previously-created cloud-optimized GeoTIFF (COG) in a Harvard Dataverse repository.

The following code reads data from this COG without authentication:

library(httr2)

server_url <- "https://dataverse.harvard.edu"
persistent_id <- "doi:10.7910/DVN/EI64YD"
version <- "1.0"
filename <- "nlcd_imperviousdesc_2019.tif"

# list the files in this version of the Dataverse dataset
req <-
  httr2::request(server_url) |>
  httr2::req_user_agent("geomarker-io (https://github.com/geomarker-io)") |>
  httr2::req_url_path_append("api", "datasets", ":persistentId", "versions", version) |>
  httr2::req_url_query("persistentId" = persistent_id)

resp <- httr2::req_perform(req)
files <- httr2::resp_body_json(resp)$data$files

file_names <- lapply(files, \(.) .$dataFile[["filename"]])
file_ids <- lapply(files, \(.) .$dataFile[["id"]])

# look up the Dataverse file id for the COG and build a GDAL /vsicurl URI
file_id <- file_ids[[which(file_names == filename)[1]]]

file_uri <- glue::glue("/vsicurl/https://dataverse.harvard.edu/api/access/datafile/{file_id}")

# open the remote COG; reads use HTTP range requests rather than a full download
r <- terra::rast(file_uri)

# extract raster values at example query coordinates
query_coords <-
  data.frame(lon = c(-84.5175819, -84.52, -100.0131), lat = c(39.1408017, 39.15, 41.3981)) |>
  sf::st_as_sf(coords = c("lon", "lat"), crs = 4326) |>
  sf::st_transform(terra::crs(r)) |>
  terra::vect()

terra::extract(r, query_coords, ID = FALSE, layer = "Layer_1")

Instead of only offering data from "source", provide an option to download the entire file locally rather than relying on HTTP range requests.
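A minimal sketch of that local-download fallback, reusing the Dataverse file-access URL from the proof of concept above; the `read_cog()` function, its `use_local` flag, and the cache location are all hypothetical names for illustration, not an existing API:

```r
# hypothetical sketch: optionally download the whole file instead of
# streaming it over /vsicurl range requests
read_cog <- function(file_id, use_local = FALSE) {
  remote_url <- glue::glue("https://dataverse.harvard.edu/api/access/datafile/{file_id}")
  if (use_local) {
    # cache the full GeoTIFF on disk and read from there
    local_path <- file.path(tools::R_user_dir("geomarker", "cache"), paste0(file_id, ".tif"))
    dir.create(dirname(local_path), recursive = TRUE, showWarnings = FALSE)
    if (!file.exists(local_path)) {
      download.file(remote_url, local_path, mode = "wb")
    }
    terra::rast(local_path)
  } else {
    # stream only the requested blocks from the remote COG
    terra::rast(paste0("/vsicurl/", remote_url))
  }
}
```

The local option trades a one-time full download for faster repeated extractions; range requests stay cheaper for a handful of point queries.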

Write functions to create COGs for different vintages of datasets and then organize them into different Dataverse groups / repositories / versions?

@cole-brokamp (Member Author) commented Oct 22, 2024

Create a script to transform raster data into COGs and release them to Dataverse:

  • NLCD
  • gridMET
  • NARR
  • MERRA subset rasters created from nc4 files
  • elevation
e.g., gdal_translate in.tif out.tif -co TILED=YES -co COPY_SRC_OVERVIEWS=YES -co COMPRESS=LZW
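The same conversion could also be scripted from R via sf's GDAL bindings; a sketch assuming GDAL >= 3.1, whose dedicated COG driver handles tiling and overview generation itself (the in.tif / out.tif paths are placeholders):

```r
library(sf)

# convert a plain GeoTIFF into a cloud-optimized GeoTIFF using the COG
# driver (GDAL >= 3.1); roughly equivalent to the gdal_translate call
# above, but the COG driver also builds internal overviews automatically
sf::gdal_utils(
  util = "translate",
  source = "in.tif",
  destination = "out.tif",
  options = c("-of", "COG", "-co", "COMPRESS=LZW")
)
```

With the older `TILED=YES` / `COPY_SRC_OVERVIEWS=YES` flags, note that overviews must already exist (e.g., via gdaladdo) before translation, or the output will not be a valid COG.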
