chore: Release 0.4.0 (#34)
* feat(algorithm_inputs): Added 'doi' to algorithm inputs which users can specify

* chore(algorithm_inputs): Remove defaults, simplify process

* test: remove outdated doi conftest

* docs: Updated for DOI algorithm input

* feat: DOI algorithm input

* chore: Pull Request changes

* chore: minor PR changes

* docs: Updated for optional query input parameter

* test: Updated for optional query input parameter

* feat: Query input parameter now optional

* docs: Clarifying inputs

* feat: select columns from 2D datasets (#16)

Fixes: #5, #7

* fix: skip granule files that cannot be opened (#18)

Granule files that cannot be successfully read are skipped, rather than
causing job failure.  Offending files are retained to facilitate
analysis.

Fixes #17

* feat: lat lon algorithm inputs (#20)

* docs: lat/lon algorithm inputs additions

* test: lat/lon algorithm inputs additions

* feat: lat/lon algorithm inputs additions

* feat: support L1B and L2B collections (#21)

Fixes #19

* Prepare for 0.3.0 release

* Prepare for further development

* feat: user input to filter beams (#26)

* docs: User-supplied beams specification

* test: Testing various beams input

* feat: User-specified beams

* docs: updated beams documentation

* test: simple beams fail test

* test: check_beams_option

* docs: additional docstring changes

* test: additional tests ...

* fix: n_expected algorithm inputs (#27)

* fix: n_expected algorithm inputs (#33)

* chore: Release 0.4.0 (#28)

* chore: Release 0.4.0

* chore: minor changes for next release

* docs: changes for 0.4.0 release

* chore: Release 0.4.0

Co-authored-by: Chuck Daniels <[email protected]>
jjfrench and chuckwondo authored Nov 14, 2022
1 parent 7ba904e commit 33138d6
Showing 10 changed files with 277 additions and 59 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog], and this project adheres to
 [Semantic Versioning].

+## [0.4.0] - 2022-11-14
+
+### Added
+- [#6](https://github.com/MAAP-Project/gedi-subsetter/issues/6): Allow user to
+specify which BEAMs to subset
+
 ## [0.3.0] - 2022-10-31

 ### Fixed
7 changes: 6 additions & 1 deletion README.md
@@ -49,6 +49,10 @@ must be supplied for every input):

 - `lon`: Name of the dataset used for longitude.

+- `beams`: Which beams to include in the subset. Must be `all`, `coverage`,
+`power`, _OR_ a comma-separated list of beam names, with or without the `BEAM`
+prefix (e.g., `BEAM0000,BEAM0001` or `0000,0001`)
+
 - `columns`: Comma-separated list of column names to include in the output file.
 These names correspond to the variables (layers) within the data files, and
 vary from collection to collection. Consult the documentation for a list of
@@ -200,7 +204,7 @@ Here are some sample input values per DOI:
 - **doi:** `L4A`, `l4a`, or a specific DOI name
 - **lat**: `lat_lowestmode`
 - **lon**: `lon_lowestmode`
-- **columns:** `agbd, agbd_se, sensitivity, sensitivity_a2`
+- **columns:** `agbd, agbd_se, sensitivity, geolocation/sensitivity_a2`
 - **query:** ``l2_quality_flag == 1 & l4_quality_flag == 1 & sensitivity > 0.95 & `geolocation/sensitivity_a2` > 0.95``

 ## Running a GEDI Subsetting DPS Job
@@ -227,6 +231,7 @@ inputs = dict(
     doi="<DOI>",
     lat="<LATITUDE>",
     lon="<LONGITUDE>",
+    beams="<BEAMS>",
     columns="<COLUMNS>",
     query="<QUERY>",
     limit = 10_000
Expand Down
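The README hunk above says the `beams` input must be `all`, `coverage`, `power`, or a comma-separated list of beam names with or without the `BEAM` prefix. A minimal sketch of how such a value could be normalized is shown below. `parse_beams` is a hypothetical helper written for illustration, not the function added by this commit, and the case-insensitive handling and the `ValueError` on empty names are assumptions.

```python
from typing import List, Union

# Keywords the README lists as valid non-list values for `beams`.
KEYWORDS = {"all", "coverage", "power"}


def parse_beams(value: str) -> Union[str, List[str]]:
    """Return a keyword, or a list of beam names that all carry the BEAM prefix.

    Hypothetical helper: the real input handling in gedi-subsetter may differ.
    """
    text = value.strip()
    if text.lower() in KEYWORDS:
        return text.lower()

    names = []
    for item in text.split(","):
        item = item.strip().upper()
        if not item:
            raise ValueError(f"empty beam name in {value!r}")
        # Accept both "0000" and "BEAM0000" by adding the prefix when missing.
        names.append(item if item.startswith("BEAM") else f"BEAM{item}")
    return names


print(parse_beams("coverage"))
print(parse_beams("BEAM0000, 0001"))
```

Normalizing every name to the prefixed form means later code only has to compare against one spelling of each beam name.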
4 changes: 3 additions & 1 deletion algorithm_config.yaml
@@ -1,6 +1,6 @@
 description: Subset GEDI L1B, L2A, L2B, or L4A granules within an area of interest (AOI)
 algo_name: gedi-subset
-version: 0.3.0
+version: 0.4.0
 environment: ubuntu
 repository_url: https://repo.ops.maap-project.org/data-team/gedi-subsetter.git
 docker_url: mas.dit.maap-project.org/root/maap-workspaces/base_images/r:dit
@@ -17,6 +17,8 @@ inputs:
     download: False
   - name: lon
     download: False
+  - name: beams
+    download: False
   - name: columns
     download: False
   - name: query
37 changes: 12 additions & 25 deletions notebooks/GEDI_L4A_Subsetting.ipynb
@@ -161,25 +161,7 @@
 "source": [
 "## Submit a Job\n",
 "\n",
-"When supplying input values for a GEDI subsetting job, to use the default value\n",
-"for a field (where indicated), use a dash (`\"-\"`) as the input value.\n",
-"\n",
-"- `aoi` (required): URL to a GeoJSON file representing your area of interest,\n",
-" as explained above.\n",
-"\n",
-"- `columns`: Comma-separated list of column names to include in the output file.\n",
-" (Default: `\"agbd, agbd_se, l2_quality_flag, l4_quality_flag, sensitivity, sensitivity_a2\"`)\n",
-"\n",
-"- `query`: Query expression for subsetting the rows in the output file.\n",
-" (Default: `\"l2_quality_flag == 1 and l4_quality_flag == 1 and sensitivity > 0.95 and sensitivity_a2 > 0.95\"`)\n",
-"\n",
-" **IMPORTANT**: The `columns` input must contain at least all of the columns\n",
-" that appear in this `query` expression, otherwise an error will occur.\n",
-"\n",
-"- `limit`: Maximum number of GEDI granule data files to download (among those\n",
-" that intersect the specified AOI). (Default: 10000)\n",
-"\n",
-"It is recommended to use `maap-dps-worker-16gb` or `maap-dps-worker-32gb` queues when submitting a job with a large aoi."
+"See README.md for documentation regarding the inputs"
 ]
 },
 {
@@ -191,22 +173,26 @@
 "source": [
 "inputs = dict(\n",
 " aoi=aoi,\n",
-" columns=\"-\",\n",
-" query=\"-\",\n",
-" limit=\"-\",\n",
+" doi=\"L4A\",\n",
+" lat=\"lat_lowestmode\",\n",
+" lon=\"lon_lowestmode\",\n",
+" beams=\"coverage\",\n",
+" columns=\"agbd, agbd_se, sensitivity, geolocation/sensitivity_a2\",\n",
+" query=\"l2_quality_flag == 1 and l4_quality_flag == 1 and sensitivity > 0.95 and `geolocation/sensitivity_a2` > 0.95\",\n",
+" limit=10_000,\n",
 ")\n",
 "\n",
 "result = maap.submitJob(\n",
 " identifier=\"gedi-subset\",\n",
 " algo_id=\"gedi-subset_ubuntu\",\n",
-" version=\"gedi-subset-0.2.7\",\n",
-" queue=\"maap-dps-worker-8gb\",\n",
+" version=\"0.4.0\",\n",
+" queue=\"maap-dps-worker-32gb\",\n",
 " username=username,\n",
 " **inputs,\n",
 ")\n",
 "\n",
 "job_id = result[\"job_id\"]\n",
-"job_id"
+"job_id or result"
 ]
 },
 {
@@ -342,6 +328,7 @@
 " )\n",
 "else:\n",
 " gedi_gdf = gpd.read_file(output_file)\n",
+" print(gedi_gdf.head())\n",
 " agbd_colors = plt.cm.get_cmap(\"viridis_r\")\n",
 " gedi_gdf.plot(column=\"agbd\", cmap=agbd_colors)"
 ]
84 changes: 66 additions & 18 deletions src/gedi_subset/gedi_utils.py
@@ -3,7 +3,7 @@
 import os
 import os.path
 import warnings
-from typing import Any, Mapping, Optional, Sequence, Union
+from typing import Any, Callable, Mapping, Optional, Sequence, Union

 import h5py
 import numpy as np
@@ -15,6 +15,7 @@
 from shapely.geometry import Polygon
 from shapely.geometry.base import BaseGeometry

+import gedi_subset.fp as fp
 from gedi_subset.h5frame import H5DataFrame

 # Suppress UserWarning: The Shapely GEOS version (3.10.2-CAPI-1.16.0) is incompatible
@@ -124,11 +125,28 @@ def spatial_filter(beam, aoi):
     return indices


+def is_coverage_beam(beam: h5py.Group) -> bool:
+    return "COVERAGE" in beam.attrs.get("description", "").upper()
+
+
+def is_power_beam(beam: h5py.Group) -> bool:
+    return "POWER" in beam.attrs.get("description", "").upper()
+
+
+def beam_filter_from_names(names: Sequence[str]):
+    def is_named_beam(beam: h5py.Group) -> bool:
+        return any(name.upper() in beam.name.upper() for name in names)
+
+    return is_named_beam
+
+
 def subset_hdf5(
     hdf5: h5py.Group,
+    *,
     aoi: gpd.GeoDataFrame,
     lat: str,
     lon: str,
+    beam_filter: Callable[[h5py.Group], bool] = fp.always(True),
     columns: Sequence[str],
     query: Optional[str],
 ) -> gpd.GeoDataFrame:
@@ -139,7 +157,8 @@ def subset_hdf5(
 that fall within the specified area of interest (AOI) and also satisfy the specified
 query criteria. The resulting ``geopandas.GeoDataFrame`` is further reduced to
 include only the specified columns, which must be names of datasets within the
-HDF5 group (specifically, datasets within subgroups named with the prefix `"BEAM"`).
+HDF5 group (specifically, datasets within subgroups named with the prefix `"BEAM"`
+for which invocation of the specified ``beam_filter`` callable returns ``True``).
 To illustrate, assume an HDF5 file (`hdf5`) structured like so (values are for
 illustration purposes only):
@@ -181,9 +200,10 @@ def subset_hdf5(
 Assumptions:
 - The HDF5 group/file contains subgroups that are named with the prefix `"BEAM"`.
-- Every `"BEAM*"` subgroup contains datasets named `lat_lowestmode` and
-`lon_lowestmode`, representing the latitude and longitude, respectively, which are
-used for the geometry of the resulting ``GeoDataFrame``.
+- Every `"BEAM*"` subgroup contains degree unit datasets with names given by the
+specified ``lat`` and ``lon`` parameters, representing the latitude and longitude,
+respectively, used to create the ``geometry`` column of the resulting
+``GeoDataFrame``.
 - For every column name in `columns` and every column name appearing in the `query`
 expression, every `"BEAM*"` subgroup contains a dataset of the same name.
@@ -198,8 +218,21 @@ def subset_hdf5(
 HDF5 group to subset (typically an ``h5py.File`` instance).
 aoi : gpd.GeoDataFrame
 Area of Interest. The subset is limited to data points that fall within this
-area of interest, as determined by the `lat_lowestmode` and `lon_lowestmode`
-datasets of each `"BEAM*"` group within the HDF5 file.
+area of interest, as determined by the latitude and longitude datasets of each
+`"BEAM*"` group within the HDF5 file.
+lat: str
+Name of the latitude dataset used for the resulting ``GeoDataFrame`` geometry.
+lon: str
+Name of the longitude dataset used for the resulting ``GeoDataFrame`` geometry.
+beam_filter: Callable[[h5py.Group], bool] = fp.always(True)
+Callable used to determine whether or not a top-level BEAM subgroup within the
+specified ``hdf5`` group should be included in the subset. This callable is
+called once for each subgroup that has a name prefixed with `"BEAM"`. If not
+supplied, the default callable always returns ``True``, such that every
+``"BEAM*"`` subgroup is included. For convenience, the predicate functions
+py:`is_coverage_beam` and py:`is_power_beam` may be used. Further, the function
+returned by calling py:`beam_filter_from_names` with a specific list of BEAM
+names may be used.
 columns : Sequence[str]
 Column names to be included in the subset. The specified column names must
 match dataset names within the `"BEAM*"` groups of the HDF5 file. Although the
@@ -219,7 +252,10 @@ def subset_hdf5(
 GeoDataFrame containing the subset of the data from the HDF5 group/file that
 fall within the specified area of interest and satisfy the specified query.
 Columns are limited to the specified sequence of column names, along with
-`filename` (str) and `BEAM` (str) columns.
+`filename` (str) and `BEAM` (str) columns. Further, the query is applied to, and
+the columns are selected from, only the top-level subgroups that have names
+prefixed with ``"BEAM"`` and for which the ``beam_filter`` function returns
+``True``.
 Examples
 --------
@@ -236,12 +272,14 @@ def subset_hdf5(
 ...     group.create_dataset("lat_lowestmode", data=[-1.82556, -9.82514, -1.82471])
 ...     group.create_dataset("lon_lowestmode", data=[12.06648, 12.06678, 12.06707])
 ...     group.create_dataset("sensitivity", data=[0.9, 0.97, 0.99])
-...     group = hdf5.create_group("BEAM0001")
+...     group.attrs.create("description", "Coverage beam")
+...     group = hdf5.create_group("BEAM1011")
 ...     group.create_dataset("agbd", data=[1.1715966, 1.630395, 3.5265787])
 ...     group.create_dataset("l2_quality_flag", data=[0, 1, 1], dtype="i1")
 ...     group.create_dataset("lat_lowestmode", data=[-1.82557, -9.82515, -1.82472])
 ...     group.create_dataset("lon_lowestmode", data=[12.06649, 12.06679, 12.06708])
 ...     group.create_dataset("sensitivity", data=[0.93, 0.96, 0.98])
+...     group.attrs.create("description", "Full power beam")
 <HDF5 dataset "agbd": ...>
 <HDF5 dataset "l2_quality_flag": ...>
 <HDF5 dataset "lat_lowestmode": ...>
@@ -292,26 +330,32 @@ def subset_hdf5(
 ... }}])
 We can now subset the data in the HDF5 file to points that fall within the AOI,
-selecting only the desired columns (i.e., named datasets within the HDF5 file), and
-selecting only the rows that satisfy the specified query:
+selecting only the desired columns (i.e., named datasets within the HDF5 file),
+selecting only the coverage beams, and selecting only the rows that satisfy
+the specified query:
 >>> with h5py.File(bio) as hdf5:
 ...     gdf = subset_hdf5(
-...         hdf5, aoi, ["agbd", "sensitivity"],
-...         "l2_quality_flag == 1 and sensitivity > 0.95"
+...         hdf5,
+...         aoi=aoi,
+...         lat="lat_lowestmode",
+...         lon="lon_lowestmode",
+...         beam_filter=is_coverage_beam,
+...         columns=["agbd", "sensitivity"],
+...         query="l2_quality_flag == 1 and sensitivity > 0.95"
 ...     )
 ...     # Since the source of our HDF5 file is an ``io.BytesIO``, we'll drop the
 ...     # `filename` column (which refers to the memory location of the
 ...     # ``io.BytesIO``, not a filename).
 ...     gdf.drop(columns=["filename"])
 BEAM agbd sensitivity geometry
 0 0000 1.116093 0.99 POINT (12.06707 -1.82471)
-1 0001 3.526579 0.98 POINT (12.06708 -1.82472)
 Note that the resulting ``geopandas.GeoDataFrame`` contains only the specified
-columns (`agbd` and `sensitivity`), and only the rows (only 1 from each "beam" in
-this example) that have a geometry that falls within the AOI and also satisfy the
-query (i.e., `l2_quality_flag == 1` and `sensitivity > 0.95`).
+coverage `BEAM`s, specified columns (`agbd` and `sensitivity`), and only the
+rows (only 1 from each "beam" in this example) that have a geometry that falls
+within the AOI and also satisfy the query
+(i.e., `l2_quality_flag == 1` and `sensitivity > 0.95`).
 Note also that although the `l2_quality_flag` was specified in the query, it does
 not appear in the result because it was not specified in the sequence of column
@@ -336,7 +380,11 @@ def subset_beam(beam: h5py.Group) -> gpd.GeoDataFrame:
         # Clip subset to the area of interest
         return gpd.clip(gdf, aoi.set_crs(epsg=4326))

-    beams = (group for name, group in hdf5.items() if name.startswith("BEAM"))
+    beams = (
+        group
+        for name, group in hdf5.items()
+        if name.startswith("BEAM") and beam_filter(group)
+    )
     beams_gdf = pd.concat(map(subset_beam, beams), ignore_index=True, copy=False)
     beams_gdf.insert(0, "filename", os.path.basename(hdf5.file.filename))

Expand Down
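The beam selection that this diff adds to `subset_hdf5` can be tried without an HDF5 file. The sketch below keeps the logic of the three predicates from the diff, but runs them against `FakeGroup`, a hypothetical stand-in that mimics only the `name` and `attrs` attributes of an `h5py.Group`. `always` is an assumed equivalent of `fp.always`, whose implementation does not appear in this commit view, and `select_beams` is an illustrative helper that returns group names rather than building a `GeoDataFrame`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class FakeGroup:
    """Hypothetical stand-in exposing only what the predicates read."""

    name: str
    attrs: Dict[str, str] = field(default_factory=dict)


BeamFilter = Callable[[FakeGroup], bool]


def always(value: bool) -> BeamFilter:
    # Assumed behavior of fp.always: ignore the argument, return a constant.
    return lambda _beam: value


def is_coverage_beam(beam: FakeGroup) -> bool:
    return "COVERAGE" in beam.attrs.get("description", "").upper()


def is_power_beam(beam: FakeGroup) -> bool:
    return "POWER" in beam.attrs.get("description", "").upper()


def beam_filter_from_names(names: Sequence[str]) -> BeamFilter:
    def is_named_beam(beam: FakeGroup) -> bool:
        return any(name.upper() in beam.name.upper() for name in names)

    return is_named_beam


def select_beams(
    groups: Dict[str, FakeGroup], beam_filter: BeamFilter = always(True)
) -> List[str]:
    # Same condition as the generator in subset_hdf5: BEAM prefix AND filter.
    return [
        name
        for name, group in groups.items()
        if name.startswith("BEAM") and beam_filter(group)
    ]


groups = {
    "BEAM0000": FakeGroup("/BEAM0000", {"description": "Coverage beam"}),
    "BEAM1011": FakeGroup("/BEAM1011", {"description": "Full power beam"}),
    "METADATA": FakeGroup("/METADATA"),
}

print(select_beams(groups))                                   # both BEAM groups
print(select_beams(groups, is_coverage_beam))                 # ['BEAM0000']
print(select_beams(groups, beam_filter_from_names(["1011"])))  # ['BEAM1011']
```

Because `is_named_beam` uses substring matching, a partial name such as `"00"` selects every beam whose name contains it, so passing full beam names is the safer choice.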