feat: alpha support for major-tom-core
kai-tub committed Aug 30, 2024
1 parent 2ffc2b5 commit 1d36352
Showing 38 changed files with 447 additions and 36 deletions.
126 changes: 119 additions & 7 deletions README.md
@@ -42,13 +42,14 @@ The docker image can be used to run it on other operating systems:
### Supported Remote Sensing Datasets

Currently, `rico-hdl` supports:
- [BigEarthNet-S1 v2.0][ben]
- [BigEarthNet-S2 v2.0][ben]
- [BigEarthNet-MM v2.0][ben]
- [HySpecNet-11k][hyspecnet]
- [UC Merced Land Use][ucmerced]
- [EuroSAT][euro]
- [SSL4EO-S12][ssl4eo-s12]
- [BigEarthNet-S1 v2.0](#bigearthnet-example)
- [BigEarthNet-S2 v2.0](#bigearthnet-example)
- [BigEarthNet-MM v2.0](#bigearthnet-example)
- [HySpecNet-11k](#hyspecnet-11k-example)
- [UC Merced Land Use](#uc-merced-land-use-example)
- [EuroSAT](#eurosat-example)
- [SSL4EO-S12](#ssl4eo-s12-example)
- [Major-Tom-Core](#major-tom-core-example)

Additional datasets will be added in the near future.

@@ -683,6 +684,116 @@ rgb_tensor = np.stack([safetensor_dict[b] for b in rgb_bands])
assert rgb_tensor.shape == (3, 264, 264)
```

## [Major-TOM-Core][major-tom] Example

First, [download the rico-hdl](#Download) binary and install
the Python [lmdb][pyl] and [safetensors][pys] packages.
Then, to convert the Sentinel-1 and Sentinel-2 patches from the [Major-TOM-Core][major-tom]
dataset into the optimized format, call the application with:

```bash
rico-hdl major-tom-core --s1-dir <S1_ROOT_DIR> --s2-dir <S2_ROOT_DIR> --target-dir encoded-major-tom
```

In Major-TOM-Core, each band is stored as a separate file named after the band (`B01.tif`, `B12.tif`, `vv.tif`, ...).
The directory that contains the band files is named after the product ID of the patch. A patch is
uniquely identified by combining this product ID with the grid cell ID (the name of the parent directory).
The encoder groups all unique patches (`<grid_cell>_<product_id>`) and stores the data as a [safetensors][s] dictionary,
where the dictionary's key is the band name (`B01`, `B12`, `vv`, ...).
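
As a minimal sketch of this naming scheme (not the encoder's actual implementation), the key of a patch can be derived from its directory path. The helper name `lmdb_key_for_patch` and the `S1_ROOT_DIR` placeholder are hypothetical and only serve as an illustration:

```python
from pathlib import Path


def lmdb_key_for_patch(patch_dir: Path) -> str:
    """Build the `<grid_cell>_<product_id>` key for a patch directory.

    Assumes the layout `<root>/<row>/<grid_cell>/<product_id>/`, where the
    directory name is the product id and its parent is the grid cell id.
    """
    return f"{patch_dir.parent.name}_{patch_dir.name}"


patch_dir = Path(
    "S1_ROOT_DIR/897U/897U_171R/"
    "S1B_IW_GRDH_1SDV_20210827T012624_20210827T012653_028425_036437_rtc"
)
key = lmdb_key_for_patch(patch_dir)
print(key)
```

The resulting string is the key under which the patch's safetensors dictionary would be looked up in the LMDB database.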

> [!NOTE]
> The encoder does _not_ encode `thumbnail.png` or the `cloud_mask.tif` band!

<details>
<summary>Example Input</summary>

```
├── <S1_ROOT_DIR>
│   └── 897U
│       └── 897U_171R
│           └── S1B_IW_GRDH_1SDV_20210827T012624_20210827T012653_028425_036437_rtc
│               ├── thumbnail.png
│               ├── vh.tif
│               └── vv.tif
└── <S2_ROOT_DIR>
    └── 199U
        └── 199U_1099R
            └── S2B_MSIL2A_20200223T032739_N9999_R018_T48QUE_20230924T183543
                ├── B01.tif
                ├── B02.tif
                ├── B03.tif
                ├── B04.tif
                ├── B05.tif
                ├── B06.tif
                ├── B07.tif
                ├── B08.tif
                ├── B09.tif
                ├── B8A.tif
                ├── B11.tif
                ├── B12.tif
                ├── cloud_mask.tif
                └── thumbnail.png
```

</details>

<details>
<summary>LMDB Result</summary>

```
'897U_171R_S1B_IW_GRDH_1SDV_20210827T012624_20210827T012653_028425_036437_rtc':
  {
    'vh': <1068x1068 float32 safetensors image data>,
    'vv': <1068x1068 float32 safetensors image data>
  },
'199U_1099R_S2B_MSIL2A_20200223T032739_N9999_R018_T48QUE_20230924T183543':
  {
    'B01': <178x178 uint16 safetensors image data>,
    'B02': <1068x1068 uint16 safetensors image data>,
    'B03': <1068x1068 uint16 safetensors image data>,
    'B04': <1068x1068 uint16 safetensors image data>,
    'B05': <534x534 uint16 safetensors image data>,
    'B06': <534x534 uint16 safetensors image data>,
    'B07': <534x534 uint16 safetensors image data>,
    'B08': <1068x1068 uint16 safetensors image data>,
    'B8A': <534x534 uint16 safetensors image data>,
    'B09': <178x178 uint16 safetensors image data>,
    'B11': <534x534 uint16 safetensors image data>,
    'B12': <534x534 uint16 safetensors image data>,
  }
```

</details>
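
Because each key combines the grid cell ID and the product ID, it can be split back into its parts, for example to separate Sentinel-1 from Sentinel-2 patches. The following is a rough sketch; the helper names `split_key` and `is_sentinel1` are hypothetical, and it assumes that every grid cell ID has the form `<row>_<col>` (such as `897U_171R`) and that Sentinel-1 product IDs start with `S1`:

```python
def split_key(key: str) -> tuple[str, str]:
    """Split a `<grid_cell>_<product_id>` key into (grid_cell, product_id).

    Assumption: the grid cell id consists of exactly two
    underscore-separated parts, e.g. `897U_171R`.
    """
    row, col, product_id = key.split("_", 2)
    return f"{row}_{col}", product_id


def is_sentinel1(key: str) -> bool:
    """Return True if the product id of the key starts with `S1`."""
    return split_key(key)[1].startswith("S1")


s1_key = "897U_171R_S1B_IW_GRDH_1SDV_20210827T012624_20210827T012653_028425_036437_rtc"
grid_cell, product_id = split_key(s1_key)
print(grid_cell)
print(product_id)
print(is_sentinel1(s1_key))
```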

The following code shows how to access the converted database:

```python
import lmdb
import numpy as np

# import `load` from the desired deep-learning library:
# numpy, torch, tensorflow, paddle, flax, mlx
from safetensors.numpy import load
from pathlib import Path

# path to the encoded dataset/output of rico-hdl
encoded_path = Path("./encoded-major-tom")

# Make sure to only open the environment once
# and not every time an item is accessed.
env = lmdb.open(str(encoded_path), readonly=True)

with env.begin() as txn:
    # string encoding is required to map the string to an LMDB key
    safetensor_dict = load(txn.get("199U_1099R_S2B_MSIL2A_20200223T032739_N9999_R018_T48QUE_20230924T183543".encode()))

rgb_bands = ["B04", "B03", "B02"]
rgb_tensor = np.stack([safetensor_dict[b] for b in rgb_bands])
assert rgb_tensor.shape == (3, 1068, 1068)
```
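
The RGB bands can be stacked directly because they share the same shape. The other Sentinel-2 bands are stored at lower resolutions (534x534 and 178x178), so they have to be resampled before they can be stacked with the 1068x1068 bands. The sketch below uses nearest-neighbour upsampling with `np.repeat` on synthetic arrays; the helper `upsample_nearest` is hypothetical and not part of `rico-hdl`, and another interpolation method may be more appropriate for a given use case:

```python
import numpy as np


def upsample_nearest(band: np.ndarray, target_size: int) -> np.ndarray:
    """Upsample a square 2D band to (target_size, target_size).

    Uses nearest-neighbour repetition, so target_size must be an
    integer multiple of the band's side length.
    """
    height, width = band.shape
    factor, remainder = divmod(target_size, height)
    if height != width or remainder != 0:
        raise ValueError("band must be square and divide target_size evenly")
    return band.repeat(factor, axis=0).repeat(factor, axis=1)


# Synthetic stand-ins with the shapes and dtype of the encoded bands
patch = {
    "B02": np.zeros((1068, 1068), dtype=np.uint16),
    "B05": np.ones((534, 534), dtype=np.uint16),
    "B01": np.full((178, 178), 2, dtype=np.uint16),
}

stacked = np.stack([upsample_nearest(patch[b], 1068) for b in ("B02", "B05", "B01")])
assert stacked.shape == (3, 1068, 1068)
```

With real data, `patch` would be the dictionary returned by `load`.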


> [!TIP]
> Remember to use the appropriate `load` function for a given deep-learning library.

## Design

@@ -747,3 +858,4 @@ If you use this work, please cite:
[ucmerced]: http://weegee.vision.ucmerced.edu/datasets/landuse.html
[euro]: https://zenodo.org/records/7711810
[ssl4eo-s12]: https://github.com/zhu-xlab/SSL4EO-S12
[major-tom]: https://github.com/ESA-PhiLab/Major-TOM
4 changes: 4 additions & 0 deletions flake.nix
@@ -67,6 +67,8 @@
export PATH="$out/bin:$PATH"
export RICO_HDL_S1_PATH=${./integration_tests/tiffs/BigEarthNet/BigEarthNet-S1}
export RICO_HDL_S2_PATH=${./integration_tests/tiffs/BigEarthNet/BigEarthNet-S2}
export RICO_HDL_MAJOR_TOM_CORE_S1_PATH=${./integration_tests/tiffs/Major-TOM-Core/S1RTC}
export RICO_HDL_MAJOR_TOM_CORE_S2_PATH=${./integration_tests/tiffs/Major-TOM-Core/S2L2A}
export RICO_HDL_SSL4EO_S12_S1_PATH=${./integration_tests/tiffs/SSL4EO-S12}/s1
export RICO_HDL_SSL4EO_S12_S2_L1C_PATH=${./integration_tests/tiffs/SSL4EO-S12}/s2c
export RICO_HDL_SSL4EO_S12_S2_L2A_PATH=${./integration_tests/tiffs/SSL4EO-S12}/s2a
@@ -142,6 +144,8 @@
env.RICO_HDL_SSL4EO_S12_S1_PATH = "${config.env.DEVENV_ROOT}/integration_tests/tiffs/SSL4EO-S12/s1";
env.RICO_HDL_SSL4EO_S12_S2_L1C_PATH = "${config.env.DEVENV_ROOT}/integration_tests/tiffs/SSL4EO-S12/s2c";
env.RICO_HDL_SSL4EO_S12_S2_L2A_PATH = "${config.env.DEVENV_ROOT}/integration_tests/tiffs/SSL4EO-S12/s2a";
env.RICO_HDL_MAJOR_TOM_CORE_S1_PATH = "${config.env.DEVENV_ROOT}/integration_tests/tiffs/Major-TOM-Core/S1RTC/";
env.RICO_HDL_MAJOR_TOM_CORE_S2_PATH = "${config.env.DEVENV_ROOT}/integration_tests/tiffs/Major-TOM-Core/S2L2A/";
packages =
[
(mkPoetryEnv
114 changes: 114 additions & 0 deletions integration_tests/test_python_integration.py
@@ -105,6 +105,48 @@ def eurosat_ms_root() -> Path:
return p


@pytest.fixture(scope="session")
def major_tom_core_s1_root() -> Path:
    str_p = (
        os.environ.get("RICO_HDL_MAJOR_TOM_CORE_S1_PATH")
        or "./tiffs/Major-TOM-Core/S1RTC/"
    )
    p = Path(str_p)
    assert p.exists()
    assert p.is_dir()
    return p


@pytest.fixture(scope="session")
def major_tom_core_s2_root() -> Path:
    str_p = (
        os.environ.get("RICO_HDL_MAJOR_TOM_CORE_S2_PATH")
        or "./tiffs/Major-TOM-Core/S2L2A/"
    )
    p = Path(str_p)
    assert p.exists()
    assert p.is_dir()
    return p


@pytest.fixture
def encoded_major_tom_core_path(
    major_tom_core_s1_root, major_tom_core_s2_root, tmpdir_factory
) -> Path:
    tmp_path = tmpdir_factory.mktemp("lmdb")
    subprocess.run(
        [
            "rico-hdl",
            "major-tom-core",
            f"--s1-dir={major_tom_core_s1_root}",
            f"--s2-dir={major_tom_core_s2_root}",
            f"--target-dir={tmp_path}",
        ],
        check=True,
    )
    return Path(tmp_path)


# https://docs.pytest.org/en/6.2.x/tmpdir.html#[email protected](scope="session")
@pytest.fixture
def encoded_bigearthnet_s1_s2_path(
@@ -306,6 +348,78 @@ def test_bigearthnet_integration(
)


def test_major_tom_core_integration(
    major_tom_core_s1_root, major_tom_core_s2_root, encoded_major_tom_core_path
):
    env = lmdb.open(str(encoded_major_tom_core_path), readonly=True)

    with env.begin(write=False) as txn:
        cur = txn.cursor()
        decoded_lmdb_data = {k.decode("utf-8"): load(v) for (k, v) in cur}

    assert decoded_lmdb_data.keys() == set(
        [
            "0U_199R_S1A_IW_GRDH_1SDV_20220703T043413_20220703T043438_043931_053E87_rtc",
            "897U_171R_S1B_IW_GRDH_1SDV_20210827T012624_20210827T012653_028425_036437_rtc",
            "0U_199R_S2A_MSIL2A_20220706T085611_N0400_R007_T33NZA_20220706T153419",
            "199U_1099R_S2B_MSIL2A_20200223T032739_N9999_R018_T48QUE_20230924T183543",
        ]
    )

    sample_s1_safetensors_dict = decoded_lmdb_data.get(
        "0U_199R_S1A_IW_GRDH_1SDV_20220703T043413_20220703T043438_043931_053E87_rtc",
    )
    sample_s2_safetensors_dict = decoded_lmdb_data.get(
        "0U_199R_S2A_MSIL2A_20220706T085611_N0400_R007_T33NZA_20220706T153419"
    )
    safetensors_s1_keys = sample_s1_safetensors_dict.keys()
    safetensors_s2_keys = sample_s2_safetensors_dict.keys()
    assert (
        set(
            [
                "B01",
                "B02",
                "B03",
                "B04",
                "B05",
                "B06",
                "B07",
                "B08",
                "B8A",
                "B09",
                "B11",
                "B12",
            ]
        )
        == safetensors_s2_keys
    )
    assert (
        set(
            [
                "vv",
                "vh",
            ]
        )
        == safetensors_s1_keys
    )

    assert all(arr.shape == (1068, 1068) for arr in sample_s1_safetensors_dict.values())
    assert all(arr.dtype == "float32" for arr in sample_s1_safetensors_dict.values())

    assert all(arr.dtype == "uint16" for arr in sample_s2_safetensors_dict.values())
    assert all(
        sample_s2_safetensors_dict[key].shape == (1068, 1068)
        for key in ["B02", "B03", "B04", "B08"]
    )
    assert all(
        sample_s2_safetensors_dict[key].shape == (534, 534)
        for key in ["B05", "B06", "B07", "B8A", "B11", "B12"]
    )
    assert all(
        sample_s2_safetensors_dict[key].shape == (178, 178) for key in ["B01", "B09"]
    )


def test_ssl4eo_s12_integration(
ssl4eo_s12_s1_root,
ssl4eo_s12_s2_l1c_root,
Binary file not shown.