Add first version of tile retriever. #11

Closed

yellowcap
Member

A first working version of a tiler that creates tiles over a scene based on location and dates. The tiles are stored as NumPy arrays, and each one carries metadata such as bounds, centroid, and resolution.

This is a basis for discussion, but I am quite happy with the tiling and the overall approach. It should scale well, and it can be switched to a different STAC provider very easily.

Refs #10
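
For readers following the discussion, here is a minimal sketch of the kind of tiling described above. It is not the code from this PR: the function name `tile_scene`, the dictionary layout, and the example coordinates are hypothetical, and it assumes a north-up raster held as a `(bands, height, width)` array, cut into 512x512 chips.

```python
import numpy as np


def tile_scene(scene, origin_x, origin_y, resolution, tile_size=512):
    """Split a (bands, height, width) array into square tiles with metadata.

    origin_x / origin_y are the map coordinates of the upper-left corner
    (hypothetical inputs; y decreases downwards, as in north-up rasters).
    """
    _, height, width = scene.shape
    tiles = []
    # Only full tiles are kept; partial tiles at the scene edges are skipped.
    for row in range(0, height - tile_size + 1, tile_size):
        for col in range(0, width - tile_size + 1, tile_size):
            minx = origin_x + col * resolution
            maxy = origin_y - row * resolution
            maxx = minx + tile_size * resolution
            miny = maxy - tile_size * resolution
            tiles.append(
                {
                    "data": scene[:, row : row + tile_size, col : col + tile_size],
                    "bounds": (minx, miny, maxx, maxy),
                    "centroid": ((minx + maxx) / 2, (miny + maxy) / 2),
                    "resolution": resolution,
                }
            )
    return tiles


# Example with a dummy 4-band scene at 10 m resolution.
scene = np.zeros((4, 1024, 1536), dtype="uint16")
tiles = tile_scene(scene, origin_x=300000.0, origin_y=5000000.0, resolution=10)
print(f"Created {len(tiles)} tiles")
```

Keeping the bounds next to each array means a tile can be georeferenced later without going back to the source scene.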

Download the data.
"""
stack = stackstac.stack(
items, resolution=RESOLUTION, assets=BANDS, dtype="uint16", fill_value=NODATA
Contributor

Are you sure about setting the default nodata value to 0? I know uint16 only allows 0-65535, but 0 can also mean black.

Member Author

I have always treated 0 as nodata. A real reflectance of 0 is probably wrong anyway, since there is always some light scattered back. Although with the new processing baseline things have become a bit more intricate, see
https://forum.step.esa.int/t/info-introduction-of-additional-radiometric-offset-in-pb04-00-products/35431/8
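
To make the interaction between the nodata value and the processing baseline 04.00 offset concrete, here is a small sketch. It assumes the commonly documented values for Sentinel-2 (an additive offset of -1000 and a quantification value of 10000) and that the offset has not already been applied by the data provider; the function name `to_reflectance` is hypothetical and not part of this PR.

```python
import numpy as np

NODATA = 0
BOA_ADD_OFFSET = -1000  # assumed offset for processing baseline >= 04.00
QUANTIFICATION = 10000  # assumed quantification value


def to_reflectance(dn, apply_offset=True):
    """Convert digital numbers to reflectance, masking nodata pixels as NaN."""
    dn = np.asarray(dn)
    # Build the mask on the raw values, before the offset shifts them.
    valid = dn != NODATA
    offset = BOA_ADD_OFFSET if apply_offset else 0
    reflectance = (dn.astype("float32") + offset) / QUANTIFICATION
    return np.where(valid, reflectance, np.nan)


reflectance = to_reflectance(np.array([0, 1000, 2000], dtype="uint16"))
print(reflectance)
```

With the offset applied, a stored value of 1000 corresponds to a reflectance of zero, so 0 stays free to act as the nodata marker.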

print(f"Storing {len(tiles)} tiles")
# TODO: Make this an upload to S3.
numpy.savez_compressed(
f"/datadisk/clay/{stack.id.to_numpy()[0]}.npz",
Contributor

Can we have the option to set the folder path? And also create the folder if it doesn't already exist.
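
One possible shape for this, as a sketch only: the helper name `save_tiles`, its `output_dir` parameter, and the example scene ID are hypothetical and not part of the PR. It uses `pathlib.Path.mkdir` with `parents=True, exist_ok=True` so the folder is created on demand.

```python
import tempfile
from pathlib import Path

import numpy as np


def save_tiles(tiles, scene_id, output_dir="tiles"):
    """Save a list of arrays to <output_dir>/<scene_id>.npz, creating the folder."""
    out = Path(output_dir)
    # Create the folder (and any missing parents); do nothing if it exists.
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{scene_id}.npz"
    # Positional arrays are stored under the keys arr_0, arr_1, ...
    np.savez_compressed(path, *tiles)
    return path


# Example: write one dummy tile into a nested folder that does not exist yet.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_tiles(
        [np.zeros((2, 4, 4), dtype="uint16")],
        "example-scene",
        output_dir=Path(tmp) / "clay" / "tiles",
    )
    with np.load(saved) as archive:
        loaded_shape = archive["arr_0"].shape
    print(saved.name, loaded_shape)
```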

Comment on lines +88 to +90
stack = stackstac.stack(
items, resolution=RESOLUTION, assets=BANDS, dtype="uint16", fill_value=NODATA
)
Contributor

@weiji14 weiji14 Oct 30, 2023

Discussion on optimizing reads at different resolutions: instead of resampling to a certain resolution, it will be faster (tens of milliseconds instead of ~1 min) to read from the overviews (see gjoseph92/stackstac#196 (comment)), but this will result in resolutions that are not round integer numbers.

Any thoughts on sticking to fixed resolutions like 10, 20, 60 versus resolutions defined by the overview levels? I know we discussed offline aligning the 512x512 chips to what the Cloud-Optimized GeoTIFF uses internally for faster reads.
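
To show why overview-level resolutions may not be round numbers, here is a small arithmetic sketch. The assumptions are labeled: a 10 m band that is 10980 pixels wide, overview decimation factors of 2, 4, 8, and 16, and overview sizes rounded up to whole pixels. The actual overview levels depend on how a given COG was written, and the function name `overview_resolutions` is hypothetical.

```python
import math


def overview_resolutions(native_res, size, factors=(2, 4, 8, 16)):
    """Return the effective pixel size of each overview level.

    Overview dimensions are assumed to be rounded up to whole pixels, so the
    effective resolution is the ground extent divided by that pixel count.
    """
    resolutions = {}
    for factor in factors:
        overview_size = math.ceil(size / factor)
        resolutions[factor] = native_res * size / overview_size
    return resolutions


res = overview_resolutions(native_res=10, size=10980)
for factor, value in res.items():
    print(f"factor {factor}: {value:.3f} m")
```

Under these assumptions the first two levels divide evenly, while the coarser levels land slightly off the nominal 80 m and 160 m.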

Side note: The default resampling with stackstac is Nearest Neighbour (see https://stackstac.readthedocs.io/en/v0.5.0/api/main/stackstac.stack.html#stackstac.stack.params.resampling), which is ok for optical images. It might be good to explicitly set the resampling algorithm in the code to be clearer (also in case anyone copies this code for another dataset such as DEMs which should use another interpolation scheme).
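
As an illustration of why the choice matters for continuous data such as DEMs, here is a NumPy-only sketch on a 1D elevation profile. It does not call stackstac; in stackstac the choice is made through the `resampling` parameter documented at the link above. The sample values are made up.

```python
import numpy as np

# A coarse elevation profile (made-up values) sampled at x = 0, 1, 2, 3.
coarse = np.array([100.0, 110.0, 130.0, 160.0])
coarse_x = np.arange(coarse.size)

# Target grid at twice the sampling density: x = 0, 0.5, ..., 3.
fine_x = np.linspace(0, coarse.size - 1, 7)

# Nearest neighbour: pick the closest source sample (ties round up).
nearest_idx = np.floor(fine_x + 0.5).astype(int)
nearest = coarse[nearest_idx]

# Linear interpolation: blend the two neighbouring samples.
linear = np.interp(fine_x, coarse_x, coarse)

print("nearest:", nearest)
print("linear: ", linear)
```

Nearest neighbour repeats source values and produces steps, which is acceptable for optical digital numbers but creates artificial terraces in an elevation surface. Linear interpolation keeps the slope continuous.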

Member Author

I think we are going for 10m, so no overviews are necessary, as we stick to the highest resolution available. If we decide to use 20m, then yes, we should try to optimize for the 20m overviews if they exist.

Member Author

Closing in favor of #27

@yellowcap yellowcap closed this Nov 10, 2023
@yellowcap yellowcap deleted the tile-retriever branch January 16, 2024 09:42
Labels
data-pipeline Pull Requests about the data pipeline