Datacube #27
Conversation
Looks super neat, the datacube pipeline is coming along well.
datacube.py
Outdated
"args": [ | ||
{ | ||
"op": "s_intersects", | ||
"args": [{"property": "geometry"}, geom_CENTROID.__geo_interface__], |
`CENTROID` is a point object; maybe we can use that directly instead of `geom_CENTROID`.
Nice! Yes we can. Implemented.
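For illustration, the filter clause with the centroid passed in directly can be sketched as follows. This is a minimal stand-in with a hypothetical `BBOX` and a plain GeoJSON-like dict; a shapely Point's `__geo_interface__` attribute yields the same kind of mapping, so the point object can be used without a separate `geom_` wrapper:

```python
# Hypothetical bounding box (west, south, east, north); a shapely Point's
# __geo_interface__ produces exactly this kind of GeoJSON-like mapping.
BBOX = (-122.0, 38.0, -121.0, 39.0)
centroid = {
    "type": "Point",
    "coordinates": ((BBOX[0] + BBOX[2]) / 2, (BBOX[1] + BBOX[3]) / 2),
}

# CQL2 filter clause passed to the STAC search, as in the diff above
cql2_filter = {
    "op": "s_intersects",
    "args": [{"property": "geometry"}, centroid],
}
```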
datacube.py
Outdated
""" | ||
da_sen2: xr.DataArray = stackstac.stack( | ||
items=s2_items[0], | ||
epsg=26910, # UTM Zone 10N |
Can we read the projection from the s2 item collection?
I assigned this to the same projection used for the S2 item.
datacube.py
Outdated
```python
da_sen1.attrs["spec"] = str(da_sen1.spec)

# To fix ValueError: unable to infer dtype on variable None
for key, val in da_sen1.coords.items():
    if val.dtype == "object":
        print("Deleting", key)
        da_sen1 = da_sen1.drop_vars(names=key)

# Create xarray.Dataset datacube with VH and VV channels from SAR
da_vh: xr.DataArray = da_sen1.sel(band="vh", drop=True).rename("vh")
da_vv: xr.DataArray = da_sen1.sel(band="vv", drop=True).rename("vv")
ds_sen1: xr.Dataset = xr.merge(objects=[da_vh, da_vv], join="override")
```
`da_sen1 = stackstac.mosaic(da_sen1, dim="time")` will create a composite of S1 tiles overlapping the S2 bbox. We can use that directly in the merge, and don't have to worry about pixel-picking strategy.
Word. I like this. Thanks.
So `stackstac.mosaic` returns an `xarray.DataArray`, whereas `xarray.merge` returns an `xarray.Dataset`. The main difference is that `xarray.Dataset` allows us to clearly show named data variables (e.g. VV, VH, B1, B2, etc.), whereas `xarray.DataArray` stores these in the 'band' dimension. Do we have a preference for either format?
Having named data vars helps - `stackstac.mosaic` takes care of merging the pixels along the given dim (time in our case), which we don't have to handle at our end. What if we create the data arrays and then `concat` them together as a dataset with vars attached?
We still retain the named data vars, I think. Merged data array using the mosaic method for Sentinel-1:

```
<xarray.Dataset>
Dimensions:                (time: 1, band: 14, x: 11365, y: 11363)
Coordinates: (12/67)
  * time                   (time) datetime64[ns] 2018-06...
  * band                   (band) <U4 'B02' 'B03' ... 'vv'
  * x                      (x) float64 2.973e+05 ... 4.1...
  * y                      (y) float64 4.202e+06 ... 4.0...
    id                     (time) <U54 'S2B_MSIL2A_20180...
    s2:generation_time     <U24 '2020-10-12T02:49:45.926Z'
    ...                     ...
    sar:instrument_mode    <U2 'IW'
    sar:product_type       <U3 'GRD'
    s1:product_timeliness  <U8 'Fast-24h'
    s1:resolution          <U4 'high'
    description            (band) object nan ... 'Terrai...
    proj:shape             object {3600}
Data variables:
    stackstac-95bc7f7975c3ca621cbffb094f33316e  (time, band, y, x) float32 dask.array<chunksize=(1, 1, 1024, 1024), meta=np.ndarray>
    stackstac-80df347b660c0b771e7ba977f16777bc  (band, y, x) float32 dask.array<chunksize=(13, 1024, 1024), meta=np.ndarray>
    stackstac-602439b1960e897b8044c8056eb3ce96  (band, y, x) float32 dask.array<chunksize=(14, 1024, 1024), meta=np.ndarray>
Attributes:
    spec:        RasterSpec(epsg=32611, bounds=(297340, 4088320, 410990, 4201...
    crs:         epsg:32611
    transform:   | 10.00, 0.00, 297340.00|\n| 0.00,-10.00, 4201950.00|\n| 0.0...
    resolution:  10
```
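As an aside, one way to get named data variables out of a band-dimensioned array is to split the band dimension into per-band variables. This is a minimal sketch on a toy array; the real input would be the mosaicked stackstac output:

```python
import numpy as np
import xarray as xr

# Toy stand-in for a stacked (band, y, x) DataArray from stackstac
da = xr.DataArray(
    np.zeros((2, 3, 3), dtype="float32"),
    dims=("band", "y", "x"),
    coords={"band": ["vh", "vv"]},
)

# Split the band dimension into named data variables ("vh" and "vv")
ds = da.to_dataset(dim="band")
```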
datacube.py
Outdated
```python
# TODO: Find a way to get minimal number of S1 tiles to cover the entire S2 bbox
s1_gdf["overlap"] = s1_gdf.intersection(box(*BBOX)).area
s1_gdf.sort_values(by="overlap", inplace=True)
```
Pick tiles from the ascending or descending capture based on their overlap with the S2 bbox:

```python
state = (
    s1_gdf[["sat:orbit_state", "overlap"]]
    .groupby(["sat:orbit_state"])
    .sum()
    .sort_values(by="overlap", ascending=False)
    .index[0]
)
```

`state` will be ascending OR descending. Then filter the `s1_gdf` by `state`: `s1_gdf = s1_gdf[s1_gdf["sat:orbit_state"] == state]`
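The suggested orbit-state selection can be exercised on a toy table. A plain pandas DataFrame stands in for `s1_gdf` here; the real object is a GeoDataFrame with overlap areas computed from scene geometries:

```python
import pandas as pd

# Toy stand-in for s1_gdf: overlap area of each S1 scene with the S2 bbox
s1_gdf = pd.DataFrame(
    {
        "sat:orbit_state": ["ascending", "descending", "descending"],
        "overlap": [0.4, 0.7, 0.5],
    }
)

# Keep whichever orbit state covers more of the bbox in total
state = (
    s1_gdf[["sat:orbit_state", "overlap"]]
    .groupby(["sat:orbit_state"])
    .sum()
    .sort_values(by="overlap", ascending=False)
    .index[0]
)
s1_gdf = s1_gdf[s1_gdf["sat:orbit_state"] == state]
```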
Implemented
datacube.py
Outdated
Do we want to have this file in the top-level directory, or in another folder?
No, I think we can have this in a more appropriate subdir. I just put it here because we haven't decided on the structure yet. I'm happy to make the call, and we can modify as we see fit.
Maybe put it in a subdir; I'll leave the naming up to you 🙂
Actually, I think @yellowcap is putting things under `scripts/` in #29, so we could follow that?
datacube.py
Outdated
```python
    return da_merge


def process(year1, year2, aoi, resolution, epsg):
```
Not sure about needing to pass in an EPSG code here. For wide areas (in the West-to-East direction) that can span multiple UTM zones, wouldn't this force a single EPSG instead of multiple?
We are trying to overlap data from S1 & DEMs over S2 tiles. As S2 tiles follow the MGRS format, each tile will fall under a single UTM zone (this is my understanding, correct me if I am wrong). S1 & DEMs might span multiple UTM zones, but stackstac should reproject all of them together.
That makes sense to me. I guess I should switch this to just read the projection based on the MGRS tile's UTM zone instead of having a user argument for it.
Reading the EPSG code directly from the Sentinel-2 item properties now.
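Reading the code from the item can be sketched like this. The dict below is a stand-in for a STAC item; Sentinel-2 items carry the UTM EPSG code via the projection extension, and with pystac this would be `s2_item.properties["proj:epsg"]`:

```python
# Stand-in for a Sentinel-2 STAC item (hypothetical EPSG value); real items
# expose the same key through the STAC projection extension.
s2_item = {"properties": {"proj:epsg": 32611}}

# Use the item's own UTM projection instead of a user-supplied epsg argument
epsg = s2_item["properties"]["proj:epsg"]
```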
datacube.py
Outdated
```python
    s2_items, s1_items, dem_items, BBOX, resolution, epsg
)

da_merge = merge_datarrays(da_sen2, da_sen1, da_dem)
```
As a sanity check, could you plot each individual band from this datacube and see if they look ok? I've had issues with striped outputs when running `xr.merge` before on Sentinel-1 and Copernicus DEM, and just want to double check.
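A quick per-band sanity plot might look like the following. A toy random dataset stands in for the merged datacube, and matplotlib's headless Agg backend is used so it runs in a script:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for a scripted check
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Toy stand-in for the merged datacube; real code would use the merged dataset
ds = xr.Dataset(
    {name: (("y", "x"), np.random.rand(4, 4)) for name in ["vv", "vh", "dem"]}
)

# Plot every band separately to spot striping or misalignment after the merge
fig, axes = plt.subplots(ncols=len(ds.data_vars), figsize=(12, 4))
for ax, (name, da) in zip(axes, ds.data_vars.items()):
    da.plot.imshow(ax=ax)
    ax.set_title(name)
```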
Should be addressed in #27 (comment)
Ok. As of now, we are able to confidently generate a datacube with S2, S1 and DEM bands concatenated. The time dimension is that of the "best" S2 image in terms of cloud coverage. The S1 images are all of the same orbit state. Validation results and subset plots from each sensor are attached. @weiji14 @srmsoumya if this looks good to you both, we can merge and move on to the tiling part.
Thanks Lilly, I think we should try to merge this by today. Just a few minor suggestions, and we can improve on stuff in follow-up PRs.
Linter is complaining a bit, but we can fix those later since the script will need some refining. Thanks again @lillythomas, and also @srmsoumya for reviewing!
This script collects overlapping Sentinel-1, Sentinel-2 and DEM data for an AOI and time range, aligns them to a common extent, and merges the data arrays.

To do:
- `search_sentinel1_calc_max_area()` function