reading in a lazy DiskArrays.ConcatDiskArray is slow #717

Open
tiemvanderdeure opened this issue Aug 22, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@tiemvanderdeure
Contributor

tiemvanderdeure commented Aug 22, 2024

MWE:

using Rasters, RasterDataSources
ser = RasterSeries(WorldClim{Climate}, :tavg; month = 1:12, res = "2.5m", lazy = true)
ras = Rasters.combine(ser; lazy = true)

@time ras[X = 1:100, Y = 1:100, month = 1] # 1.6 seconds!!
@time ser[1][X = 1:100, Y = 1:100] # 0.06 seconds

This must be some issue with chunks and getindex.

Rasters.DA.eachchunk(ser[1]) |> size # (1, 4320)
Rasters.DA.eachchunk(ras) |> size # (1, 1, 12)

All the time is being spent in gdalopen, rasterio! and gdalclose:
[image]

@felixcremer
Contributor

What is tas in your example?

@tiemvanderdeure
Contributor Author

Sorry, tas should be ser

@rafaqz
Owner

rafaqz commented Aug 22, 2024

OK, this looks like a problem such as reading every single cell separately. Going by your eachchunk output it should be just one chunk, so the ConcatDiskArray isn't doing its job properly.

It might be fixed on DiskArrays main, but we aren't using it yet because of NCDatasets.

@tiemvanderdeure
Contributor Author

This is on DiskArrays v0.4.4. I don't have NCDatasets in the environment.

@rafaqz
Owner

rafaqz commented Aug 22, 2024

OK... so the problem could be that we need to special-case ConcatDiskArray and open just the files that it needs to read from? They won't be found by Flatten.flatten because they're objects in an array.

I had assumed readblock would do that fine because each block is a whole file and it would only open once... but something is not working and we are reading into a FileArray many times instead of opening it once.
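To make the suspected failure mode concrete, here is a toy sketch in plain Julia (no Rasters, DiskArrays or GDAL involved). It only illustrates the hypothesis above, that a file is opened once per cell rather than once per block; it is not the actual Rasters or DiskArrays code path, and `ToyFile`, `withopen`, `read_per_cell` and `read_block` are hypothetical names made up for this example.

```julia
# Toy stand-in for a file-backed array. `opens` counts how often the
# "file" is opened, the way a profiler would count gdalopen calls.
mutable struct ToyFile
    data::Matrix{Float64}
    opens::Int
end

# Hypothetical helper: "open" the file, apply `f` to its data, "close" it.
function withopen(f, file::ToyFile)
    file.opens += 1
    return f(file.data)
end

# Suspected slow pattern: one open/close cycle for every requested cell.
function read_per_cell(file::ToyFile, I, J)
    out = Matrix{Float64}(undef, length(I), length(J))
    for (jj, j) in enumerate(J), (ii, i) in enumerate(I)
        out[ii, jj] = withopen(d -> d[i, j], file)
    end
    return out
end

# Expected pattern: open once and read the whole requested block.
read_block(file::ToyFile, I, J) = withopen(d -> d[I, J], file)

file = ToyFile(rand(200, 200), 0)

a = read_per_cell(file, 1:100, 1:100)
per_cell_opens = file.opens

file.opens = 0
b = read_block(file, 1:100, 1:100)
block_opens = file.opens

# Same values either way; only the number of opens differs.
println((per_cell_opens, block_opens))
```

If each open carries a fixed cost, as gdalopen and gdalclose appear to in the profile, the per-cell pattern scales with the number of cells requested instead of the number of files touched, which would be consistent with the timings in the MWE.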

lazarusA added the enhancement label Sep 1, 2024