Description

I noticed a very significant performance hit when running missing_wmo with dask: the dask version is much slower than the in-memory one, and I had to load the array to get results in a reasonable amount of time.
Steps To Reproduce
No response
Additional context
No response
Contribution
I would be willing/able to open a Pull Request to address this bug.
Code of Conduct
I agree to follow this project's Code of Conduct
```python
import xclim as xc
from xclim.testing import open_dataset

# Open a dataset as a single chunk
ds = open_dataset('sdba/CanESM2_1950-2100.nc', chunks={'time': -1, 'location': -1})
pr_valid = xc.core.missing.missing_wmo(ds.pr, freq="YS")
```
The last line took me 115 s. More importantly, counting the number of tasks with `len(ds.pr.__dask_graph__().keys())`, I see an increase from 6 to 95304 tasks. This is insane!
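For reference, here is a toy example (independent of xclim) of how this task count is measured, and why graphs inflate: every chunked operation adds roughly one task per chunk, so a routine that builds many intermediate arrays multiplies the graph size quickly.

```python
import dask.array as da

# A toy array split into 4 chunks: the initial graph holds one task per chunk.
x = da.ones((100, 100), chunks=(50, 50))
n_tasks = len(x.__dask_graph__().keys())

# Each elementwise operation adds another layer of one task per chunk,
# so chaining many small operations inflates the graph fast.
y = (x + 1) * 2
n_tasks_after = len(y.__dask_graph__().keys())
```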
A probable solution would be to wrap as much of the computation as possible into a single apply_ufunc or map_blocks call to aggregate the tasks. We could also look into flox for help, since we are grouping and applying a function along the time axis.
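To illustrate the apply_ufunc route, here is a minimal sketch. The reducing function `frac_missing` is a hypothetical placeholder (it only computes the fraction of missing values; the real WMO criterion also checks runs of consecutive missing days and per-period grouping), but it shows the shape of the fix: one vectorized call over the whole time axis yields roughly one task per spatial chunk instead of thousands of tiny ones.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy daily data with a long gap of NaNs in the first series.
time = pd.date_range("2000-01-01", periods=365, freq="D")
data = np.random.rand(365, 10)
data[5:40, 0] = np.nan
pr = xr.DataArray(data, dims=("time", "location"),
                  coords={"time": time}).chunk({"time": -1, "location": 5})

def frac_missing(arr, axis):
    # Placeholder aggregation: fraction of missing values along time.
    # The real missing_wmo logic would go here, written against plain
    # NumPy arrays so the whole check runs inside one task per chunk.
    return np.isnan(arr).mean(axis=axis)

# "time" is a core dim (single chunk along it), so dask="parallelized"
# maps frac_missing once per spatial chunk rather than per element.
out = xr.apply_ufunc(
    frac_missing, pr,
    input_core_dims=[["time"]],
    kwargs={"axis": -1},
    dask="parallelized",
    output_dtypes=[float],
)
```

With this structure the graph stays proportional to the number of spatial chunks, and swapping `frac_missing` for a flox-backed grouped reduction would handle the `freq="YS"` grouping the same way.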