Dear developers:
I want to calculate R95P from ERA5 data for 1950 to 2023. I loaded the dataset with either create_ensemble or open_mfdataset, and calculated day-of-year percentiles with either ensemble_percentiles or percentile_doy. I then computed R95P with xclim.indicators.icclim.R95p. However, my computer ran out of memory when exporting the R95P results. I'm also confused about the difference between the function quantile and the function percentile_doy.
Memory usage explodes at this step: results_R95P.to_netcdf('../R95P.nc', format='NETCDF4', engine='netcdf4')
What I Did
I initially suspected that my computer did not have enough memory, so I loaded the data into memory one latitude row at a time and concatenated all rows at the end. However, this made the calculation far too slow. Is there a better way?
Simplified example of my workaround:
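```python
import xarray as xr
import xclim

# self.Files: dataset from open_mfdataset; tp has dims (time, latitude, longitude)
self.Rows = self.Files[self.Variable].shape[1]  # number of latitude rows
RD_N_Y_data = []  # create the list once, before the loop
for i in range(self.Rows):
    data = self.Files.tp[:, i, :].load()  # load one latitude row into memory
    File_Block = xclim.core.units.amount2rate(data, out_units="mm/d")  # amount -> rate
    wetdays_Array = File_Block.where(File_Block >= 1)  # keep wet days (>= 1 mm/d)
    RNT = wetdays_Array.quantile(0.95, dim='time', keep_attrs=True)  # R95p: 95th percentile
    RD_N_Y = xclim.indicators.icclim.R95p(File_Block, pr_per=RNT, freq='YS')
    RD_N_Y_data.append(RD_N_Y)
xr.concat(RD_N_Y_data, dim='latitude').to_netcdf('../R95P.nc', format='NETCDF4', engine='netcdf4')
```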
Thank you very much for your help, and I look forward to your reply!
> On the other hand, I'm confused about the difference between the function quantile and the function percentile_doy.
percentile_doy computes percentiles for each day of the year, so you get one value per day of the year (roughly 365) for each grid cell.
The result of percentile_doy can be used as a threshold to compute, for example, the number of days on which the day-of-year threshold is exceeded; this is typically what a climate index like TX90p expects.
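To make the day-of-year idea concrete, here is a simplified NumPy sketch. It is not xclim's implementation: the helper `percentile_doy_sketch` and the synthetic data are hypothetical, a 365-day calendar is assumed, and xclim's interpolation details may differ. For each day of the year it pools the values within a 5-day window centred on that day across all years (similar in spirit to the `window` argument of `percentile_doy`), then compares every day with the threshold of its own day of year, as a TX90p-style count.

```python
import numpy as np

def percentile_doy_sketch(data, doy, per=90.0, window=5):
    """Hypothetical, simplified day-of-year percentile (NOT xclim's code).

    data: array (time, cell); doy: day of year (1..365) of each time step.
    Returns an array (365, cell): one threshold per day of year and cell.
    """
    half = window // 2
    out = np.empty((365, data.shape[1]))
    for d in range(1, 366):
        # Circular distance in days, so the window wraps around the year end
        dist = np.abs(doy - d)
        dist = np.minimum(dist, 365 - dist)
        # Pool all years' values within the window, take the percentile per cell
        out[d - 1] = np.percentile(data[dist <= half], per, axis=0)
    return out

# Synthetic temperature-like data: 10 years x 365 days x 4 cells
rng = np.random.default_rng(0)
n_years, n_cells = 10, 4
doy = np.tile(np.arange(1, 366), n_years)
season = 10 * np.sin(2 * np.pi * doy / 365)[:, None]
tas = season + rng.normal(size=(doy.size, n_cells))

thresh = percentile_doy_sketch(tas, doy, per=90)   # (365, 4)
# Compare each day with the threshold of its own day of year
exceed = tas > thresh[doy - 1]
# TX90p-style count: exceedance days per year and cell
days_per_year = exceed.reshape(n_years, 365, n_cells).sum(axis=1)
print(thresh.shape, days_per_year.shape)  # (365, 4) (10, 4)
```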
Conversely, quantile, when computed along the time axis, gives one value per grid cell. These values can then be used as thresholds to compute exceedances in indices such as R95p.
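For comparison, a period percentile yields a single threshold per grid cell. The NumPy sketch below uses synthetic daily precipitation (hypothetical values in mm/d), keeps wet days (>= 1 mm/d, the same cut as in the issue's code), takes their 95th percentile over the whole period and counts, per year, the days above it. It is an R95p-style day count meant to illustrate the thresholding logic, not a reimplementation of xclim's indicator; with xarray the threshold step would be along the lines of `pr.where(pr >= 1).quantile(0.95, dim="time")`.

```python
import numpy as np

rng = np.random.default_rng(42)
n_years, n_days, n_cells = 20, 365, 3
# Synthetic daily precipitation (mm/d): gamma-distributed, ~60% set to dry
pr = rng.gamma(shape=0.8, scale=8.0, size=(n_years * n_days, n_cells))
pr[rng.random(pr.shape) < 0.6] = 0.0

# Keep wet days only; dry days become NaN so the percentile ignores them
wet = np.where(pr >= 1.0, pr, np.nan)
p95 = np.nanpercentile(wet, 95, axis=0)  # ONE threshold per cell -> (3,)

# Broadcast (time, cell) against (cell,), then count per year
above = pr > p95
r95p_days = above.reshape(n_years, n_days, n_cells).sum(axis=1)  # (20, 3)
print(p95.shape, r95p_days.shape)  # (3,) (20, 3)
```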
First, if you compute period percentiles instead of doy percentiles, you may not have any performance issue at all, because computing doy percentiles requires many more operations than computing period percentiles. If you still have performance issues, read on.
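A rough count shows where the extra work comes from. The figures below assume, hypothetically, 74 years of daily data, a 365-day calendar and a 5-day window: the period percentile is a single reduction over the series, while the doy version performs 365 reductions whose windows together hold about `window` times as many values. If those windowed values are materialized as a copy, memory grows by the same factor.

```python
# Rough count of values entering percentile computations for ONE grid cell.
# Hypothetical setup: 74 years of daily data, 365-day calendar, 5-day window.
n_years, days_per_year, window = 74, 365, 5

# Period percentile: a single reduction over the whole series
period_reductions = 1
period_values = n_years * days_per_year          # 27010

# Day-of-year percentiles: one reduction per day of the year, each pooling
# `window` consecutive days from every year
doy_reductions = days_per_year                   # 365
doy_values = doy_reductions * window * n_years   # 135050

print(doy_values / period_values)  # 5.0
```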
xclim relies on dask to handle datasets that do not fit in memory.
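To see why the full dataset cannot simply be loaded, here is a back-of-the-envelope size estimate. It assumes (this is not stated in the issue) daily values on ERA5's global 0.25-degree grid (721 x 1440 points) stored as float32; hourly data or float64 intermediates would make it larger. The 100 x 100 chunk at the end is a hypothetical size, shown only to give a feel for a chunk that keeps the whole time axis.

```python
from datetime import date

# Hypothetical setup: daily ERA5 precipitation on the global 0.25-degree grid
n_lat, n_lon = 721, 1440
# Number of days from 1950-01-01 to 2023-12-31, inclusive
n_days = (date(2023, 12, 31) - date(1950, 1, 1)).days + 1
bytes_per_value = 4  # float32; float64 intermediates would double this

total_bytes = n_lat * n_lon * n_days * bytes_per_value
print(n_days, round(total_bytes / 1024**3, 1))  # 27028 104.5  (GiB)

# One chunk holding the full time axis for a 100 x 100 block of cells
chunk_bytes = 100 * 100 * n_days * bytes_per_value
print(round(chunk_bytes / 1024**3, 2))  # 1.01  (GiB)
```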
Basically, dask does what you attempted: it splits the dataset into small chunks that fit in memory.
You may not have noticed, but you are already using dask through xarray's open_mfdataset(...) function.
So chunking is already taking place, which means you need to dig a bit deeper into dask to be able to run this computation on your machine.
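The principle can be illustrated without dask. The NumPy sketch below (the helper `blockwise_quantile` and the data are hypothetical) computes a per-cell quantile over time one block of latitudes at a time and checks that the result matches the all-in-memory computation. The key point is that every block keeps the whole time axis, since a reduction over time needs all time steps of a cell together. In xarray this roughly corresponds to opening the files with spatial chunks and a single time chunk, e.g. `chunks={"time": -1, "latitude": 100, "longitude": 100}` (sizes to be tuned); as far as I know, `DataArray.quantile` on dask-backed data raises an error when the reduced dimension is split across several chunks.

```python
import numpy as np

def blockwise_quantile(data, q, block_size):
    """Per-cell quantile over time, computed one latitude block at a time.

    data has shape (time, lat, lon). Each block keeps the FULL time axis and
    only a slice of latitudes, so every cell's quantile is computed within
    a single block.
    """
    n_lat = data.shape[1]
    blocks = []
    for start in range(0, n_lat, block_size):
        block = data[:, start:start + block_size, :]  # full time, few rows
        blocks.append(np.quantile(block, q, axis=0))  # (rows, lon)
    return np.concatenate(blocks, axis=0)             # (lat, lon)

rng = np.random.default_rng(1)
data = rng.gamma(0.8, 8.0, size=(730, 10, 6))  # synthetic (time, lat, lon)
chunked = blockwise_quantile(data, 0.95, block_size=3)
full = np.quantile(data, 0.95, axis=0)
print(chunked.shape, np.allclose(chunked, full))  # (10, 6) True
```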
I suggest that you try dask's distributed scheduler; it gives much more control over memory management during the computation. Have a look at the quickstart here: https://distributed.dask.org/en/latest/quickstart.html
In short, you first need to install it with pip or conda, e.g. pip install dask distributed
Then you can set up a dask LocalCluster with:
```python
from distributed import Client

client = Client(memory_limit="20GB", n_workers=1, threads_per_worker=16)
```
(Adapt memory_limit and threads_per_worker to your machine.)
Note that on a laptop I would recommend sticking with a single worker and adding threads, to minimize inter-process communication.
Then you can run your computation in the same Python process (typically the same notebook).
Dask triggers the computation either when you call .compute() on the resulting xarray object or when you call .to_netcdf().
You can even monitor the computation in your browser at http://localhost:8787/status (the default port) once the client object has been created.