Allow opening selected groups only #338
This takes advantage of replacing the generator of paths in the _open_datatree_* functions
There seem to be failing tests that I don't think are our doing, as we could reproduce them on the main branch (before our changes were added). Is that to be expected?
before you spend more time here: could you check if the version that was integrated into Edit: but yes, the failing tests seem unrelated, that's because of a change in the |
@keewis thanks for the heads up. We need to read batches of 80 files with around 70 groups each. On my laptop that now takes around 2 seconds per file, so almost three minutes to generate the datatrees. As this is for a process that needs to run in real time, with a new batch every 10 minutes, we are looking for all the performance gains we can get.
okay, sure. I'd still recommend checking the version in |
From what I understand, the |
Hi @mraspaud - thanks for this contribution! I can see how this might be useful. I apologise for the indeterminate state of datatree right now.
This repository will soon be archived, so if you want this feature then your PR here will need to be reconciled with what's now in xarray The recent PRs that @keewis mentioned are especially pertinent - they speed up opening We should think about whether your use of the Another idea you might want to think about is whether the suggested |
This PR allows opening selected groups only in open_datatree. The use case is speeding up loading of files with many groups, in our case netcdf, where we actually need a handful of groups to be loaded.
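As a rough illustration of the idea (not the code in this PR), the selection can be thought of as filtering the generator of group paths before any group is opened. The sketch below is pure Python and does not call datatree or xarray; the function name `iter_selected_group_paths` and the example group names are hypothetical, and keeping the ancestors of each selected group is an assumption made so that the resulting tree stays connected.

```python
from typing import Iterable, Iterator


def iter_selected_group_paths(
    all_paths: Iterable[str], selected: Iterable[str]
) -> Iterator[str]:
    """Yield only the group paths that are selected or are ancestors of a selected group.

    Hypothetical helper for illustration; not part of datatree or xarray.
    """
    wanted = set()
    for path in selected:
        parts = [p for p in path.split("/") if p]
        # Keep every ancestor as well (assumption: parents are needed to build the tree).
        for i in range(1, len(parts) + 1):
            wanted.add("/" + "/".join(parts[:i]))

    for path in all_paths:
        normalized = "/" + path.strip("/")
        if normalized in wanted:
            yield normalized


# Made-up group layout standing in for a file with many groups.
all_groups = ["/data", "/data/measured", "/data/quality", "/state", "/state/platform"]
selected = list(iter_selected_group_paths(all_groups, ["/data/measured"]))
print(selected)  # ['/data', '/data/measured']
```

With a filter like this in front of the per-group opening step, only the handful of requested groups (plus their parents) would be read, which is where the time saving for files with many groups would come from.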
pre-commit run --all-files
docs/source/whats-new.rst