Bnb/dh refactor #220
Open: bnb32 wants to merge 383 commits into main from bnb/dh_refactor
bnb32 force-pushed the bnb/dh_refactor branch 10 times, most recently from ebb154c to bfe2f9f (June 27, 2024 17:34)
bnb32 force-pushed the bnb/dh_refactor branch 4 times, most recently from 53d1c66 to bbc4af1 (July 1, 2024 15:58)
bnb32 force-pushed the bnb/dh_refactor branch 4 times, most recently from 59b9817 to a546b27 (July 19, 2024 20:07)
…pdates to dc exo training tests
…tory meta class. moved bias correct from ForwardPass, done for each chunk, to ForwardPassStrategy init.
…xoData.split` method.
…ed to start at the zeroth hour, if it has not been shifted already.
…parate script if this is an issue.
…t features. now just repeat over time dimension in the forward pass padding method. this is much more performant for time independent features. this doesn't solve slow down issues with time dependent exo features like sza though.
… deriver. typo in solar module.
…g. u_30m from u_10m and u_100m, with u pressure level array
… Added shape checks in ``test_access.py``
…thout pre loading of data
…dataset() which will default to a dask array manager if chunks is specified and load into memory as numpy arrays if chunks is None.
Bnb/compute on none chunks
Gb/bc debug
Ok, here we go...

`sup3r/preprocessing` was previously just data handlers and batch handlers, essentially. Now we have `Loaders`, `Extracters`, `Derivers`, `Cachers` which are composed in `sup3r.preprocessing.data_handlers.factory` to build objects similar to the old `DataHandlers`. These do basically everything the old handlers used to do, except for training / batching related routines like sampling, normalization, etc. `Loaders` just load netcdf / h5 data into an `xr.Dataset`-like container. `Extracters` extract spatiotemporal regions of data. `Derivers` derive new features from raw feature data. `Cachers`, well, they cache data to either h5 or netcdf depending on the extension of the output file provided.

In `sup3r/preprocessing` we additionally have `Samplers` and `BatchQueues`. These are composed in `sup3r.preprocessing.batch_handlers.factory` to build objects similar to the old `BatchHandlers`. These do basically everything that the old batch handlers used to do, with some exceptions. The most notable exception is probably that they don't split data into training and validation sets. `BatchHandler` objects will take "collections" of data handler like objects (these can be `DataHandlers`, `Extracters`, `Derivers`, etc) for both training and validation and separate batch queues will be used for each. `Samplers` simply contain an `xr.Dataset`-like object and sample that data as an iterator. `BatchQueue` objects interface with samplers to keep a queue full of batches / samples while models are training.

All these smaller objects like `loaders`, `extracters`, `derivers`, `samplers` are built on top of `xr.Dataset`-like objects (`sup3r.preprocessing.accessor.Sup3rX` and `sup3r.preprocessing.base.Sup3rDataset`) which serve as the familiar `.data` attribute for data and batch handlers. `Sup3rDataset` is wrapped around `Sup3rX` to provide an interface for "dual" dataset objects contained by dual handlers and acts exactly like `Sup3rX` when datasets are not dual. `Sup3rX` is an `xr.Dataset` "accessor" class, which is the recommended way to extend `xr.Datasets` (as opposed to subclassing). These `Sup3rX` / `Sup3rDataset` objects act similar to `xr.Datasets` but with extended functionality. The tests in `tests/data_wrappers/` show how to interact with these objects.

Since the fundamental dataset objects are now `xr.Dataset`-like, they can use dask arrays to store data. This means we don't need to load data into memory until we need the result of a computation. `ForwardPassStrategy` and `ForwardPass` have been updated accordingly, since we can lazy load the full input dataset and then index the data handler `.data` attribute to select generator input chunks, all before loading into memory. `BatchHandler` objects have a `mode` argument which can be set to either `lazy` (load batches into memory only when they are sent out for training) or `eager` (load `.data` into memory upon handler initialization).

Tests have been added for all new preprocessing modules and lots of documentation / notes have been added throughout. Tests should hopefully provide good examples of use patterns for these new objects.
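
For readers who want a rough picture of the loader / extracter / deriver composition described above, here is a minimal sketch using plain `xarray`. It is not the sup3r implementation: `load_demo`, `extract_demo`, `derive_demo` and `build_handler_data` are hypothetical stand-ins, the data is built in memory instead of being read from netcdf / h5, and the caching step is left out.

```python
import numpy as np
import xarray as xr


def load_demo():
    """Stand-in for a Loader: builds a small in-memory xr.Dataset."""
    shape = (6, 4, 5)  # (time, y, x)
    return xr.Dataset(
        {
            "u_10m": (("time", "y", "x"), np.full(shape, 3.0)),
            "v_10m": (("time", "y", "x"), np.full(shape, 4.0)),
        },
        coords={"time": np.arange(shape[0])},
    )


def extract_demo(ds, time_slice, y_slice, x_slice):
    """Stand-in for an Extracter: selects a spatiotemporal region."""
    return ds.isel(time=time_slice, y=y_slice, x=x_slice)


def derive_demo(ds):
    """Stand-in for a Deriver: adds wind speed from the u / v components."""
    speed = np.sqrt(ds["u_10m"] ** 2 + ds["v_10m"] ** 2)
    return ds.assign(windspeed_10m=speed)


def build_handler_data(time_slice, y_slice, x_slice):
    """Compose the three steps (hypothetical, for illustration only)."""
    return derive_demo(extract_demo(load_demo(), time_slice, y_slice, x_slice))


data = build_handler_data(slice(0, 3), slice(1, 4), slice(0, 2))
print(dict(data.sizes), list(data.data_vars))
```

Keeping each step as a small function over an `xr.Dataset` is what makes the composition cheap: every stage takes and returns the same kind of object.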
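
The sampler / batch queue relationship described above can be illustrated with a small sketch. This assumes a plain NumPy array in place of an `xr.Dataset`-like object and a standard-library thread and queue in place of whatever sup3r uses internally; `DemoSampler` and `DemoBatchQueue` are hypothetical names, not the sup3r classes.

```python
import queue
import threading

import numpy as np


class DemoSampler:
    """Hypothetical sampler: yields random (time, y, x) chunks of an array."""

    def __init__(self, data, sample_shape, seed=0):
        self.data = data
        self.sample_shape = sample_shape
        self.rng = np.random.default_rng(seed)

    def __iter__(self):
        return self

    def __next__(self):
        # Pick a random start index along each axis so the chunk fits.
        starts = [
            int(self.rng.integers(0, dim - size + 1))
            for dim, size in zip(self.data.shape, self.sample_shape)
        ]
        slices = tuple(
            slice(s, s + size) for s, size in zip(starts, self.sample_shape)
        )
        return self.data[slices]


class DemoBatchQueue:
    """Hypothetical queue: a background thread keeps batches ready."""

    def __init__(self, sampler, batch_size, n_batches, max_queued=4):
        self.sampler = sampler
        self.batch_size = batch_size
        self.n_batches = n_batches
        self._queue = queue.Queue(maxsize=max_queued)
        self._thread = threading.Thread(target=self._fill, daemon=True)
        self._thread.start()

    def _fill(self):
        # put() blocks while the queue is full, so at most max_queued
        # batches are waiting in the queue at any time.
        for _ in range(self.n_batches):
            samples = [next(self.sampler) for _ in range(self.batch_size)]
            self._queue.put(np.stack(samples))

    def __iter__(self):
        for _ in range(self.n_batches):
            yield self._queue.get()


data = np.arange(10 * 8 * 8, dtype=np.float32).reshape(10, 8, 8)
sampler = DemoSampler(data, sample_shape=(4, 5, 5))
batches = list(DemoBatchQueue(sampler, batch_size=3, n_batches=6))
print(len(batches), batches[0].shape)
```

In this sketch, training and validation would each get their own sampler and queue, which mirrors the note above that separate batch queues are used for each.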
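
As a quick illustration of the accessor pattern mentioned above, the following sketch registers a toy accessor with `xr.register_dataset_accessor`. The accessor name `demo` and its `features` / `as_array` members are made up for this example; the actual `Sup3rX` interface is best learned from the tests in `tests/data_wrappers/`.

```python
import numpy as np
import xarray as xr


# Hypothetical accessor for illustration; not the Sup3rX interface.
@xr.register_dataset_accessor("demo")
class DemoAccessor:
    """Adds helper methods to every xr.Dataset without subclassing it."""

    def __init__(self, ds):
        self._ds = ds

    @property
    def features(self):
        """Names of the data variables in the wrapped dataset."""
        return [str(name) for name in self._ds.data_vars]

    def as_array(self, features=None):
        """Stack the requested variables along a trailing feature axis."""
        features = self.features if features is None else features
        return np.stack([self._ds[f].values for f in features], axis=-1)


ds = xr.Dataset(
    {
        "u_10m": (("time", "y", "x"), np.zeros((2, 3, 4))),
        "v_10m": (("time", "y", "x"), np.ones((2, 3, 4))),
    }
)

# The accessor is available on any Dataset under the registered name.
print(ds.demo.features)
print(ds.demo.as_array().shape)
```

Because the accessor lives in its own namespace, ordinary `xr.Dataset` operations such as `isel` keep returning plain datasets that still expose the extra methods.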