Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-tile-output #289

Closed
wants to merge 116 commits into from
Closed

Multi-tile-output #289

wants to merge 116 commits into from

Conversation

fraserwg
Copy link
Contributor

This pull request should enable the opening of tiled MITgcm MDS datasets (see issue #28). The implementation does not use zarr as suggested in the comments of this issue, but should enable users to open tiled datasets, using a dask backend, in a similar manner to how untiled datasets are.

rabernat and others added 30 commits November 8, 2017 10:01
* test refactor

* added tests for MITgcm#59

* fixes MITgcm#59

* added calendar

* caught a little bug
* added reading functionality of PTRtave files

* changed trname to travename to avoid duplication

* Added the units back as [kg m-3]
* Extract version number from git tag

This keeps the version number in the documentation consistent with the code release. Code taken from MITgcm/MITgcm@96334be Closes MITgcm#71

* Fixing style errors.

* Move import statement to conform with code linter

* Make mds_store test work with xarray >=0.10.1

As noted in MITgcm#72 an upstream change in xarray broke one of the tests. This commit also modifies dependencies in setup.py to xarray >= 0.10.1
…gcm#77)

* Generalizes _get_all_iternums in order to handle compressed data

* fix excessive line length

* Fixing style errors.
* cast `prefix` to list if it isn't already one

* Check for unicode strings too
* fix python2/3 compatibilty issue for string test

* no need for extra line

* Fixing style errors.
* extend capabilities of read_raw_data

* possibility to read part of the file, with offset and partial_read
* choice of row/column major order

* testing + better error handling

* function will warn user if trying to pass inconsistent args
* function checks byte offset < file size

* Fixing style errors.

* get it shorter

* completing tests for codecov

* Fixing style errors.

* fix line length

* fix line length

* try to improve coverage

* finalize request

* remove blank lines
* iteration on dtype cleaner

* replace assert with raise error

* try to fool code cov

* fix indentation

* codecov didn't bite the bait

* fix error in testing
* refactoring of low level functions

* Fixing style errors.

* run flake8

* Fixing style errors.

* unit test

* fix style

* changes requested pt1

* improvements on read_xy_chunk

* padding in separate function
* return numpy array or memmap

* Fixing style errors.

* fix line length

* add more testing

* stupid bug

* more testing

* Fixing style errors.

* test for read_generic_data

* Fixing style errors.

* improve coverage

* Fixing style errors.

* unit test for read_all_variables

* Fixing style errors.

* corrections requested

* eager mode for read_generic_data

* read_3d_chunk fct

* testing for read_3d_chunk

* Fixing style errors.

* read big chunks + testing

* missing conflict

* Fixing style errors.

* Refactor read_all + test

* Fixing style errors.

* better docstrings

* working on tests, pb memmap

* Fixing style errors.

* revisions

* read_3d_chunk -> read_xyz_chunk
* correction docstring

* Fixing style errors.

* correct name function in test

* more docstrings

* one case of memmap evaluation figured out

+ wrong test in read_xyz_chunk

* Fixing style errors.

* not strickly enforce memmap test

* Fixing style errors.
Update documentation with conda instructions
* add native output description for thsice and seaice

* maintaining the tradition of not getting it right the first time

* try to make standard_names more or less CF compliant

* Fixing style errors.

* Fixing style errors.

* Fixing style errors.

* Fixing style errors.

* Fixing style errors.

* Fixing style errors.

* Fixing style errors.

* Fixing style errors.

* insert line breaks in four long lines
* clean refactored read_mds

* test extra_metadata

* Fixing style errors.

* corrections requested

* Fixing style errors.

* fix strickler

* replaced dask_delayed by use_dask

* fix cosmetic errors

* Fixing style errors.

* fix cosmetic errors

* get extra metadata

* Fixing style errors.

* fix assert

* use get_extra_metadata in read_mds

* refactor testing, small/big to 2D/3D

* Fixing style errors.
…ITgcm#97)

* clean PR

* clean up

* Fixing style errors.

* add test for extra_metadata

* Fixing style errors.

* fix line length

* fix extra_metadata variables

* Fixing style errors.

* fix line length
* wip: almost done

* Fixing style errors.

* fix bug

* Fixing style errors.

* fix test
* added angles to llc grid coords

* Fixing style errors.
* correcting a bug, fixes MITgcm#102

* more comments

* testing for bad nx,ny combination

* Fixing style errors.

* cosmetics

* Fixing style errors.
* extra variables from MITgcm/model/src/write_grid.F

* to do: add sigma coordinate variables
* only grabs variables when they exist

* look for variables in data_dir not grid_dir
timothyas and others added 13 commits April 14, 2021 10:21
Minor update xarray master CI workflow
* modified open_mdsdataset to add extra_variables

* Backwards compatible

* modified open_mdsdataset to add extra_variables

* Backwards compatible

* add tests and extend docstring

* only copy file if input is available.

Co-authored-by: Spencer Jones <[email protected]>
Co-authored-by: Aaron Schneider <[email protected]>
* fix combine_by_coords

* change to new only for versions newer than previous xarray version

* fix tests
@fraserwg
Copy link
Contributor Author

I think this is failing because the tests can't find the new tiled_gyre.tar.gz file I've added in the xmitgcm/test folder, but I'm not too sure how to fix this.

@rabernat
Copy link
Member

@fraserwg - thanks so much for getting this started! 🎉

I think this is failing because the tests can't find the new tiled_gyre.tar.gz file I've added in the xmitgcm/test folde

We should definitely not be keeping large binary files in the github repo. The test data currently live in figshare: https://figshare.com/collections/xmitgcm_test_data/4362224

I have downloaded the file and will upload it there. In the meantime, you should remove the file from this PR. Note that itn't enough to to git rm, since the file will stay in the git history. You need to actually remove it from the history. Sorry that this was not made clear earlier.

@fraserwg fraserwg marked this pull request as draft November 23, 2021 16:21
@fraserwg fraserwg marked this pull request as ready for review November 23, 2021 16:28
@fraserwg fraserwg marked this pull request as draft November 23, 2021 16:32
@fraserwg
Copy link
Contributor Author

I tried to remove the file from the history but now it looks like every single commit ever made to the repository is listed in this pull request. Apologies if this has broken anything!

@rabernat
Copy link
Member

Hi @fraserwg -- I'm really sorry that your first PR to xmitgcm involves such a git mess. I know this must feel very frustrating.

Here is my recommendation for how to resolve it:

  • Copy all of the code files that you have actually changed to implement this new feature into a temporary directory
  • Check out a fresh, clean copy of the repo
  • Cop the files from the temporary directory to the fresh clean repo
  • Add and commit those files (do not add the .gz file)
  • Close this PR and open a new one from that clean copy

I thank your contribution and your patience. I will post the obligatory xkcd cartoon about git.

xkcd cartoon

@fraserwg
Copy link
Contributor Author

Hahah -- it's no problem. Another pull request should be incoming shortly.

@fraserwg fraserwg closed this Nov 23, 2021
@fraserwg fraserwg deleted the approach3 branch July 13, 2022 14:59
@fraserwg fraserwg restored the approach3 branch July 13, 2022 14:59
@AaronDavidSchneider
Copy link
Contributor

AaronDavidSchneider commented Oct 17, 2022

Hi @fraserwg and @rabernat, I followed the discussion here and in #28 and am now curious if people continued to work on the issue? I could not find any PR that fixes the git related issues in this PR.

@fraserwg
Copy link
Contributor Author

From what I remember (having not touched this for a while) this branch, although not merged, should read the tiled datasets in ok. You should be able to call xmitgcm.open_mdsdataset as usual, but with the keyword option tiled=True if you install xmitgcm using this branch.

@fraserwg
Copy link
Contributor Author

Actually, I think PR #290 is the one you want to install from, as that doesn't have the "git mess" mentioned earlier in this thread. Tiled datasets are read in in the same way though.

@AaronDavidSchneider
Copy link
Contributor

AaronDavidSchneider commented Oct 18, 2022

Thanks for the message. Is PR #290 ready to go? Is it just the tests that need to be fixed? If it's already close to being ready, I could pick it up.

@fraserwg
Copy link
Contributor Author

I think it is ready to go, but the tiled test data isn't in the test data store so it won't pass all tests atm. I'm also not too sure how well the code scales (for larger tiled domains) so I'd suggest trying it and seeing if it works for you. If it's slow you may want to fork the branch and have a go at refactoring things for performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.