Merge branch 'main' into pre-commit-ci-update-config
* main:
  add backend intro and how-to diagram (#9175)
  Fix copybutton for multi line examples in double digit ipython cells (#9264)
  Update signature for _arrayfunction.__array__ (#9237)
  Add encode_cf_datetime benchmark (#9262)
  groupby, resample: Deprecate some positional args (#9236)
  Delete ``base`` and ``loffset`` parameters to resample (#9233)
  Update dropna docstring (#9257)
  Grouper, Resampler as public api (#8840)
  Fix mypy on main (#9252)
  fix fallback isdtype method (#9250)
  Enable pandas type checking (#9213)
  Per-variable specification of boolean parameters in open_dataset (#9218)
  test push
  Added a space to the documentation (#9247)
  Fix typing for test_plot.py (#9234)
dcherian committed Jul 22, 2024
2 parents 25bf152 + 95e67b6 commit c836fbd
Showing 53 changed files with 1,312 additions and 1,067 deletions.
18 changes: 18 additions & 0 deletions asv_bench/benchmarks/coding.py
@@ -0,0 +1,18 @@
import numpy as np

import xarray as xr

from . import parameterized


@parameterized(["calendar"], [("standard", "noleap")])
class EncodeCFDatetime:
    def setup(self, calendar):
        self.units = "days since 2000-01-01"
        self.dtype = np.dtype("int64")
        self.times = xr.date_range(
            "2000", freq="D", periods=10000, calendar=calendar
        ).values

    def time_encode_cf_datetime(self, calendar):
        xr.coding.times.encode_cf_datetime(self.times, self.units, calendar, self.dtype)
4 changes: 4 additions & 0 deletions doc/api-hidden.rst
@@ -693,3 +693,7 @@

coding.times.CFTimedeltaCoder
coding.times.CFDatetimeCoder

core.groupers.Grouper
core.groupers.Resampler
core.groupers.EncodedGroups
23 changes: 19 additions & 4 deletions doc/api.rst
@@ -803,6 +803,18 @@ DataArray
DataArrayGroupBy.dims
DataArrayGroupBy.groups

Grouper Objects
---------------

.. currentmodule:: xarray.core

.. autosummary::
:toctree: generated/

groupers.BinGrouper
groupers.UniqueGrouper
groupers.TimeResampler


Rolling objects
===============
@@ -1028,17 +1040,20 @@ DataArray
Accessors
=========

.. currentmodule:: xarray
.. currentmodule:: xarray.core

.. autosummary::
:toctree: generated/

core.accessor_dt.DatetimeAccessor
core.accessor_dt.TimedeltaAccessor
core.accessor_str.StringAccessor
accessor_dt.DatetimeAccessor
accessor_dt.TimedeltaAccessor
accessor_str.StringAccessor


Custom Indexes
==============
.. currentmodule:: xarray

.. autosummary::
:toctree: generated/

5 changes: 4 additions & 1 deletion doc/conf.py
@@ -97,7 +97,7 @@
}

# sphinx-copybutton configurations
copybutton_prompt_text = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.\.\.: | {5,8}: "
copybutton_prompt_text = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.{3,}: | {5,8}: "
copybutton_prompt_is_regexp = True

# nbsphinx configurations
@@ -158,6 +158,8 @@
"Variable": "~xarray.Variable",
"DatasetGroupBy": "~xarray.core.groupby.DatasetGroupBy",
"DataArrayGroupBy": "~xarray.core.groupby.DataArrayGroupBy",
"Grouper": "~xarray.core.groupers.Grouper",
"Resampler": "~xarray.core.groupers.Resampler",
# objects without namespace: numpy
"ndarray": "~numpy.ndarray",
"MaskedArray": "~numpy.ma.MaskedArray",
@@ -169,6 +171,7 @@
"CategoricalIndex": "~pandas.CategoricalIndex",
"TimedeltaIndex": "~pandas.TimedeltaIndex",
"DatetimeIndex": "~pandas.DatetimeIndex",
"IntervalIndex": "~pandas.IntervalIndex",
"Series": "~pandas.Series",
"DataFrame": "~pandas.DataFrame",
"Categorical": "~pandas.Categorical",
2 changes: 1 addition & 1 deletion doc/internals/how-to-add-new-backend.rst
@@ -4,7 +4,7 @@ How to add a new backend
------------------------

Adding a new backend for read support to Xarray does not require
to integrate any code in Xarray; all you need to do is:
one to integrate any code in Xarray; all you need to do is:

- Create a class that inherits from Xarray :py:class:`~xarray.backends.BackendEntrypoint`
and implements the method ``open_dataset`` see :ref:`RST backend_entrypoint`
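
To make this concrete, here is a minimal, hedged sketch of such a backend class (not part of this changeset; the class name, the ``.my`` extension, and the use of ``np.loadtxt`` as a stand-in parser are all illustrative assumptions):

.. code-block:: python

    import numpy as np
    import xarray as xr
    from xarray.backends import BackendEntrypoint


    class MyBackendEntrypoint(BackendEntrypoint):
        """Read-only backend sketch; all names here are illustrative."""

        description = "Open hypothetical .my files in Xarray"

        def open_dataset(self, filename_or_obj, *, drop_variables=None):
            # Parse the file however you like and return an xarray.Dataset
            values = np.loadtxt(filename_or_obj)  # stand-in for a real parser
            return xr.Dataset({"data": ("x", values)})

        def guess_can_open(self, filename_or_obj):
            return str(filename_or_obj).endswith(".my")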
93 changes: 82 additions & 11 deletions doc/user-guide/groupby.rst
@@ -1,3 +1,5 @@
.. currentmodule:: xarray

.. _groupby:

GroupBy: Group and Bin Data
@@ -15,19 +17,20 @@ __ https://www.jstatsoft.org/v40/i01/paper
- Apply some function to each group.
- Combine your groups back into a single data object.

Group by operations work on both :py:class:`~xarray.Dataset` and
:py:class:`~xarray.DataArray` objects. Most of the examples focus on grouping by
Group by operations work on both :py:class:`Dataset` and
:py:class:`DataArray` objects. Most of the examples focus on grouping by
a single one-dimensional variable, although support for grouping
over a multi-dimensional variable has recently been implemented. Note that for
one-dimensional data, it is usually faster to rely on pandas' implementation of
the same pipeline.

.. tip::

To substantially improve the performance of GroupBy operations, particularly
with dask `install the flox package <https://flox.readthedocs.io>`_. flox
`Install the flox package <https://flox.readthedocs.io>`_ to substantially improve the performance
of GroupBy operations, particularly with dask. flox
`extends Xarray's in-built GroupBy capabilities <https://flox.readthedocs.io/en/latest/xarray.html>`_
by allowing grouping by multiple variables, and lazy grouping by dask arrays. If installed, Xarray will automatically use flox by default.
by allowing grouping by multiple variables, and lazy grouping by dask arrays.
If installed, Xarray will automatically use flox by default.

Split
~~~~~
@@ -87,7 +90,7 @@ Binning
Sometimes you don't want to use all the unique values to determine the groups
but instead want to "bin" the data into coarser groups. You could always create
a customized coordinate, but xarray facilitates this via the
:py:meth:`~xarray.Dataset.groupby_bins` method.
:py:meth:`Dataset.groupby_bins` method.

.. ipython:: python
@@ -110,7 +113,7 @@ Apply
~~~~~

To apply a function to each group, you can use the flexible
:py:meth:`~xarray.core.groupby.DatasetGroupBy.map` method. The resulting objects are automatically
:py:meth:`core.groupby.DatasetGroupBy.map` method. The resulting objects are automatically
concatenated back together along the group axis:

.. ipython:: python
@@ -121,8 +124,8 @@ concatenated back together along the group axis:
arr.groupby("letters").map(standardize)
GroupBy objects also have a :py:meth:`~xarray.core.groupby.DatasetGroupBy.reduce` method and
methods like :py:meth:`~xarray.core.groupby.DatasetGroupBy.mean` as shortcuts for applying an
GroupBy objects also have a :py:meth:`core.groupby.DatasetGroupBy.reduce` method and
methods like :py:meth:`core.groupby.DatasetGroupBy.mean` as shortcuts for applying an
aggregation function:

.. ipython:: python
@@ -183,7 +186,7 @@ Iterating and Squeezing
Previously, Xarray defaulted to squeezing out dimensions of size one when iterating over
a GroupBy object. This behaviour is being removed.
You can always squeeze explicitly later with the Dataset or DataArray
:py:meth:`~xarray.DataArray.squeeze` methods.
:py:meth:`DataArray.squeeze` methods.

.. ipython:: python
@@ -217,7 +220,7 @@ __ https://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_two_dimensional_latitude_longitude_coordinate_variables
da.groupby("lon").map(lambda x: x - x.mean(), shortcut=False)
Because multidimensional groups have the ability to generate a very large
number of bins, coarse-binning via :py:meth:`~xarray.Dataset.groupby_bins`
number of bins, coarse-binning via :py:meth:`Dataset.groupby_bins`
may be desirable:

.. ipython:: python
@@ -232,3 +235,71 @@ applying your function, and then unstacking the result:
stacked = da.stack(gridcell=["ny", "nx"])
stacked.groupby("gridcell").sum(...).unstack("gridcell")
.. _groupby.groupers:

Grouper Objects
~~~~~~~~~~~~~~~

Both ``groupby_bins`` and ``resample`` are specializations of the core ``groupby`` operation for binning
and time resampling, respectively. Many problems demand more complex GroupBy application: for example, grouping by multiple
variables with a combination of categorical grouping, binning, and resampling; or more specializations like
spatial resampling; or more complex time grouping like special handling of seasons, or the ability to specify
custom seasons. To handle these use-cases and more, Xarray is evolving to provide an
extension point using ``Grouper`` objects.

.. tip::

See the `grouper design`_ doc for more detail on the motivation and design ideas behind
Grouper objects.

.. _grouper design: https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md

For now, Xarray provides three specialized Grouper objects:

1. :py:class:`groupers.UniqueGrouper` for categorical grouping
2. :py:class:`groupers.BinGrouper` for binned grouping
3. :py:class:`groupers.TimeResampler` for resampling along a datetime coordinate

These provide functionality identical to the existing ``groupby``, ``groupby_bins``, and ``resample`` methods.
That is,


.. code-block:: python

    ds.groupby("x")

is identical to

.. code-block:: python

    from xarray.groupers import UniqueGrouper

    ds.groupby(x=UniqueGrouper())

and

.. code-block:: python

    ds.groupby_bins("x", bins=bins)

is identical to

.. code-block:: python

    from xarray.groupers import BinGrouper

    ds.groupby(x=BinGrouper(bins))

and

.. code-block:: python

    ds.resample(time="ME")

is identical to

.. code-block:: python

    from xarray.groupers import TimeResampler

    ds.resample(time=TimeResampler("ME"))
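
Beyond these one-to-one equivalences, the grouper objects carry their own configuration. The following is an illustrative sketch rather than part of this changeset; in particular, it assumes ``BinGrouper`` accepts the same ``labels`` argument that ``groupby_bins`` forwards to :py:func:`pandas.cut`:

.. code-block:: python

    import numpy as np
    import xarray as xr
    from xarray.groupers import BinGrouper

    ds = xr.Dataset({"temp": ("x", np.arange(10.0))}, coords={"x": np.arange(10)})

    # Assumed equivalent to ds.groupby_bins("x", bins=[0, 5, 10], labels=["low", "high"])
    ds.groupby(x=BinGrouper(bins=[0, 5, 10], labels=["low", "high"])).mean()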
75 changes: 75 additions & 0 deletions doc/user-guide/io.rst
@@ -19,6 +19,81 @@ format (recommended).
np.random.seed(123456)
You can `read different types of files <https://docs.xarray.dev/en/stable/user-guide/io.html>`_
with ``xr.open_dataset`` by specifying the engine to be used:

.. ipython:: python
    :okexcept:
    :suppress:

    import xarray as xr

    xr.open_dataset("my_file.grib", engine="cfgrib")

The "engine" provides a set of instructions that tells xarray how
to read the data and pack them into a `dataset` (or `dataarray`).
These instructions are stored in an underlying "backend".

Xarray comes with several backends that cover many common data formats.
Many more backends are available via external libraries, or you can `write your own <https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html>`_.
This diagram aims to help you determine, based on the format of the file you'd like to read,
which type of backend you're using and how to use it.

Text and boxes are clickable for more information.
Following the diagram is detailed information on many popular backends.
You can learn more about using and developing backends in the
`Xarray tutorial JupyterBook <https://tutorial.xarray.dev/advanced/backends/backends.html>`_.

.. mermaid::
:alt: Flowchart illustrating how to choose the right backend engine to read your data

flowchart LR
built-in-eng["""Is your data stored in one of these formats?
- netCDF4 (<code>netcdf4</code>)
- netCDF3 (<code>scipy</code>)
- Zarr (<code>zarr</code>)
- DODS/OPeNDAP (<code>pydap</code>)
- HDF5 (<code>h5netcdf</code>)
"""]

built-in("""You're in luck! Xarray bundles a backend for this format.
Open data using <code>xr.open_dataset()</code>. We recommend
always setting the engine you want to use.""")

installed-eng["""One of these formats?
- <a href='https://github.com/ecmwf/cfgrib'>GRIB (<code>cfgrib</code>)
- <a href='https://tiledb-inc.github.io/TileDB-CF-Py/documentation/index.html'>TileDB (<code>tiledb</code>)
- <a href='https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html#rioxarray'>GeoTIFF, JPEG-2000, ESRI-hdf (<code>rioxarray</code>, via GDAL)
- <a href='https://www.bopen.eu/xarray-sentinel-open-source-library/'>Sentinel-1 SAFE (<code>xarray-sentinel</code>)
"""]

installed("""Install the package indicated in parentheses to your
Python environment. Restart the kernel and use
<code>xr.open_dataset(files, engine='rioxarray')</code>.""")

other("""Ask around to see if someone in your data community
has created an Xarray backend for your data type.
If not, you may need to create your own or consider
exporting your data to a more common format.""")

built-in-eng -->|Yes| built-in
built-in-eng -->|No| installed-eng

installed-eng -->|Yes| installed
installed-eng -->|No| other

click built-in-eng "https://docs.xarray.dev/en/stable/getting-started-guide/faq.html#how-do-i-open-format-x-file-as-an-xarray-dataset"
click other "https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html"

classDef quesNodefmt fill:#9DEEF4,stroke:#206C89,text-align:left
class built-in-eng,installed-eng quesNodefmt

classDef ansNodefmt fill:#FFAA05,stroke:#E37F17,text-align:left,white-space:nowrap
class built-in,installed,other ansNodefmt

linkStyle default font-size:20pt,color:#206C89
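
As a small usage sketch to accompany the diagram (not part of this changeset; ``example.nc`` is a hypothetical file name), explicitly selecting one of the bundled engines looks like this:

.. code-block:: python

    import xarray as xr

    # netCDF4 files are read by the bundled "netcdf4" engine ("h5netcdf" also works)
    ds = xr.open_dataset("example.nc", engine="netcdf4")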


.. _io.netcdf:

netCDF
2 changes: 1 addition & 1 deletion doc/user-guide/terminology.rst
@@ -221,7 +221,7 @@ complete examples, please consult the relevant documentation.*
combined_ds
lazy
Lazily-evaluated operations do not load data into memory until necessary.Instead of doing calculations
Lazily-evaluated operations do not load data into memory until necessary. Instead of doing calculations
right away, xarray lets you plan what calculations you want to do, like finding the
average temperature in a dataset.This planning is called "lazy evaluation." Later, when
you're ready to see the final result, you tell xarray, "Okay, go ahead and do those calculations now!"
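
A tiny sketch of that workflow (illustrative only; ``example.nc`` and the ``temperature`` variable are hypothetical, and ``chunks={}`` assumes dask is installed):

.. code-block:: python

    import xarray as xr

    ds = xr.open_dataset("example.nc", chunks={})  # opened lazily, nothing loaded yet
    mean = ds["temperature"].mean()  # still lazy: only the computation is planned
    result = mean.compute()  # now the data is read and the average is computed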
23 changes: 21 additions & 2 deletions doc/whats-new.rst
@@ -22,6 +22,15 @@ v2024.06.1 (unreleased)

New Features
~~~~~~~~~~~~
- Introduce new :py:class:`groupers.UniqueGrouper`, :py:class:`groupers.BinGrouper`, and
:py:class:`groupers.TimeResampler` objects as a step towards supporting grouping by
multiple variables. See the `docs <groupby.groupers_>` and the
`grouper design doc <https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md>`_ for more.
(:issue:`6610`, :pull:`8840`).
By `Deepak Cherian <https://github.com/dcherian>`_.
- Allow per-variable specification of ``mask_and_scale``, ``decode_times``, ``decode_timedelta``,
``use_cftime`` and ``concat_characters`` params in :py:func:`~xarray.open_dataset` (:pull:`9218`); see the sketch after this list.
By `Mathijs Verhaegh <https://github.com/Ostheer>`_.
- Allow chunking for arrays with duplicated dimension names (:issue:`8759`, :pull:`9099`).
By `Martin Raspaud <https://github.com/mraspaud>`_.
- Extract the source url from fsspec objects (:issue:`9142`, :pull:`8923`).
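
To illustrate the per-variable form mentioned above, here is a hedged sketch (not part of this changeset; the file and variable names are invented, and it assumes each parameter also accepts a dict mapping variable names to booleans):

.. code-block:: python

    import xarray as xr

    # Decode times for "time" but leave "raw_counter" untouched;
    # apply mask/scale only to "temperature".
    ds = xr.open_dataset(
        "data.nc",
        decode_times={"time": True, "raw_counter": False},
        mask_and_scale={"temperature": True, "flags": False},
    )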
@@ -33,6 +42,11 @@ New Features

Breaking changes
~~~~~~~~~~~~~~~~
- The ``base`` and ``loffset`` parameters to :py:meth:`Dataset.resample` and :py:meth:`DataArray.resample`
are now removed. These parameters have been deprecated since v2023.03.0. Using the
``origin`` or ``offset`` parameters is recommended as a replacement for using
the ``base`` parameter, and using time offset arithmetic is recommended as a
replacement for using the ``loffset`` parameter; see the migration sketch below.
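
A minimal migration sketch (illustrative only; the toy hourly dataset and the chosen frequencies are assumptions, not something from this changeset):

.. code-block:: python

    import numpy as np
    import pandas as pd
    import xarray as xr

    ds = xr.Dataset(
        {"foo": ("time", np.arange(48.0))},
        coords={"time": pd.date_range("2000-01-01", freq="h", periods=48)},
    )

    # Before (removed): ds.resample(time="24h", base=12, loffset="-12h").mean()
    # ``offset`` replaces ``base``; time offset arithmetic replaces ``loffset``.
    resampled = ds.resample(time="24h", offset="12h").mean()
    resampled["time"] = resampled.get_index("time") + pd.Timedelta("-12h")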


Deprecations
@@ -70,15 +84,20 @@ Bug fixes
Documentation
~~~~~~~~~~~~~

- Adds a flow-chart diagram to help users navigate help resources (`Discussion #8990 <https://github.com/pydata/xarray/discussions/8990>`_).
- Adds intro to backend section of docs, including a flow-chart to navigate types of backends (:pull:`9175`).
By `Jessica Scheick <https://github.com/jessicas11>`_.
- Adds a flow-chart diagram to help users navigate help resources (`Discussion #8990 <https://github.com/pydata/xarray/discussions/8990>`_, :pull:`9147`).
By `Jessica Scheick <https://github.com/jessicas11>`_.
- Improvements to Zarr & chunking docs (:pull:`9139`, :pull:`9140`, :pull:`9132`)
By `Maximilian Roos <https://github.com/max-sixty>`_.

- Fix copybutton for multi-line examples and double-digit ipython cell numbers (:pull:`9264`).
By `Moritz Schreiber <https://github.com/mosc9575>`_.

Internal Changes
~~~~~~~~~~~~~~~~

- Enable typing checks of pandas (:pull:`9213`).
By `Michael Niklas <https://github.com/headtr1ck>`_.

.. _whats-new.2024.06.0:
