Monday, August 7th, 2023
TLDR: Xarray has been through a major refactoring of its internals that makes coordinate-based data selection and alignment (almost) fully customizable, via built-in and/or third-party indexes. It also addresses a good number of long-standing issues with "dimension coordinates" implicitly backed by pandas (multi-)indexes.
[link to Joe's CZI blog post]
Some datasets could not be loaded with Xarray (e.g., a coordinate that shares its name with a dimension but is defined along different dimensions)
Complicated workarounds (swap_dims, etc.)
Limited and/or challenging for data cubes representing arbitrary grids (curvilinear grids, unstructured meshes, etc.).
Refactor index logic in Index classes. More easily maintainable. May help Pandas become an optional dependency in the future? (cf. Xarray-lite).
Also made it possible to solve many issues with multi-indexes, for which each level now has its own real coordinate.
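A minimal sketch of the multi-index point above, assuming a recent Xarray release (2022.06 or later). The array, dimension names and values are made up for illustration; `stack` is used here only as a convenient way to build a pandas multi-index.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 2 array used only for illustration.
da = xr.DataArray(
    np.arange(4).reshape(2, 2),
    dims=("x", "y"),
    coords={"x": ["a", "b"], "y": [0, 1]},
)

# Stacking "x" and "y" builds a pandas multi-index along the new dimension "z".
stacked = da.stack(z=("x", "y"))

# Each multi-index level is exposed as its own coordinate along "z".
x_level = stacked.coords["x"]
y_level = stacked.coords["y"]
print(stacked.coords)
```

Both `x` and `y` show up as regular coordinates along `z`, rather than as "virtual" levels hidden inside the `z` coordinate.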
Dataset / DataArray repr now has an "Indexes" section.
Set an index for non-dimension coordinates! (No more swap_dims or coordinate renaming)
ds.set_xindex("non_dim_coord").sel(non_dim_coord="something")
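A slightly fuller version of the one-liner above, as a sketch. The dataset, the `station` coordinate and its values are hypothetical. `set_xindex` is called without an explicit index class, so it falls back to the default pandas-backed index.

```python
import xarray as xr

# Hypothetical dataset: "station" is a non-dimension coordinate along "x".
ds = xr.Dataset(
    {"temperature": ("x", [11.2, 14.8, 9.5])},
    coords={"x": [0, 1, 2], "station": ("x", ["a", "b", "c"])},
)

# Attach an index to "station" without swapping dimensions or renaming anything.
indexed = ds.set_xindex("station")

# Label-based selection now works directly on the non-dimension coordinate.
selected = indexed.sel(station="b")
print(selected["temperature"].item())
```

The original dimension coordinate `x` keeps its own index, so both `sel(x=...)` and `sel(station=...)` are available on `indexed`.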
E.g., NumPy index (much faster to build, much more expensive to query), Geometry index (xvec)
Out-of-core index, etc.
...or no index at all! (Create dataset with no default index, drop_indexes)
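A short sketch of the "no index" case using `drop_indexes`. The dataset is hypothetical. Creating a dataset with no default index in the first place is not shown here.

```python
import xarray as xr

# Hypothetical dataset with a default (pandas) index on the "x" coordinate.
ds = xr.Dataset(
    {"temperature": ("x", [11.2, 14.8, 9.5])},
    coords={"x": [0, 1, 2]},
)

# Drop the index but keep the coordinate itself.
no_index = ds.drop_indexes("x")

# The coordinate is still there; only the index is gone.
print("x" in no_index.coords, "x" in no_index.xindexes)

# Positional selection does not need an index.
second = no_index.isel(x=1)
```

Without an index, label-based operations such as `sel(x=...)` are no longer available for `x`, which is the trade-off for skipping index construction.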
Not limited to 1-dimensional coordinates, even more flexible!
RasterIndex, FunctionalIndex, etc.
See xarray discussion for examples
Still unfinished [link: indexes next steps GH issue], extension entry points, etc.
CZI, Xarray core developers, etc.