Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interface for rewrapping Array wrappers #136

Open
rafaqz opened this issue Apr 2, 2021 · 11 comments
Open

interface for rewrapping Array wrappers #136

rafaqz opened this issue Apr 2, 2021 · 11 comments

Comments

@rafaqz
Copy link
Contributor

rafaqz commented Apr 2, 2021

Some wrapper types like AxisArray, NamedDImsArray and DimArray (and probably LabelledArray?) need to be the "outside" wrapper for, e.g. keeping the axis index accurate and dispatch for getindex methods.

This issue just came up on discourse:
https://discourse.julialang.org/t/multiple-inheritance-carry-over-members/58355/8

People kind of expect a DimArray to keep working after wrapping with other types. It could be useful to have a trait and a shared method here that allows rewrapping:

rewrap(f, A) and rewrap(f, A, I...) could be used in other array constructors to allow rewrapping where necessary.

isouterwrapper could indicate if an object is an outer wrapper type.

@rafaqz rafaqz changed the title Array wrappers rewrapping Array wrappers Apr 2, 2021
@rafaqz rafaqz changed the title rewrapping Array wrappers interface for rewrapping Array wrappers Apr 2, 2021
@Tokazama
Copy link
Member

Tokazama commented Apr 3, 2021

I think this would be a band-aid for the real issue, which is that these arrays are all adding some sort of new trait/behavior but the trait is only evident if it is the top level. Similarly, we shouldn't try to make a StaticArray continually wrap other stuff to keep its size information statically known to other function.

Of course, the situation is different if the wrapper actually changes the layout of an array. For example, Diagonal or UpperTriangular.

@rafaqz
Copy link
Contributor Author

rafaqz commented Apr 3, 2021

This is partially true, but in practice it will be very difficult for axis indexing to work correctly through multiple outer wrappers. Axis lookup wrappers like DimArray usually leave the wrapped object as-is, but add axis keys that match it's apparent axes - this is hard to deal with as an internal wrapper, where the outer axes are unkown, and easy to deal with as an external wrapper, where they are known. In this case most methods are just forwarded to the parent array except the axis indexing part.

Being wrapped by other kind of arrays is awkward in comparison. How can a DimArray handle being wrapped by a SubArray? what happens with a lookup object like X(Near(103.6)) (given that it is passed through at all) that returns an index out of bounds for the outer SubArray but not for the inner DimArray? it probably shouldn't be able to index into data outside of the SubArray. But I'm not sure how to do that in a way that scales to multiple wrappers.

In practice it is much easier to aggressively rewrap to the outside - e.g. look at broadcast for NamedDims/DimensionalData/AxisKeys.

I may be mistaken but maybe @mcabbott has experimented with these alternate strategies somewhere? From my own experience with DimensionalData I think that wrappers that do not affect the objects size, shape or values, but do depend on them, always being on the outside is the simplest option. There is only the caveat of needing some rule for working out what happens when two of these types collides (ie. unwrap DimArray if wrapped with a NamdDimArray - we don't need both).

I would of course be persuaded by a clean working solution that didn't need rewrapping to the outside! but I haven't seen one yet.

Edit: also, for transposes and similar, both ways have their issues, there are definitely some situations where being the outer wrapper is not easier. Traits that notify or changes in axis order and permutation could help here.

@rafaqz rafaqz closed this as completed Apr 3, 2021
@rafaqz rafaqz reopened this Apr 3, 2021
@mcabbott
Copy link

mcabbott commented Apr 3, 2021

Yes I experimented a bit. In fact AxisKeys.jl is one experiment really, at making its wrapper commute with that of NamedDims.jl -- it runs the tests in both orders. But the approach is just to write lots of methods, which is not going to scale well, let alone work for packages which don't know about each other.

It seems hard to know quite what re-wrapping should achieve in general. Broadcasting is easy in that Base defines the scope, likewise views. But making something like padding (from the discourse thread) "work" with array wrappers not designed for it seems messy. Regardless of which ends up on top, what are the rules? When an AxisArray has a StepRange do you extend it? And categorical? What happens to Hermitian or BiDiagonal types? By the time you've answered these, then it seems hard to say the wrapper and the padding don't know about each other anymore. If the shared knowledge lives in some traits package, it seems that this must grow to basically contain all the logic.

@rafaqz
Copy link
Contributor Author

rafaqz commented Apr 3, 2021

Yes there would need to be fairly specific scope. Outside of Base, things like CuArray and DistributedArray (or is is DArray?) are a good example of an array type that should be rewrapped, as they mostly just change the location of the data.

Then, something like a PaddedView changes the array size, which will break the index, but rewrap would still be useful just to keep dimension names - it makes no difference to NamedDims.jl that the size has changed (and yes I have though about extending axis ranges a few times...)

Further in,Transpose etc reorganise the dimension order, which breaks everything unless you handle it. Things are starting to get hard here, but there may be more elegant ways of dealing with transposition generally. Currently we all just add methods manually to handle the axis reordering.

Then there is every other way that shape/size can be altered by a wrapper, that we just can't handle, and shouldn't rewrap.

It seems there is some structure to the situation that we could work with, if it's worth it. We have 1 trait for an array wrapper: resizes_axes. If the axes are permuted, permutes_axes. And if some are added/removed or broken in other ways changes_axes. etc.

@Tokazama
Copy link
Member

Tokazama commented Apr 3, 2021

I have nearly complete solutions worked for padding/resizing/permuting/etc. The remaining issue has really been arrays with different memory backends, which is the point of #130 and #140.

@rafaqz
Copy link
Contributor Author

rafaqz commented Apr 3, 2021

Ok I'm keen to see what you come up with! Although I'm still concerned that a lot of edge cases will mean we still need dispatch on AbstractDimArray for DimensionalData.jl and similar packages. which requires being the outer wrapper. There are a bunch of changes to Base Julia needed to change that.

@Tokazama
Copy link
Member

Tokazama commented Apr 3, 2021

There are a bunch of changes to Base Julia needed to change that.

There are some methods that will have issues and we need to put together our own implementation, but that's manageable. Most of these are actually pretty straightforward. "src/indexing.jl" has a lot of that for getindex. There are definitely a few things that need to happen before we can make big strides on some of this. Right now I'm working on abstractions for layouts that incorporate a lot of the stuff that's been done to support LoopVectorization.jl

@rafaqz
Copy link
Contributor Author

rafaqz commented Apr 3, 2021

I like your enthusiasm!! but these wrappers do a lot more than getindex... dimension names are used in all the base functions that have a dims argument.

  • Can those also be handled as an inner wrapper?
  • Will an adjacent axis index be concatenated during cat for an inner wrapper?
  • What about Base._mapreduce_dim reducing the wrapper dims?
  • ...

It would be great but it sounds like a lot of work to get all of this done. Absolutely we should aim high, and in the end a lot of that base code does needs to be fixed so these things are easier.

My proposal here is just a basic incremental improvement on what already exists, but it's also pretty easily achieved.

@Tokazama
Copy link
Member

Tokazama commented Apr 3, 2021

I should probably write something like "The State of ArrayInterface" thing to answer this properly. I've been doing a pretty terrible job of making stuff known and approachable for the Julia community over the last year. I'm going to finish a PR and start working on that.

@rafaqz
Copy link
Contributor Author

rafaqz commented Apr 3, 2021

That would be great, good to start getting more people on board and contributing, at least ideas and feedback.

@Tokazama
Copy link
Member

BTW #141 is the PR I was referring to, so hopefully I can get started on this soon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants