Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: MemoryMarshal.GetArrayDataReference<T>(T[,]) overload #35528

Closed
Sergio0694 opened this issue Apr 27, 2020 · 23 comments · Fixed by #53562
Closed

Proposal: MemoryMarshal.GetArrayDataReference<T>(T[,]) overload #35528

Sergio0694 opened this issue Apr 27, 2020 · 23 comments · Fixed by #53562
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime
Milestone

Comments

@Sergio0694
Copy link
Contributor

Sergio0694 commented Apr 27, 2020

Edit by @GrabYourPitchforks: See alternative API proposal at #35528 (comment).


Overview

This issue is about adding a new API to System.Runtime.InteropServices.MemoryMarshal:

namespace System.Runtime.InteropServices
{
    public static partial class MemoryMarshal
    {
        public static ref T GetArrayDataReference<T>(T[,] array);
    }
}

This would just be an overload of the existing MemoryMarshal.GetArrayDataReference<T>(T[]) API (introduced here). Additional overloads for 3/4/4D arrays could be considered as well.

Motivation/behavior

Same as GetArrayDataReference<T>(T[]), just extended to more array types. As an additional note, this API would ignore custom lower bounds, if present, and always just return a reference to the first array element, if the array is not empty, no matter its expected index. This API is expected to offer the same behavior as the overload for SZ arrays, and if custom lower bounds are present it's up to individual developers to track them correctly after retrieving the reference.

This API would also be particularly useful considering the fact that the standard index accessor for ND arrays is much slower than the one for SZ arrays (related issue on this point: #35293).

cc. @tannergooding

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Runtime untriaged New issue has not been triaged by the area owner labels Apr 27, 2020
@MihaZupan MihaZupan added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Apr 27, 2020
@Sergio0694
Copy link
Contributor Author

On a side note, @GrabYourPitchforks I have a few questions for you about the changes you made in #1036, I'd like to get a decent understanding of what's going on there but I'm a bit confused 🤔
In particular, I was wondering:

  1. I see that RuntimeHelpers has a GetRawArrayData API already, although it's not specific for SZ arrays so it has the unnecessary base size lookup even when the actual array rank is known. Is this why the RawArrayData mapping class was introduced? Wouldn't the JIT be able to optimize that on its own anyway, when seeing it used from eg. an SZ array directly?
  2. More general question: it was my understanding that the primary purpose of intrinsics was to enable platform-specific behavior, eg. mapping calls to Vector<T> to the best possible SIMD instructions on the current CPU. Why is it that a method like GetArrayDataReference is implemented through intrinsics as well despite it only relying on the pointer size (which is baked in through a precompiler directive) to determine the right offset to load? In other words, why isn't the base implementation just using RawArrayData.Data + cast, with no need for intrinsics?
  3. Looking at the actual intrinsics emit (in IL.Stubs and jitinterface.cpp), isn't the code essentially doing what the compiler would've already produced on its own just by doing ref Unsafe.As<RawArrayData>(array).Data + cast? I'm particularly confused about this step I think.
  4. Suppose this new overload for T[,] arrays was introduced, would there also need to be a mapping class for 2D arrays (with the additional HxW and lower bounds fields to skip), or could it be implemented by calling RuntimeHelpers.RawArrayData, casting the returned reference and letting the JIT do its thing? And if a dedicated mapping class (eg. RawArray2DData) was needed, like with the T[] overload, would this new API need intrinsics too?

Just to be extremely clear, I'm not in any way trying to criticize the design/implementation there, I'd just love to understand why the code is structured this way in particular and to learn something new. 😄
Thanks!

@GrabYourPitchforks
Copy link
Member

Good questions!

  1. The only remaining runtime calls to the internal GetRawArrayData method should be when we don't statically know that the incoming array is an SZ array. Currently only 9 references remain to GetRawArrayData. Almost all of those call sites have been migrated to the new API. So the only remaining call sites should be for things that the JIT wouldn't have been able to optimize anyway.

  2. We've overloaded the term "intriniscs" somewhat. If you're talking the System.Runtime.Intrinsics namespace, then yeah, we're using "intrinsics" to refer to particular assembly instructions. But within System.Runtime.CompilerServices (and the Unsafe class as a good use case), we mean "JIT intrinsics". Generally speaking these are situations where we need to write a method using very specific IL patterns which are already supported by the runtime but which aren't expressible in C#. The implementation of GetArrayDataReference falls into this second category. Ideally the implementation would've been written as ldarg.0; ldc.i4.0; no.(typecheck|rangecheck).ldelema; ret;. But since the runtime currently doesn't honor the no. prefix (see The "no." opcode prefix is not implemented #10112), this is the best we could do for now.

  3. Unsafe.As<T> is a generic method. There are some outstanding JIT issues which prevent generic methods from being inlined properly in R2R scenarios. The current best way to work around this is to use JIT intrinsics as mentioned above rather than calling the generic method directly. Hence GetArrayDataReference is implemented in raw IL rather than bouncing through Unsafe.As<T>.

  4. I would think that it would make more sense to have a single Array overload that can handle any rank, any bounds. Otherwise we'd have a large growth of methods which deal with rank-2 arrays, rank-3 arrays, and so on. The actual implementation of this method would still be different from the SZ array case mentioned previously, as the SZ implementation would (ideally) use the fast ldelema instruction, which is invalid for MD arrays.

@Sergio0694
Copy link
Contributor Author

Sergio0694 commented Apr 27, 2020

Hey, thank you so much for taking the time to reply, I really appreciate it! 😊

It all makes way more sense now, in particular I was completely missing that point about generic methods not being inlined correctly in that case, that cleared things up quite a bit! And yes by "intrinsics" I was indeed referring to APIs in System.Runtime.Intrinsics (or System.Numerics too like Vector<T> or Vector4`), thanks for explaining what you refer to as "JIT intrinsics" as well!

I guess I have one more question related to these two things you said:

the remaining call sites should be for things that the JIT wouldn't have been able to optimize

it would make more sense to have a single Array overload that can handle any rank, any bounds

Wouldn't that cause a decent performance hit though? I mean, what I was thinking for this T[,] was to just ignore custom bounds checks and simply return literally a reference to the first item in the array object, no matter what index is actually supposed to be used to access it. Ideally it should just JIT down to this, assuming a 2D array with whatever bounds/size, on x64:

mov eax, [rcx]      ; null check
lea rax, [rcx+0x18] ; skip length, pad, HxW, bounds
ret

Essentially, what can already be achieved with:

public static ref T DangerousGetReference<T>(this T[,] array)
{
    var arrayData = Unsafe.As<RawArray2DData>(array);
    ref T r0 = ref Unsafe.As<byte, T>(ref arrayData.Data);

    return ref r0;
}

private sealed class RawArray2DData
{
    public IntPtr Length;
    public int Height;
    public int Width;
    public int HeightLowerBound;
    public int WidthLowerBound;
    public byte Data;
}

With the difference being that it would be baked in into the BCL, and possibly with the JIT intrinsics you mentioned so that it would also work well in R2R scenarios. I mean, considering that this would be a very performance-oriented API, I'd argue that devs using this would also very likely not be using custom bounds at all in the first place, and even if they were, they should just have to handle them manually after retrieving the reference. This would also make the behavior consistent with T[].

Curious to know what you guys think, thanks again for your time! 😄

@GrabYourPitchforks
Copy link
Member

Sure, but the following assembly would work for any-rank MD array:

; assume rcx := obj reference
mov rdx, qword ptr [rcx] ; rdx := pMethodTable, also handles null check
mov edx, dword ptr [rdx + offset_to_basesize] ; edx := pMethodTable->BaseSize
lea rax, [rcx + rdx + some_const_offset] ; rax := addr of where first element in MD array would be
ret

Given the relative rarity of MD arrays in general when compared to SZ arrays, I don't think it's worth optimizing away a single instruction for the rank-2 MD array case.

@Sergio0694
Copy link
Contributor Author

I see. I wonder what the performance hit could be though in scenarios where this API was used in a tight loop, as that second instruction involves yet another memory access 🤔

Just my two cents, but I often had the feeling that the low usage of ND arrays (especially 2D arrays) was sort of the chicken and the egg problem. By that I mean, they're not so commonly used, so they get low support. The codegen for them wasn't as optimized as it could've been (see #35293), and as an example, when the Span<T> and Memory<T> APIs were added, 2D arrays got no love whatsoever (eg. there's no intuitive way to get a Span<T> out of a 2D array, and it's just impossible to do for Memory<T>). As a result, users might move away from them (I did too), which gets us back to square 1 again.

I do think that at least for the 2D case, it might be nice to have some dedicated, optimized APIs. There are many scenarios (eg. image processing) where they would be a very user friendly data structure to use, as opposed to SZ arrays with manual offsetting, or even worse, jagged arrays (with all their added bonus of much higher page faults thanks to their discontinuous memory layout) 😄

I'm glad I could start a conversation around this!

@GrabYourPitchforks
Copy link
Member

The API proposal suggested here is for the absolute lowest of low level consumers. The 0.00001% audience. By definition it's not friendly since it drops into unsafe code and assumes expert knowledge by its callers. Additionally, it's not really expected that people will call this in a tight loop. If you look at existing consumers, they often call this before entering a tight loop, but it's not within the loop body itself. Unless you're in the habit of accessing element 0 over and over and over again, I suppose. :)

If the goal were instead to make MD arrays more user-friendly generally, including giving them better codegen for common scenarios, that's laudable! But I think the issue #35293 that you had already pointed to is the right place for that discussion.

@Sergio0694
Copy link
Contributor Author

Absolutely, I agree that an API like this (same as with the existing MemoryMarshal.GetArrayDataReference or MemoryMarshal.GetReference) would typically be called just before tight loops, and then the loop themselves would just do offsetting on the returned reference. What I was thinking here was cases where that's not possible, eg. some custom type that wraps an array (or a span) and provides non-sequential access to the underlying memory. Just to make an example, consider this type:

public readonly struct ArrayColumn<T>
{
    private readonly T[,] array;
    private readonly int height;
    private readonly int width;
    private readonly int column;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public ArrayColumn(T[,] array, int column)
    {
        // Assign fields...
    }

    public ref T this[int y]
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get
        {
            if ((uint)y >= (uint)this.height)
            {
                ThrowHelper.ThrowArgumentOutOfRangeExceptionForInvalidColumn();
            }

            ref T r0 = ref MemoryMarshal.GetArrayDataReference(this.array);
            ref T r1 = ref Unsafe.Add(ref r0, this.column + y * this.width);

            return ref r1;
        }
    }
}

This is just to make an example, but you can see how in a situation like this, caching the initial reference would not be possible for users (without going through extra hoops and handling the offsetting manually, which defeats the point of having a helper type like this), while having access to a fast API like MemoryMarshal.GetArrayDataReference can give a nice performance boost whenever the accessor is used, in particular because it lets you handle the bounds checks on your own (so that you can only check the vertical axis in this case), and it lets you ignore the lower bounds of the array if you assume the array is using a 0-based index. You could imagine a user accessing elements of such a type within a tight loop, just as they would do with a Span<T> or normal array 😄

I think having APIs like these would be useful for library authors to provide more efficient types that can be used by general developers that are not expected to know how to use any of these unsafe APIs.

Again, this was just to clarify what I meant about this API also being used directly in tight loops. And I agree that reusing the same method with the added lookup to the method table would simplify the codebase and avoid additional overloads, I was just wondering about the performance hit of the second memory lookup in cases where caching the reference and reuse it directly is not an option.

On this point, personally I think it'd be nice to have dedicated, more efficient overloads at least for 2D arrays (if not 3D arrays as well) with the base size being baked in, and then have a general purpose API only targeting Array with the code you described to handle other arrays of higher ranks.

Again, I really appreciate you taking the time to have this conversation 😊

@joperezr joperezr removed the untriaged New issue has not been triaged by the area owner label Jul 7, 2020
@joperezr joperezr added this to the Future milestone Jul 7, 2020
@joperezr joperezr added the design-discussion Ongoing discussion about design without consensus label Jul 7, 2020
@Sergio0694
Copy link
Contributor Author

Pinging @bartonjs and @joperezr as area owners, was wondering about the necessary steps to move this to ready for review.

I'm absolutely fine with @GrabYourPitchforks's idea for how to make the implementation easier to share across the various overloads, and personally I wouldn't mind even a single overload just taking Array, even though that wouldn't be able to validate the correct usage for the type parameter (as in, people could do GetArrayDataReference<int>(new string[0])). Considering the API is already in System.Runtime.InteropServices though which is explicitly considered "unsafe", that might not necessarily be a problem. I'm sure you'll both be able to provide more feedbacks/ideas on how to refine the API proposal though 😊

As for examples of usage, this would help a lot in the Microsoft.Toolkit.HighPerformance package, where I'm currently doing quite some hoops to work around the lack of this API (here), specifically by just relying on the .NET Core layout of ND arrays and doing a reference cast and then manually indexing as needed (which is technically not really safe though). Being able to just have a built-in API for this in the BCL would make the whole thing much more reliable and easier to maintain, and much less error prone. In general, since we do have the API for 1D arrays now, I figured it would just make sense to expand it to work on all arrays as well 🚀

@GrabYourPitchforks
Copy link
Member

No need for the method to be generic. It could return ref byte, for instance. That's what our internal helper methods do.

internal static ref byte GetRawData(this object obj) =>
ref Unsafe.As<RawData>(obj).Data;

A ref byte would be immediately useful for pinning purposes or comparison Unsafe.IsAddressLessThan/GreaterThan purposes. And if you need to convert the ref to a different data type, there's always Unsafe.As.

That would make the public API:

public static class MemoryMarshal
{
    // strawman
    public static ref byte GetMDArrayDataRef(Array arr);
}

@Sergio0694
Copy link
Contributor Author

Personally, I would be fine with that too, since the important bit would just be to have a portable API to reliably get a ref to the start of an ND array on any runtime. But, doesn't that API definition get really close to GetRawData (#28001), just with a slightly more restricted parameter type? I mean if we want to go with such a dangerous API (as in, users would need to manually Unsafe.As the reference as you said), I believe @jkotas said he didn't like the return ref type being non specific (see his comment here).

Please correct me if I'm wrong, from what I can tell here we basically have:

  • GetMDArrayDataRef just returning ref byte, possibly too low-level, also at that point why not just do GetRawData directly?
  • If we want a ref T return, we either have to:
    • Only take an Array, but we then can't validate the input array type against T at build time
    • If we wanted to validate that at runtime, that'd introduce a branch and more overhead
    • Just add a few generic overloads specifically for eg. 2D, 3D arrays, taking T[,] and returning a proper ref T

My initial proposal was like this last point, for consistency with ref T GetArrayDataReference<T>(T[]).
Of course, just thinking out loud on the possible options here, as I said I'd personally be more than happy with any of these solutions (excluding the one with internal branch to validate the type at runtime), as long as we'll have a way on .NET 6 to reliably get an array ref that will work on both CoreCLR, Mono or anything else. Right now I basically had to fallback to an equivalent workaround to that used by the original Span<T> in CoreFX, to avoid having my code explode on Mono .NET 5 due to the different ND array layout, and that's not really ideal 😅

@jkotas
Copy link
Member

jkotas commented Oct 10, 2020

also at that point why not just do GetRawData directly?

The proposed RuntimeHelpers.GetRawData would not return reference to the first element in this case. It would return reference to block of dimensions. We do not document the structure of this block and the structure of this block varies between runtimes. For example, the structure of this block is different between CoreCLR and Mono. This is example of a problem we would get with a public GetRawData

Ideally, we would make this return ref void, but that's not allowed in C# today.

Note that Marshal.UnsafeAddrOfPinnedArrayElement is an existing API that does this already, except that it expects pinned array and takes an extra index.

@Sergio0694
Copy link
Contributor Author

Sergio0694 commented Oct 10, 2020

The proposed RuntimeHelpers.GetRawData would not return reference to the first element in this case. It would return reference to block of dimensions.

Ooh, I see, Wait, should Michal edit the first post of his issue then, or did I read that wrong? He says:

Given an object instance o, this would return a managed reference to the first field of the type (or first element of an array if o is an array, or first character of a string if o is string).

That in bold is the bit that led me to assume that API would've worked the same in this scenario.

As for Levi's proposal, in case returning ref byte would be acceptable for a public API then that looks good 😄

I wasn't aware of Marshal.UnsafeAddrOfPinnedArrayElement, that looks interesting! Though it requires pinning which isn't ideal. I guess the new proposed API would essentially be identical to that, just without the explicit null check (or at least using a throw helper to help the codegen) and always using refs so that you don't need pinning? 🤔

Side point, for my specific use case which is restricted to 2D and 3D arrays instead, I think I can just keep using the portable approach for now (as in, before this API possibly ships with .NET 6)? As in, basically the equivalent of what the OG Span<T> did back in CoreFX (precomputing the byte offset in some beforefieldinit TypeInfo<T> class, with some static readonly fields), which requires no pinning and should still be super fast, especially when using tiered JIT as I'd expect the tier 1 JIT to be able to just hardcode the offset at some point and possibly skip the static field read entirely (or at the very least skip the check for the static constructor).

EDIT: nevermind, I just realized I couldn't have used that API anyway since I need to work on arrays of managed types too, and I don't think those can actually be pinned, without possibly even more hacks.

@jkotas
Copy link
Member

jkotas commented Oct 10, 2020

or first element of an array if o is an array, or first character of a string if o is string

I assumed that the public API would do what the internal GetRawData does. The internal GetRawData does not do this.

@Sergio0694
Copy link
Contributor Author

Right, I see. I think then we should either update the description in #28001 if that doesn't reflect the actual behavior that that proposed API would have, or otherwise decide that the public API would behave as currently described, in which case that would basically replace the proposed API in this issue, as they'd essentially be identical when used on arrays? 🤔

@GrabYourPitchforks GrabYourPitchforks added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed design-discussion Ongoing discussion about design without consensus api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Apr 14, 2021
@GrabYourPitchforks
Copy link
Member

Spoke with Sergio offline and he said taking Array would work for his scenarios. I'll move this to ready-for-review.

@Sergio0694
Copy link
Contributor Author

Sergio0694 commented Apr 15, 2021

Just to clarify, would we want to keep the generic parameters and technically allow consumers to break type safety when getting a reference (since they could just pass whatever type argument and get the returned reference reinterpreted to it), or just make the returned reference a ref byte at this point? Though I think @jkotas mentioned he didn't like having public APIs returning a ref byte like that due to that being even more dangerous to use 🤔

Having the T[,] and T[,,] overloads was also supposed to help with this and effectively enforce type safety to a degree (at least somewhat, given that array covariance is still not checked so one could technically use this API to get eg. a mutable ref object from an item in a string[] array, which breaks everything on writes), but as Levi mentioned I agree that those would add more complexity just to remove a single mov in an already extremely niche API.

So in general, what do we want the final signature of this API to be? 🙂

@GrabYourPitchforks
Copy link
Member

If desired, we could force the user to pass a generic argument, even though the method would still take just Array. It'd do the equivalent of Unsafe.As under the covers. But given that we're already in unsafe territory, I'm not too concerned about returning ref byte or other unsafe constructs. The use case for this API already assumes the consumer is going to turn around and call Unsafe.Add and friends, so they're already in footgun territory anyway.

@bartonjs
Copy link
Member

bartonjs commented May 27, 2021

Video

Approved as a System.Array-based overload of GetArrayDataReference.

There's an open question of what happens if the array is an array of reference types: is byte the right answer for the return type?

namespace System.Runtime.InteropServices
{
    partial class MemoryMarshal
    {
        public static ref byte GetArrayDataReference(Array arr);
    }
}

@bartonjs bartonjs removed the api-ready-for-review API is ready for review, it is NOT ready for implementation label May 27, 2021
@bartonjs bartonjs added the api-approved API was approved in API review, it can be implemented label May 27, 2021
@tannergooding
Copy link
Member

@jkotas, could you weigh in on the concern raised?

Specifically is ref byte GetArrayDataReference(Array arr) going to be problematic for reference types (say object[,])? Likewise is it supporting any array and therefore needing to check SZ vs MD going to be problematic?
On the first, if so, would ref T GetArrayDataReference<T>(Array arr) "better" (in that it combines the get and the Unsafe.As<TFrom, TTo() call)?

@jkotas
Copy link
Member

jkotas commented May 27, 2021

ref byte GetArrayDataReference(Array arr) going to be problematic for reference types (say object[,])?

I do not see a problem with it. It is super unsafe API though.

ref T GetArrayDataReference<T>(Array arr)

If we changed the API to return ref T, I think we would want to also do a type check inside the API to validate that the element type is T. It would make the API safer, but also slower.

@tannergooding
Copy link
Member

Thanks!

I don't have a particular preference here but might lean towards the little additional safety. I'd think it shouldn't be problematic in practice as the cost of a type check (particularly if we are already going to check SZ vs MD) seems negligible compared to ref arr[0, 0, 0, 0, 0] which might have to do many many bounds checks.

It also seems like something that could be "statically" determined in a few cases, since you will often have a T[,] and not an actual Array, so it might be something we could intrinsify and "improve" if it did add measurable cost.

I'll defer to what the runtime team feels is the correct approach, however.

@GrabYourPitchforks
Copy link
Member

Getting the address of the first element for any arbitrary array (including an MDArray) is trivial and shouldn't add measurable overhead. See #35528 (comment) for more info.

If we changed the API to return ref T, I think we would want to also do a type check inside the API to validate that the element type is T. It would make the API safer, but also slower.

It would make the API safer. However, we do not perform this same check for GetArrayDataReference<T>(T[]). That API is explicitly documented as "you'd better ensure the array is of the correct type, because we don't perform array variance validity checks on your behalf."

@GrabYourPitchforks GrabYourPitchforks self-assigned this May 27, 2021
@jkotas
Copy link
Member

jkotas commented May 27, 2021

Making the API safer will make it also less usable. You won't be able to call it unless you have the T available.

We actually have very similar APIs available already Marshal.UnsafeAddrOfPinnedArrayElement(Array arr, int index). This API does not do any type checks, but requires pinning and does a bit extra stuff that the people writing low-level unsafe code do not appreciate. This API is fix for that. So I think we should not do any type checks in this API either so it is a strict functional superset of Marshal.UnsafeAddrOfPinnedArrayElement.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 1, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 3, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jul 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants