Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with sum over OffsetArray #329

Open
weymouth opened this issue Mar 27, 2023 · 1 comment
Open

Problem with sum over OffsetArray #329

weymouth opened this issue Mar 27, 2023 · 1 comment

Comments

@weymouth
Copy link

weymouth commented Mar 27, 2023

It's me again. That guy who keeps finding bugs when using sum. In Polyester, in KernelAbstractions, and now in OffsetArrays. Sigh...

Here is a case which slows down a loop by 25x(!) when using a sum over an OffsetArray. Expanding the sum into a for loop restores the normal performance when using Array.

## Set up
# Cartesian step
@inline δ(i, I::CartesianIndex{m}) where{m} = CartesianIndex(ntuple(j -> j==i ? 1 : 0, m))

# Finite difference
@inline ∂(a,I::CartesianIndex{m},u::AbstractArray{T,n}) where {T,n,m} = @inbounds u[I+δ(a,I),a]-u[I,a]

# Divergence using sum function
@fastmath @inline div_sum(I::CartesianIndex{m},u) where {m} = sum(@inbounds ∂(i,I,u) for i ∈ 1:m)

# Divergence expanded using for loop
@fastmath @inline function div_expanded(I::CartesianIndex{m},u) where {m} 
    init=zero(eltype(u))
    for i in 1:m
     init += @inbounds ∂(i,I,u)
    end
    return init
end

# loop over the range R
loop!(div,u,f,R) = @inbounds @simd for I ∈ R
    div[I] = f(I,u)
end

## Test case
# Benchmark with Arrays
N = (2^10+2,2^10+2) # size of the array including halo
R = CartesianIndices(map(n->n-2,N)).+CartesianIndex(map(n->1,N)) # range inside the halo
div = zeros(Float32,N);
u = rand(Float32,N...,length(N));
using BenchmarkTools
@btime loop!($div,$u,$div_sum,$R) # 188.200 μs (0 allocations: 0 bytes)
@btime loop!($div,$u,$div_expanded,$R) # 193.400 μs (0 allocations: 0 bytes)

# Benchmark with OA
using OffsetArrays: Origin
divOA = Origin(0)(div);
uOA = Origin((zeros(Int,length(N))...,1))(u);
ROA = CartesianIndices(map(n->n-2,N))
@btime loop!($divOA,$uOA,$div_expanded,$ROA) # 200.000 μs (0 allocations: 0 bytes)
@btime loop!($divOA,$uOA,$div_sum,$ROA) # 4.960 ms (0 allocations: 0 bytes) WTH!?!

@jishnub
Copy link
Member

jishnub commented Mar 31, 2023

Reduced to

julia> @btime mapfoldl(identity, +, ((1, I, $uOA) for I in CartesianIndices((1:10, 1:1))));
  54.481 ns (0 allocations: 0 bytes)

julia> @btime mapfoldl(identity, +, ((1, I, $u) for I in CartesianIndices((1:10, 1:1))));
  19.071 ns (0 allocations: 0 bytes)

It seems the iteration is slow

julia> @btime (u -> iterate((1, I, u) for I in CartesianIndices((1:10, 1:1))))($uOA);
  12.965 ns (0 allocations: 0 bytes)

julia> @btime (u -> iterate((1, I, u) for I in CartesianIndices((1:10, 1:1))))($u);
  3.365 ns (0 allocations: 0 bytes)

Oddly,

julia> @btime (1, CartesianIndex(1,1), $uOA);
  3.381 ns (0 allocations: 0 bytes)

julia> @btime (I -> (1, I, $uOA))(CartesianIndex(1,1));
  7.636 ns (0 allocations: 0 bytes)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants