-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow use of 32 element tuples without dynamic allocation #40468
Conversation
Thanks @kshyatt , your PR fixes the problem for me. Julia 1.6julia> using BenchmarkTools
julia> ci = CartesianIndices((fill(2, 20)...,));
julia> @benchmark $ci[35]
BenchmarkTools.Trial:
memory estimate: 1.86 KiB
allocs estimate: 31
--------------
minimum time: 1.617 μs (0.00% GC)
median time: 1.666 μs (0.00% GC)
mean time: 1.807 μs (1.25% GC)
maximum time: 60.471 μs (95.66% GC)
--------------
samples: 10000
evals/sample: 10
julia> ci0 = CartesianIndices((fill(2, 10)...,));
julia> @benchmark $ci0[35]
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 47.384 ns (0.00% GC)
median time: 48.379 ns (0.00% GC)
mean time: 49.057 ns (0.00% GC)
maximum time: 74.505 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 988 This branchjulia> using BenchmarkTools
julia> ci = CartesianIndices((fill(2, 20)...,));
julia> @benchmark $ci[35]
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 124.371 ns (0.00% GC)
median time: 125.541 ns (0.00% GC)
mean time: 127.415 ns (0.00% GC)
maximum time: 245.769 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 898
julia> ci0 = CartesianIndices((fill(2, 10)...,));
julia> @benchmark $ci0[35]
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 46.852 ns (0.00% GC)
median time: 47.403 ns (0.00% GC)
mean time: 48.413 ns (0.00% GC)
maximum time: 111.554 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 988 |
Does this PR have notable TTFP implications? |
Honestly not sure. Hopefully the extension from 16-32 does not affect compilation too harshly but we should test this/get confirmation that it's unlikely to. |
Seems reasonable to me. 16 was pretty arbitrary anyway. I don't see any significant differences in build time, sysimg size, or TTFP. |
I thought I'd drawn a performance plot when adding this method in #39414, but it seems I must not have uploaded it. Could you run the original benchmark (#39175) for different sizes of |
Just clarify the number 32 a bit. 32 is a good number because a double precision tensor of rank 32 with size of each dimension 2 just fits the 32GB GPU. julia> 1<<32 * 8 / (1<<30)
32.0 This PR is to fix the problem of high order tensors can not be indexed on GPU kernels. @kshyatt and I encounters this problem independently during researches about tensor networks, and the number 32 will solve the issue for us. Also, the above benchmark aims to show the allocation rather than the speed up. @vtjnash I am glad to run the benchmark. But not quite understand what do you mean by the original benchmark and what is the julia> # on julia 1.7 (this PR)
@btime SymmetricGroup(20)
981.900 ns (39 allocations: 3.72 KiB)
julia> # on julia 1.6.0
@btime SymmetricGroup(20)
1.065 μs (39 allocations: 3.72 KiB) It seems not what we are looking for? |
With this PR:
on
|
@vtjnash ran the benchmarks and here are the results (all times on the y axis in microseconds) Sorry for the messed up labelling. The blue is |
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @christopher-dG |
What do we think? Are the nanosoldier results concerning or can we go forward? |
I don't think I see anything changed. @nanosoldier |
Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @christopher-dG |
It looks like if nanosoldier is run twice on the same PR, the old result gets overwritten? I don't think it used to do that, did it? |
Yeah that's JuliaCI/Nanosoldier.jl#83. It used to link to a git hash, but that broke rewriting the NanosoldierReports history (which may need to happen once in a while because the repo just grows huge). |
Fixes JuliaGPU/GPUArrays.jl#333 and JuliaGPU/GPUArrays.jl#340 and restores behaviour compatible with #27398
cc @GiggleLiu, @maleadt