Simplifying fallback kernels #303

manodeep · 2023-08-23T05:50:39Z

Reduced the amount of code in the fallback kernels. On my M2 laptop, at least, the simplified kernels run faster: slightly faster (5-10%) for DD-type (i.e. small number-density) calculations and significantly faster (~20-25%) for RR-type calculations.

manodeep · 2023-08-29T00:39:03Z

@lgarrison Didn't realise I hadn't requested a review -- oops!

manodeep · 2023-08-29T01:01:27Z

Oh, forgot to mention that I ran the INTEGRATION_TESTS for this change and the exhaustive tests passed.

lgarrison

All looks fine to me! Did you do any tests to figure out where the speedup is coming from?

mocks/DDrppi_mocks/countpairs_rp_pi_mocks_kernels.c.src

lgarrison · 2023-08-29T14:46:54Z

-                    npairs[ibin]++;
-                    if(need_rpavg) {
-                        rpavg[ibin]+=rp;
+                    src_npairs[ibin]++;


I guess the idea here is that there's no point in making stack buffers, since the passed buffers are already local to the current thread?
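To make the point about stack buffers concrete, here is a minimal sketch of a kernel that accumulates straight into caller-supplied, per-thread buffers instead of filling local arrays and copying them back. It is not the Corrfunc kernel: the function name, the 1-D positions, the `edges` layout and the use of a NULL `src_rpavg` to skip the average are all assumptions made for illustration; only the names `src_npairs` and `rpavg` come from the diff above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified fallback kernel (1-D positions for brevity).
 * Bin k covers [edges[k], edges[k+1]); edges has nbins+1 entries.
 * Counts go directly into the caller's buffers, which are assumed to be
 * private to the calling thread, so no local copy is made. */
static void fallback_kernel_sketch(const double *x0, size_t n0,
                                   const double *x1, size_t n1,
                                   const double *edges, size_t nbins,
                                   uint64_t *src_npairs, double *src_rpavg)
{
    assert(nbins > 0 && edges != NULL && src_npairs != NULL);
    for (size_t i = 0; i < n0; i++) {
        for (size_t j = 0; j < n1; j++) {
            const double d = x0[i] - x1[j];
            const double rp = d < 0.0 ? -d : d;
            if (rp < edges[0] || rp >= edges[nbins]) continue;
            /* Walk the bins from the outermost inwards */
            for (size_t k = nbins; k-- > 0;) {
                if (rp >= edges[k]) {
                    src_npairs[k]++;
                    /* Assumption: a NULL pointer means "average not requested" */
                    if (src_rpavg != NULL) src_rpavg[k] += rp;
                    break;
                }
            }
        }
    }
}
```

With this shape, the caller owns one `src_npairs` (and optionally one `src_rpavg`) per thread and sums them after the parallel region.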

manodeep · 2023-08-30

Yup. I am also considering whether it would be worthwhile to move to a malloc'ed buffer rather than the stack (though a malloc'ed src_npairs[nthreads][nbins] kind of matrix may have side-effects from false sharing under OpenMP).
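One common way to sidestep the false-sharing concern with a heap-allocated `[nthreads][nbins]` matrix is to pad each thread's row to a whole number of cache lines. The sketch below assumes a 64-byte line (typical on x86-64; other CPUs may differ), and the helper names are hypothetical, not part of Corrfunc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE_BYTES 64 /* assumption: adjust for the target CPU */

/* Round the row length up to a whole number of cache lines */
static size_t padded_nbins(size_t nbins)
{
    const size_t per_line = CACHE_LINE_BYTES / sizeof(uint64_t);
    return ((nbins + per_line - 1) / per_line) * per_line;
}

/* Allocate zeroed, cache-line-aligned rows, one per thread.
 * The row stride (in elements) is returned through *stride.
 * Returns NULL on allocation failure; release with free(). */
static uint64_t *alloc_thread_npairs(size_t nthreads, size_t nbins, size_t *stride)
{
    assert(nthreads > 0 && nbins > 0 && stride != NULL);
    *stride = padded_nbins(nbins);
    /* Size is a multiple of the alignment, as C11 aligned_alloc requires */
    const size_t nbytes = nthreads * (*stride) * sizeof(uint64_t);
    uint64_t *buf = aligned_alloc(CACHE_LINE_BYTES, nbytes);
    if (buf != NULL) memset(buf, 0, nbytes);
    return buf;
}

/* Sum the per-thread rows into the final histogram */
static void reduce_npairs(const uint64_t *buf, size_t nthreads, size_t nbins,
                          size_t stride, uint64_t *npairs)
{
    for (size_t t = 0; t < nthreads; t++) {
        for (size_t i = 0; i < nbins; i++) {
            npairs[i] += buf[t * stride + i];
        }
    }
}
```

Inside an OpenMP parallel region, thread `t` would write only to `buf + t * stride`, so no two threads touch the same cache line; the cost is up to one cache line of unused padding per thread.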

manodeep · 2023-08-30T01:53:06Z

Didn't actually do a line-by-line timing comparison. Will attempt to do that on my laptop; I will also check that the runtime is not adversely affected on our local Linux supercomputer (Skylake and AMD EPYC).
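For a finer-grained comparison than whole-test timings, a small harness around POSIX `clock_gettime` is one option. This is only a sketch under stated assumptions: `time_best_of` and `busy_loop` are hypothetical names, and `busy_loop` stands in for whichever kernel call is being measured.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Difference t1 - t0 in seconds */
static double elapsed_seconds(const struct timespec *t0, const struct timespec *t1)
{
    return (double)(t1->tv_sec - t0->tv_sec)
         + 1e-9 * (double)(t1->tv_nsec - t0->tv_nsec);
}

/* Call fn(arg) nrepeat times and return the fastest wall-clock time in
 * seconds; taking the minimum reduces noise from other processes. */
static double time_best_of(void (*fn)(void *), void *arg, int nrepeat)
{
    assert(fn != NULL && nrepeat > 0);
    double best = -1.0;
    for (int r = 0; r < nrepeat; r++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fn(arg);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        const double dt = elapsed_seconds(&t0, &t1);
        if (best < 0.0 || dt < best) best = dt;
    }
    return best;
}

/* Stand-in workload (hypothetical): replace with the kernel under test */
static void busy_loop(void *arg)
{
    const long n = *(const long *)arg;
    volatile double sink = 0.0;
    for (long i = 0; i < n; i++) sink += (double)i;
}
```

Running the same harness against the kernel built from master and from this branch, on identical inputs, would give directly comparable numbers.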

manodeep · 2023-08-30T11:06:50Z

Timed the tests on master and this branch on Skylake CPUs: essentially no difference for theory, but the mocks, specifically DDtheta, are faster.

Timed the tests on master vs this branch on EPYC CPUs: same as above, i.e. essentially no difference for theory but slightly faster with the simplified kernels (though the improvements are smaller than on SKX). In general, both branches run faster on EPYC than on SKX.

manodeep · 2023-09-21T12:11:10Z

Totally forgot to merge this PR!

manodeep · 2024-05-02T23:04:15Z

Commenting to add the link to the original #296 that spurred this work

manodeep added 2 commits August 23, 2023 15:44

Re-worked the fallback functions to reduce if-conds and variable-length arrays

0ff3839

Renamed rpavg to thetaavg for DDtheta_mocks

cd35845

manodeep mentioned this pull request Aug 23, 2023

DDtheta: numerical stability of small angular separation in float32 #296

Open

manodeep requested a review from lgarrison August 29, 2023 00:38

Added to Changelog [ci skip]

496c99e

lgarrison approved these changes Aug 29, 2023

manodeep merged commit e72cf70 into master Sep 21, 2023

manodeep deleted the simplify_fallback branch September 21, 2023 12:10
