Xnn f32 reduce window #7192

LokeshReddyOVS-MCW · 2024-09-26T09:00:58Z

No description provided.

dsharlet

I only looked at microkernels so far

dsharlet · 2024-09-27T08:25:39Z

src/f32-rwsum/f32-rwdsum-1p1x-scalar-c1.c

+
+  int padded_size = rows + MAX((rows - 1),0) * (base_dilation - 1) + padding[0] + padding[1];
+  int output_size = (padded_size < (window_dimensions - 1) * window_dilations + 1) ? 
+                    0 : FLOOR((padded_size - (window_dimensions - 1) * window_dilations - 1) / (float)window_strides) + 1;


This converts the size to a float, which can drop precision (a float represents integers exactly only up to 2^24), and that is a major problem for size/shape values. I think this needs to be done with integer arithmetic.
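For illustration, here is a minimal sketch of the same output-size formula written with integer arithmetic only. The function name and parameter names are hypothetical (they are not taken from the PR or from XNNPACK), and the sketch assumes all sizes and paddings are non-negative and that dilations and stride are at least 1.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not part of XNNPACK): number of outputs of a 1-D
// reduce-window, computed without any floating-point conversion.
// Assumes non-negative sizes/paddings and dilations/stride >= 1.
static size_t reduce_window_output_size(
    size_t rows, size_t base_dilation, size_t pad_low, size_t pad_high,
    size_t window_size, size_t window_dilation, size_t window_stride)
{
  assert(base_dilation >= 1);
  assert(window_size >= 1);
  assert(window_dilation >= 1);
  assert(window_stride >= 1);

  // rows + max(rows - 1, 0) * (base_dilation - 1), rewritten so that no
  // intermediate value can go negative.
  const size_t dilated_rows = rows == 0 ? 0 : (rows - 1) * base_dilation + 1;
  const size_t padded_size = dilated_rows + pad_low + pad_high;

  // Extent covered by one dilated window.
  const size_t window_extent = (window_size - 1) * window_dilation + 1;

  if (padded_size < window_extent) {
    return 0;
  }
  // Both operands are non-negative, so integer division is already a floor.
  return (padded_size - window_extent) / window_stride + 1;
}
```

Because every operand is unsigned and the subtraction is guarded by the comparison above it, no FLOOR macro or float cast is needed.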

dsharlet · 2024-09-27T08:26:42Z

src/f32-rwsum/f32-rwdsum-1p1x-scalar-c1.c

+  int output_size = (padded_size < (window_dimensions - 1) * window_dilations + 1) ? 
+                    0 : FLOOR((padded_size - (window_dimensions - 1) * window_dilations - 1) / (float)window_strides) + 1;
+
+  // replaced modulo and division by multiplicative scaled inverse


This is an approximation. Isn't that a problem here? As with the float comment above: approximating the "values" of buffers is usually OK, but approximating the "shape" parameters is not.
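To make the concern concrete, the sketch below compares exact division with a multiply-and-shift division. It is a hypothetical reconstruction: the function names are invented, and it assumes the reciprocal is computed as floor(2^32 / d), which may not match how the PR computes it.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical reconstruction of a division replaced by a 32-bit
// fixed-point reciprocal. The floor rounding of the reciprocal is an
// assumption, not taken from the PR.
static uint32_t approx_div(uint32_t n, uint32_t d) {
  assert(d != 0);
  const uint64_t inverse = (UINT64_C(1) << 32) / d;  // floor(2^32 / d)
  return (uint32_t) (((uint64_t) n * inverse) >> 32);
}

// Reference result using ordinary integer division.
static uint32_t exact_div(uint32_t n, uint32_t d) {
  assert(d != 0);
  return n / d;
}
```

Under that assumption, approx_div(3, 3) evaluates to 0 while exact_div(3, 3) is 1, because 3 * floor(2^32 / 3) is just below 2^32. Divisors that are powers of two happen to be exact, so a test that only uses such divisors would not catch the error.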

dsharlet · 2024-09-27T08:28:49Z

src/f32-rwsum/f32-rwdsum-1p1x-scalar-c1.c

+        float sum = init_value;
+        int window_start = i * window_strides;
+        int pad_high_boundary = CEIL((((padding[0] - window_start) * inverse_win_dilation) >> 32));
+        int pad_low_boundary = CEIL((((padded_size - padding[1] - window_start) * inverse_win_dilation) >> 32));


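Isn't the argument to CEIL here already an integer? The shift has rounded down before CEIL is applied. I think you need something more like: (((padded_size - padding[1] - window_start) * inverse_win_dilation + ((UINT64_C(1) << 32) - 1)) >> 32), with the constant and the product computed in 64 bits, since shifting a 32-bit int by 32 is undefined.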

But of course, this assumes it's OK to approximate this division in the first place (see above comment).
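If the approximation turns out not to be acceptable, an exact alternative is to take the ceiling with integer division directly. A minimal sketch follows, assuming the numerator may be negative (for example padding[0] - window_start) and the dilation is strictly positive. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: ceil(n / d) for a possibly negative numerator n and
// a strictly positive divisor d, using integer arithmetic only.
static int64_t ceil_div_i64(int64_t n, int64_t d) {
  assert(d > 0);
  // C integer division truncates toward zero, which is already the ceiling
  // when n <= 0. For positive n with a non-zero remainder, round up by one.
  const int64_t q = n / d;
  return (n % d > 0) ? q + 1 : q;
}
```

For example, ceil_div_i64(7, 2) is 4 and ceil_div_i64(-7, 2) is -3. This costs a real division per call, so whether it is fast enough for the loop-boundary computation would need to be measured.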

dsharlet · 2024-09-27T08:33:45Z

src/f32-rwsum/f32-rwdsum-1p1x-scalar-c1.c

+#include "xnnpack/common.h"
+#include "xnnpack/reduce.h"
+
+#define CEILING_POS(X) ((X-(int)(X)) > 0 ? (int)(X+1) : (int)(X))


These macros should be helper functions in math.h. I think we already have some of the functions you need, e.g. divide_round_up. We shouldn't divide first and then try to compute the ceiling of the result; that approach assumes the result is approximated with float or fixed point arithmetic.
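As a rough illustration of the difference, a round-up division written as a function on size_t never forms a fractional intermediate. The function below is a stand-in with an invented name; its body is illustrative and not copied from the existing divide_round_up helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Stand-in for a divide_round_up-style helper. The name and body are
// illustrative only, not copied from xnnpack/math.h.
static inline size_t divide_round_up_sketch(size_t n, size_t q) {
  assert(q != 0);
  // Uses quotient plus remainder check rather than (n + q - 1) / q, which
  // can wrap around when n is close to SIZE_MAX.
  return n / q + (n % q != 0 ? 1 : 0);
}
```

Unlike CEILING_POS, the argument is evaluated once and the result is exact for every representable input.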

dsharlet · 2024-09-27T08:34:36Z

src/f32-rwsum/f32-rwdsum-1p1x-scalar-c1.c

+#define FLOORING_POS(X) (int)(X)
+#define FLOORING_NEG(X) ((X-(int)(X)) > 0 ? (int)(X-1) : (int)(X))
+#define FLOOR(X) ( ((X) > 0) ? FLOORING_POS(X) : FLOORING_NEG(X) )
+#define MAX(X,Y) (X > Y ? X : Y)


Use functions in math.h for these
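One further reason to prefer functions: tracing the quoted macros by hand suggests that FLOOR does not round negative non-integers down. The sketch below copies the three macros from the diff and wraps them so they can be compared against a small function. The wrapper and function names are hypothetical, and floor_to_int assumes the input fits in an int.

```c
// Macros as quoted in the diff above, copied verbatim for comparison.
#define FLOORING_POS(X) (int)(X)
#define FLOORING_NEG(X) ((X-(int)(X)) > 0 ? (int)(X-1) : (int)(X))
#define FLOOR(X) ( ((X) > 0) ? FLOORING_POS(X) : FLOORING_NEG(X) )

// Hypothetical wrapper so the macro can be called like a function.
static int floor_with_macro(float x) { return FLOOR(x); }

// Hypothetical function version; assumes x is within the range of int.
static int floor_to_int(float x) {
  const int truncated = (int) x;  // conversion truncates toward zero
  // Truncation rounds negative non-integers up, so step down by one then.
  return ((float) truncated > x) ? truncated - 1 : truncated;
}
```

For -1.5f the macro path yields -1, because the fractional part -0.5 fails the "> 0" test, whereas floor_to_int yields -2. Both agree for positive inputs such as 2.5f.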

dsharlet · 2024-09-27T08:35:39Z

src/f32-rwsum/f32-rwdsum-1p1x-scalar-c1.c

+    const float* input,
+    float init_value,
+    int* padding, 
+    int base_dilation, 


Do these need to be int, i.e. signed? If we can assume they are positive, a lot of the necessary arithmetic gets simpler (e.g. divide_round_up is pretty simple for size_t, not so much for int).


dsharlet · 2024-09-27T08:38:15Z

test/f32-rwdsum.yaml

+# LICENSE file in the root directory of this source tree.
+
+# SCALAR
+- name: xnn_f32_rwdsum_ukernel_1p1x__scalar_c1


Please don't add new yaml files, and instead try to use this new system: https://github.com/google/XNNPACK/blob/master/doc/microkernel-enumerators.md, e.g. https://github.com/google/XNNPACK/blob/master/src/f16-vabs/f16-vabs.h.

We are working on converting existing yamls to such headers.

dsharlet · 2024-09-27T08:55:09Z

src/f32-rwsum/f32-rwdsum-1p1x-scalar-c1.c

+  assert(input != NULL);
+  assert(output != NULL);
+
+  int padded_size = rows + MAX((rows - 1),0) * (base_dilation - 1) + padding[0] + padding[1];


The mixing of int and size_t here is concerning, I think there's a lot of risk of integer overflow, especially in the approximated division below which will multiply values that should be size_t with very large fixed point reciprocals (2^32 if dilation/stride is 1).
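A sketch of the kind of check this implies is shown below. Both helpers are hypothetical and only demonstrate where the limits lie. They are not a proposed fix.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: true if a * b does not fit in 64 bits.
static bool mul_overflows_u64(uint64_t a, uint64_t b) {
  return a != 0 && b > UINT64_MAX / a;
}

// Hypothetical helper: true if a size_t value can be stored in a 32-bit
// signed integer without changing its value.
static bool fits_in_int32(size_t value) {
  return value <= (size_t) INT32_MAX;
}
```

With a reciprocal of 2^32 (dilation or stride of 1), mul_overflows_u64 reports overflow as soon as the other operand reaches 2^32, so even a 64-bit product is only safe for sizes below that. Any size above INT32_MAX is already altered when it is stored in a 32-bit int.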

gonnet · 2024-09-27T09:49:55Z

bench/BUILD.bazel

+[xnnpack_benchmark(
+    name = "%s_bench" % kernel,
+    srcs = [
+        "%s.cc" % kernel.replace("_", "-"),
+        "rsum-benchmark.h",
+        "rw-benchmark.h",
+    ],
+    deps = MICROKERNEL_BENCHMARK_DEPS,
+) for kernel in [
+    "f32_rwsum",
+]]


Since this is a single target, it does not need to be wrapped in a list comprehension.

gonnet · 2024-09-27T09:51:15Z

bench/f32-rwsum.cc

+#include "bench/rw-benchmark.h"
+#include "bench/rsum-benchmark.h"
+#include "bench/utils.h"
+#include <benchmark/benchmark.h>
+
+#include "xnnpack.h"
+#include "xnnpack/aligned-allocator.h"
+#include "xnnpack/common.h"
+#include "xnnpack/reduce.h"
+#include "xnnpack/microfnptr.h"
+#include "xnnpack/microparams-init.h"


Please sort the includes correctly: system headers first, then the other #include <...>, then the #include "...", with each group sorted alphabetically.

gonnet · 2024-09-27T09:54:17Z

bench/rw-benchmark.h

+#include "bench/rw-benchmark.h"
+#include "bench/utils.h"
+#include <benchmark/benchmark.h>
+
+#include "xnnpack.h"
+#include "xnnpack/aligned-allocator.h"
+#include "xnnpack/common.h"
+#include "xnnpack/reduce.h"
+#include "xnnpack/microfnptr.h"


Please sort as described above.

gonnet · 2024-09-27T09:55:02Z

bench/rw-benchmark.h

+    state.counters["cpufreq"] = cpu_frequency;
+  }
+}
+


Please remove extra space.

gonnet · 2024-09-27T09:56:49Z

bench/rw-benchmark.h

+
+namespace {
+
+void f32_rwsum(


These functions are not templated, so this should actually be its own build target, with a .cc and a .h file.

gonnet · 2024-09-27T09:58:59Z

src/f32-rwsum/f32-rwdsum-1p1x-scalar-c1.c

+    const float* input,
+    float init_value,
+    int* padding, 
+    int base_dilation, 


Agree that you should use size_t, and if you need a signed value, use a more specific type, e.g. int32_t.

LokeshReddyOVS-MCW and others added 8 commits September 26, 2024 14:29

Add reduce_window f32 scalar microkernels

7881cf1

Add f32 scalar rw benchmark file

86e066f

Cmake changes for reduce window bench

7c52eb8

efficient modulo and division changes

582f3a6

add Loop boundary optimization with multiplicative inverse approach for rwsum

a79abd5

add Loop boundary optimization with multiplicative inverse approach for rwdsum

dd4dc93

Update Variable Names

5cad8bc

fix: xnn_f32_default_params changes in rwsum, variable name updation

71d799f

LokeshReddyOVS-MCW marked this pull request as ready for review September 26, 2024 12:19

dsharlet reviewed Sep 27, 2024

gonnet reviewed Sep 27, 2024

vishalchaudharymcw and others added 13 commits October 10, 2024 17:43

feat: optimized reduce window lcm approach

6979429

windows compilation fix and reording code.

21f6702

feat: const correctness changes

f6616f6

removed yaml files and added structural part for test

f421dfe

feat: added loop unrolling for rwsum

9320c7d

added loop unrolling for rwdsum

b41c319

feat: updated variable names in rwsum unroll versions

7383c62

added structural part for bench

1c0e626

cosmetic changes in reduce window microkernels

a2723a1

microkernel tester filename changes for reduce-window-d files

c56dae2

chore: updated copyright year

261c51b

Merge remote-tracking branch 'origin/master' into xnn_f32_reduce_window

9efe760

fix: test and benchmark files

af6f7d9
