-
Notifications
You must be signed in to change notification settings - Fork 360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Xnn f32 reduce window #7192
base: master
Are you sure you want to change the base?
Xnn f32 reduce window #7192
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only looked at microkernels so far
|
||
int padded_size = rows + MAX((rows - 1),0) * (base_dilation - 1) + padding[0] + padding[1]; | ||
int output_size = (padded_size < (window_dimensions - 1) * window_dilations + 1) ? | ||
0 : FLOOR((padded_size - (window_dimensions - 1) * window_dilations - 1) / (float)window_strides) + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to convert the size to a float, which could drop some precision, which is a major problem for size/shape values. I think this needs to be done with integer arithmetic.
int output_size = (padded_size < (window_dimensions - 1) * window_dilations + 1) ? | ||
0 : FLOOR((padded_size - (window_dimensions - 1) * window_dilations - 1) / (float)window_strides) + 1; | ||
|
||
// replaced modulo and division by multiplicative scaled inverse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an approximation, isn't this a problem for something like this? Similar to float comment above: approximation for the "values" of buffers is usually OK, but not for the "shape" parameters.
float sum = init_value; | ||
int window_start = i * window_strides; | ||
int pad_high_boundary = CEIL((((padding[0] - window_start) * inverse_win_dilation) >> 32)); | ||
int pad_low_boundary = CEIL((((padded_size - padding[1] - window_start) * inverse_win_dilation) >> 32)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the argument to CEIL here an integer? I think you need something more like: (((padded_size - padding[1] - window_start) * inverse_win_dilation + ((1 << 32) - 1)) >> 32)
But of course, this assumes it's OK to approximate this division in the first place (see above comment).
#include "xnnpack/common.h" | ||
#include "xnnpack/reduce.h" | ||
|
||
#define CEILING_POS(X) ((X-(int)(X)) > 0 ? (int)(X+1) : (int)(X)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These macros should be helper functions in math.h. I think we already have some of the functions you need, e.g. divide_round_up
(we shouldn't divide first and then try to compute the ceiling of the result, that approach assumes approximating the result with float or fixed point arithmetic).
#define FLOORING_POS(X) (int)(X) | ||
#define FLOORING_NEG(X) ((X-(int)(X)) > 0 ? (int)(X-1) : (int)(X)) | ||
#define FLOOR(X) ( ((X) > 0) ? FLOORING_POS(X) : FLOORING_NEG(X) ) | ||
#define MAX(X,Y) (X > Y ? X : Y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use functions in math.h for these
const float* input, | ||
float init_value, | ||
int* padding, | ||
int base_dilation, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these need to be int
, i.e. signed? If we can assume they are positive, a lot of the necessary arithmetic gets simpler (e.g. divide_round_up
is pretty simple for size_t
, not so much for int
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that you should use size_t
, and if you need a signed value, use a more specific type, e.g. int32_t
.
test/f32-rwdsum.yaml
Outdated
# LICENSE file in the root directory of this source tree. | ||
|
||
# SCALAR | ||
- name: xnn_f32_rwdsum_ukernel_1p1x__scalar_c1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't add new yaml files, and instead try to use this new system: https://github.com/google/XNNPACK/blob/master/doc/microkernel-enumerators.md, e.g. https://github.com/google/XNNPACK/blob/master/src/f16-vabs/f16-vabs.h.
We are working on converting existing yamls to such headers.
assert(input != NULL); | ||
assert(output != NULL); | ||
|
||
int padded_size = rows + MAX((rows - 1),0) * (base_dilation - 1) + padding[0] + padding[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mixing of int and size_t here is concerning, I think there's a lot of risk of integer overflow, especially in the approximated division below which will multiply values that should be size_t with very large fixed point reciprocals (2^32 if dilation/stride is 1).
bench/BUILD.bazel
Outdated
[xnnpack_benchmark( | ||
name = "%s_bench" % kernel, | ||
srcs = [ | ||
"%s.cc" % kernel.replace("_", "-"), | ||
"rsum-benchmark.h", | ||
"rw-benchmark.h", | ||
], | ||
deps = MICROKERNEL_BENCHMARK_DEPS, | ||
) for kernel in [ | ||
"f32_rwsum", | ||
]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a single target it does not need to be in a list expansion.
bench/f32-rwsum.cc
Outdated
#include "bench/rw-benchmark.h" | ||
#include "bench/rsum-benchmark.h" | ||
#include "bench/utils.h" | ||
#include <benchmark/benchmark.h> | ||
|
||
#include "xnnpack.h" | ||
#include "xnnpack/aligned-allocator.h" | ||
#include "xnnpack/common.h" | ||
#include "xnnpack/reduce.h" | ||
#include "xnnpack/microfnptr.h" | ||
#include "xnnpack/microparams-init.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please sort the includes correctly, i.e. system headers first, then #include <...>
, then the #include "..."
, each group sorted alphabetically.
bench/rw-benchmark.h
Outdated
#include "bench/rw-benchmark.h" | ||
#include "bench/utils.h" | ||
#include <benchmark/benchmark.h> | ||
|
||
#include "xnnpack.h" | ||
#include "xnnpack/aligned-allocator.h" | ||
#include "xnnpack/common.h" | ||
#include "xnnpack/reduce.h" | ||
#include "xnnpack/microfnptr.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please sort as described above.
bench/rw-benchmark.h
Outdated
state.counters["cpufreq"] = cpu_frequency; | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove extra space.
bench/rw-benchmark.h
Outdated
|
||
namespace { | ||
|
||
void f32_rwsum( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These functions are not templated, so this should actually be it's own build target, with a .cc
and a .h
file.
const float* input, | ||
float init_value, | ||
int* padding, | ||
int base_dilation, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that you should use size_t
, and if you need a signed value, use a more specific type, e.g. int32_t
.
No description provided.