
Fix stride indexing bugs in reorg and reorg_gradient functions (CPU & CUDA) #3012

Merged: 6 commits merged into davisking:master on Sep 23, 2024

Conversation

@Cydral (Contributor) commented Sep 16, 2024

Overview

This pull request fixes identified bugs in the reorg and reorg_gradient utility functions within the Dlib library, ensuring accurate stride handling in both the CPU and CUDA implementations. Additionally, a new bool add_to parameter has been introduced to make these functions reversible and more flexible for future layer integrations.

Problem statement

  • Stride miscalculations: The original reorg and reorg_gradient functions contained incorrect stride calculations, particularly in the handling of row_stride and col_stride. This led to improper index mapping, causing erroneous data reorganization and gradient accumulation.

  • Consistency issues: Discrepancies between CPU and CUDA versions resulted in inconsistent behavior across different execution environments.

Changes made

  1. Corrected stride assertions so that row_stride and col_stride are checked separately
  2. Fixed index calculations so that the input-to-output mapping is correct for non-square strides (row_stride != col_stride)
  3. Added bool add_to parameter: introduced an optional add_to parameter in reorg and reorg_gradient to allow reversible operations and accumulation flexibility in future layer usages (see the sketch after this list)
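
To make the fix concrete, here is a minimal, self-contained sketch of the corrected index mapping (plain C++ over raw NCHW buffers; illustrative only, not the actual Dlib code, and the function and parameter names are hypothetical). Each row_stride x col_stride block of an input channel is spread across row_stride*col_stride output channels, and the add_to flag switches between overwriting and accumulating:

```cpp
#include <cassert>
#include <vector>

// Sketch of the corrected reorg index mapping on NCHW float buffers.
// When src has shape (n, k, nr, nc), dest has shape
// (n, k*rs*cs, nr/rs, nc/cs), where rs/cs are the row/column strides.
void reorg_sketch(bool add_to, std::vector<float>& dest,
                  long n, long k, long nr, long nc,   // src shape
                  int rs, int cs,                     // row_stride, col_stride
                  const std::vector<float>& src)
{
    assert(nr % rs == 0 && nc % cs == 0);
    const long dk = k * rs * cs, dnr = nr / rs, dnc = nc / cs;
    for (long s = 0; s < n; ++s)
    for (long ko = 0; ko < dk; ++ko)        // output channel
    for (long r = 0; r < dnr; ++r)
    for (long c = 0; c < dnc; ++c)
    {
        const long sk  = ko % k;            // source channel
        const long off = ko / k;            // 0 .. rs*cs-1: position in block
        const long sr  = r * rs + off / cs; // source row    (this stride math
        const long sc  = c * cs + off % cs; // source column  is what got fixed)
        const long didx = ((s*dk + ko)*dnr + r)*dnc + c;
        const long sidx = ((s*k + sk)*nr + sr)*nc + sc;
        if (add_to) dest[didx] += src[sidx];
        else        dest[didx]  = src[sidx];
    }
}
```

Applied to the 4x4 input below with strides (2, 2), this mapping produces exactly the four 2x2 output channels shown in the corrected output.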

Example test case:

  • Input tensor (reorg_input):
    num_samples=1, k=1, nr=4, nc=4
[[         0.396    0.541    0.809    0.935
           0.108    0.671    0.138    0.775
           0.485    0.362    0.192    0.219
           0.290    0.232    0.876    0.646
]]
  • Original reorg_output:
[[         0.396    0.809
           0.485    0.192
][         0.541    0.935
           0.362    0.219
][         0.108    0.138
           0.290    0.876
][         0.671    0.775
           0.232    0.646
]]
  • Corrected reorg_output: Matches the expected reorganized output exactly.
[[         0.396    0.809
           0.485    0.192
][         0.541    0.935
           0.362    0.219
][         0.108    0.138
           0.290    0.876
][         0.671    0.775
           0.232    0.646
]]
  • reorg_gradient2_output: Successfully reverses the reorg operation, reconstructing the original tensor.
[[         0.396    0.541    0.809    0.935
           0.108    0.671    0.138    0.775
           0.485    0.362    0.192    0.219
           0.290    0.232    0.876    0.646
]]

@arrufat (Contributor) commented Sep 16, 2024

Oh, I will check it in more detail later on. But is it possible that it has worked for me because I've always used this with square images?

@arrufat (Contributor) commented Sep 16, 2024

The original and corrected output look identical to me.

@Cydral (Contributor, Author) commented Sep 16, 2024

Hi @arrufat,

I discovered this issue while searching for a method to split tensor columns in a 2D matrix mode. Upon closer examination of the example I provided, it's apparent that the function initially works correctly, but the very last row seems to be derived from memory artifacts rather than the actual matrix. Essentially, this indicates that the indices were incorrect.

While I haven't tested all possible configurations of this class, the new index calculation appears to be correct now for multi-channel, multi-row, and multi-column scenarios. This correction ensures that the reorganization process accurately maps input tensor elements to their correct positions in the output tensor.

An additional advantage introduced by this update is the optional accumulation of values. This feature allows for direct reversibility of the functions without requiring additional processes.
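
For illustration, here is a sketch of the matching gradient operation under the same assumptions as the reorg sketch above (again with hypothetical names, not Dlib's actual code). It scatters each element of the reorganized tensor back to the position reorg read it from, and with add_to == false this exactly reconstructs the original tensor, which is the reversibility described above:

```cpp
#include <vector>

// Sketch of reorg_gradient: the exact inverse scatter of reorg_sketch.
// grad has the original (n, k, nr, nc) shape; gradient_input holds the
// incoming gradient in reorg's output shape (n, k*rs*cs, nr/rs, nc/cs).
void reorg_gradient_sketch(bool add_to, std::vector<float>& grad,
                           long n, long k, long nr, long nc,
                           int rs, int cs,
                           const std::vector<float>& gradient_input)
{
    const long dk = k * rs * cs, dnr = nr / rs, dnc = nc / cs;
    for (long s = 0; s < n; ++s)
    for (long ko = 0; ko < dk; ++ko)
    for (long r = 0; r < dnr; ++r)
    for (long c = 0; c < dnc; ++c)
    {
        const long sk  = ko % k;
        const long off = ko / k;
        const long sr  = r * rs + off / cs;
        const long sc  = c * cs + off % cs;
        const long gidx = ((s*k + sk)*nr + sr)*nc + sc;    // into grad
        const long oidx = ((s*dk + ko)*dnr + r)*dnc + c;   // from gradient_input
        if (add_to) grad[gidx] += gradient_input[oidx];    // accumulate
        else        grad[gidx]  = gradient_input[oidx];    // plain inverse
    }
}
```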

I hope these modifications have effectively corrected these utility functions, assuming they were indeed erroneous in their previous implementation. I'll let you confirm that, please, as I don't have any examples of use for these specific layers.

@Cydral (Contributor, Author) commented Sep 16, 2024

Upon re-reading the content of the Pull Request, I better understand your inquiry. You're correct; the initial calculation assumed square matrices but didn't cover the general case. Let me provide a more illustrative example that better demonstrates the problem:

reorg_input: num_samples=1, k=1, nr=4, nc=6
[[         0.396    0.541    0.809    0.935    0.108    0.671
           0.138    0.775    0.485    0.362    0.192    0.219
           0.290    0.232    0.876    0.646    0.400    0.518
           0.622    0.079    0.499    0.193    0.945    0.458
        ]]
reorg_output: num_samples=1, k=2, nr=4, nc=3
[[         0.396    0.809    0.108
           0.138    0.485    0.192
           0.290    0.876    0.400
           0.622    0.499    0.945
        ][         0.775    0.362    0.219
           0.232    0.646    0.518
           0.079    0.193    0.458
           0.000   -0.000    0.000
        ]]
reorg2_output: num_samples=1, k=2, nr=4, nc=3
[[         0.396    0.809    0.108
           0.138    0.485    0.192
           0.290    0.876    0.400
           0.622    0.499    0.945
        ][         0.541    0.935    0.671
           0.775    0.362    0.219
           0.232    0.646    0.518
           0.079    0.193    0.458
        ]]
reorg_gradient2_output: num_samples=1, k=1, nr=4, nc=6
[[         0.396    0.541    0.809    0.935    0.108    0.671
           0.138    0.775    0.485    0.362    0.192    0.219
           0.290    0.232    0.876    0.646    0.400    0.518
           0.622    0.079    0.499    0.193    0.945    0.458
        ]]
test passed: reorg_cpu: max(abs(mat(grad_cpu) - mat(input))) < 1e-5
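
A regression test along these lines would catch the non-square-stride bug. This is a hypothetical sketch in Dlib's style; the tt::reorg / tt::reorg_gradient signatures with a leading add_to flag are assumed from this PR:

```cpp
#include <dlib/dnn.h>
using namespace dlib;

// Hypothetical round-trip test with non-square strides (row_stride=1,
// col_stride=2), the case the old indexing got wrong. Assumes the
// post-PR signatures that take an add_to flag as the first argument.
void test_reorg_non_square_strides()
{
    resizable_tensor input(1, 1, 4, 6), output, grad;
    tt::tensor_rand rnd;
    rnd.fill_uniform(input);
    output.set_size(1, 1*1*2, 4/1, 6/2);            // (n, k*rs*cs, nr/rs, nc/cs)
    tt::reorg(false, output, 1, 2, input);          // forward reorg
    grad.copy_size(input);
    tt::reorg_gradient(false, grad, 1, 2, output);  // inverse scatter, no accumulation
    DLIB_CASSERT(max(abs(mat(grad) - mat(input))) < 1e-5);
}
```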

@arrufat (Contributor) commented Sep 16, 2024

Yes, your proposed fix makes sense. I should've noticed this while porting the code from Darknet, which only supports "square" strides, not different ones for rows and columns. That's also why it was working in the tests: we were only testing reorg<2, 2>.

Cydral and others added 5 commits September 20, 2024 19:43
* remove using namespace std from headers

* more std::

* more std::

* more std:: on windows stuff

* remove uses of using namespace std::chrono

* do not use C++17 features

* Add Davis suggestion

* revert some more stuff

* revert removing include

* more std::chrono stuff
@davisking (Owner) commented
Sweet, thanks for the bug fix :)

@davisking merged commit 72822fe into davisking:master on Sep 23, 2024
10 checks passed