
Fix stride indexing bugs in reorg and reorg_gradient functions (CPU & CUDA) #3012

Merged: 6 commits merged into davisking:master on Sep 23, 2024

Conversation

@Cydral (Contributor) commented Sep 16, 2024

Overview

This pull request fixes identified bugs in the reorg and reorg_gradient utility functions within the Dlib library, ensuring accurate stride handling in both the CPU and CUDA implementations. Additionally, a new bool add_to parameter has been introduced to make these functions reversible and more flexible for future layer integrations.

Problem statement

  • Stride miscalculations: The original reorg and reorg_gradient functions contained incorrect stride calculations, particularly in the handling of row_stride and col_stride. This led to improper index mapping, causing erroneous data reorganization and gradient accumulation.

  • Consistency issues: Discrepancies between CPU and CUDA versions resulted in inconsistent behavior across different execution environments.

Changes made

  1. Corrected stride assertions so that row_stride and col_stride are checked separately
  2. Fixed index calculations so that the input-to-output mapping is correct for non-square strides (row_stride != col_stride)
  3. Added bool add_to parameter: introduced an optional add_to parameter in reorg and reorg_gradient to allow reversible operations and accumulation flexibility in future layer usages (see the sketch after this list)
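
To make the fix concrete, here is a minimal, self-contained sketch of the corrected index mapping (plain C++ over raw NCHW buffers; illustrative only, not the actual Dlib code, and the function and parameter names are hypothetical). Each row_stride x col_stride block of an input channel is spread across row_stride*col_stride output channels, and the add_to flag switches between overwriting and accumulating:

```cpp
#include <cassert>
#include <vector>

// Sketch of the corrected reorg index mapping on NCHW float buffers.
// When src has shape (n, k, nr, nc), dest has shape
// (n, k*rs*cs, nr/rs, nc/cs), where rs/cs are the row/column strides.
void reorg_sketch(bool add_to, std::vector<float>& dest,
                  long n, long k, long nr, long nc,   // src shape
                  int rs, int cs,                     // row_stride, col_stride
                  const std::vector<float>& src)
{
    assert(nr % rs == 0 && nc % cs == 0);
    const long dk = k * rs * cs, dnr = nr / rs, dnc = nc / cs;
    for (long s = 0; s < n; ++s)
    for (long ko = 0; ko < dk; ++ko)        // output channel
    for (long r = 0; r < dnr; ++r)
    for (long c = 0; c < dnc; ++c)
    {
        const long sk  = ko % k;            // source channel
        const long off = ko / k;            // 0 .. rs*cs-1: position in block
        const long sr  = r * rs + off / cs; // source row    (this stride math
        const long sc  = c * cs + off % cs; // source column  is what got fixed)
        const long didx = ((s*dk + ko)*dnr + r)*dnc + c;
        const long sidx = ((s*k + sk)*nr + sr)*nc + sc;
        if (add_to) dest[didx] += src[sidx];
        else        dest[didx]  = src[sidx];
    }
}
```

Applied to the 4x4 input below with strides (2, 2), this mapping produces exactly the four 2x2 output channels shown in the corrected output.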

Example test case:

  • Input tensor (reorg_input):
    num_samples=1, k=1, nr=4, nc=4
[[         0.396    0.541    0.809    0.935
           0.108    0.671    0.138    0.775
           0.485    0.362    0.192    0.219
           0.290    0.232    0.876    0.646
]]
  • Original reorg_output:
[[         0.396    0.809
           0.485    0.192
][         0.541    0.935
           0.362    0.219
][         0.108    0.138
           0.290    0.876
][         0.671    0.775
           0.232    0.646
]]
  • Corrected reorg_output: Matches the expected reorganized output exactly.
[[         0.396    0.809
           0.485    0.192
][         0.541    0.935
           0.362    0.219
][         0.108    0.138
           0.290    0.876
][         0.671    0.775
           0.232    0.646
]]
  • reorg_gradient2_output: Successfully reverses the reorg operation, reconstructing the original tensor.
[[         0.396    0.541    0.809    0.935
           0.108    0.671    0.138    0.775
           0.485    0.362    0.192    0.219
           0.290    0.232    0.876    0.646
]]

@arrufat (Contributor) commented Sep 16, 2024

Oh, I will check it in more detail later on. But is it possible that it has worked for me because I've always used this with square images?

@arrufat (Contributor) commented Sep 16, 2024

The original and corrected output look identical to me.

@Cydral (Contributor, Author) commented Sep 16, 2024

Hi @arrufat,

I discovered this issue while searching for a method to split tensor columns in a 2D matrix mode. Upon closer examination of the example I provided, it's apparent that the function initially works correctly, but the very last row seems to be derived from memory artifacts rather than the actual matrix. Essentially, this indicates that the indices were incorrect.

While I haven't tested all possible configurations of this class, the new index calculation appears to be correct now for multi-channel, multi-row, and multi-column scenarios. This correction ensures that the reorganization process accurately maps input tensor elements to their correct positions in the output tensor.

An additional advantage introduced by this update is the optional accumulation of values. This feature allows for direct reversibility of the functions without requiring additional processes.
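
For illustration, here is a sketch of the matching gradient operation under the same assumptions as the reorg sketch above (again with hypothetical names, not Dlib's actual code). It scatters each element of the reorganized tensor back to the position reorg read it from, and with add_to == false this exactly reconstructs the original tensor, which is the reversibility described above:

```cpp
#include <vector>

// Sketch of reorg_gradient: the exact inverse scatter of reorg_sketch.
// grad has the original (n, k, nr, nc) shape; gradient_input holds the
// incoming gradient in reorg's output shape (n, k*rs*cs, nr/rs, nc/cs).
void reorg_gradient_sketch(bool add_to, std::vector<float>& grad,
                           long n, long k, long nr, long nc,
                           int rs, int cs,
                           const std::vector<float>& gradient_input)
{
    const long dk = k * rs * cs, dnr = nr / rs, dnc = nc / cs;
    for (long s = 0; s < n; ++s)
    for (long ko = 0; ko < dk; ++ko)
    for (long r = 0; r < dnr; ++r)
    for (long c = 0; c < dnc; ++c)
    {
        const long sk  = ko % k;
        const long off = ko / k;
        const long sr  = r * rs + off / cs;
        const long sc  = c * cs + off % cs;
        const long gidx = ((s*k + sk)*nr + sr)*nc + sc;    // into grad
        const long oidx = ((s*dk + ko)*dnr + r)*dnc + c;   // from gradient_input
        if (add_to) grad[gidx] += gradient_input[oidx];    // accumulate
        else        grad[gidx]  = gradient_input[oidx];    // plain inverse
    }
}
```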

I hope these modifications have effectively corrected these utility functions, assuming they were indeed erroneous in their previous implementation. I'll let you confirm that, please, as I don't have any examples of use for these specific layers.

@Cydral (Contributor, Author) commented Sep 16, 2024

Upon re-reading the content of the Pull Request, I better understand your inquiry. You're correct; the initial calculation assumed square matrices but didn't cover the general case. Let me provide a more illustrative example that better demonstrates the problem:

reorg_input: num_samples=1, k=1, nr=4, nc=6
[[         0.396    0.541    0.809    0.935    0.108    0.671
           0.138    0.775    0.485    0.362    0.192    0.219
           0.290    0.232    0.876    0.646    0.400    0.518
           0.622    0.079    0.499    0.193    0.945    0.458
        ]]
reorg_output: num_samples=1, k=2, nr=4, nc=3
[[         0.396    0.809    0.108
           0.138    0.485    0.192
           0.290    0.876    0.400
           0.622    0.499    0.945
        ][         0.775    0.362    0.219
           0.232    0.646    0.518
           0.079    0.193    0.458
           0.000   -0.000    0.000
        ]]
reorg2_output: num_samples=1, k=2, nr=4, nc=3
[[         0.396    0.809    0.108
           0.138    0.485    0.192
           0.290    0.876    0.400
           0.622    0.499    0.945
        ][         0.541    0.935    0.671
           0.775    0.362    0.219
           0.232    0.646    0.518
           0.079    0.193    0.458
        ]]
reorg_gradient2_output: num_samples=1, k=1, nr=4, nc=6
[[         0.396    0.541    0.809    0.935    0.108    0.671
           0.138    0.775    0.485    0.362    0.192    0.219
           0.290    0.232    0.876    0.646    0.400    0.518
           0.622    0.079    0.499    0.193    0.945    0.458
        ]]
test passed: reorg_cpu: max(abs(mat(grad_cpu) - mat(input))) < 1e-5
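
A regression test along these lines would catch the non-square-stride bug. This is a hypothetical sketch in Dlib's style; the tt::reorg / tt::reorg_gradient signatures with a leading add_to flag are assumed from this PR:

```cpp
#include <dlib/dnn.h>
using namespace dlib;

// Hypothetical round-trip test with non-square strides (row_stride=1,
// col_stride=2), the case the old indexing got wrong. Assumes the
// post-PR signatures that take an add_to flag as the first argument.
void test_reorg_non_square_strides()
{
    resizable_tensor input(1, 1, 4, 6), output, grad;
    tt::tensor_rand rnd;
    rnd.fill_uniform(input);
    output.set_size(1, 1*1*2, 4/1, 6/2);            // (n, k*rs*cs, nr/rs, nc/cs)
    tt::reorg(false, output, 1, 2, input);          // forward reorg
    grad.copy_size(input);
    tt::reorg_gradient(false, grad, 1, 2, output);  // inverse scatter, no accumulation
    DLIB_CASSERT(max(abs(mat(grad) - mat(input))) < 1e-5);
}
```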

@arrufat (Contributor) commented Sep 16, 2024

Yes, your proposed fix makes sense. I should've noticed this while porting the code from Darknet, which only supports "square" strides, not different ones for rows and columns. That's also why it was working in the tests: we were only testing reorg<2, 2>.

Cydral and others added 5 commits September 20, 2024 19:43
* remove using namespace std from headers

* more std::

* more std::

* more std:: on windows stuff

* remove uses of using namespace std::chrono

* do not use C++17 features

* Add Davis suggestion

* revert some more stuff

* revert removing include

* more std::chrono stuff
@davisking (Owner) commented
Sweet, thanks for the bug fix :)

@davisking merged commit 72822fe into davisking:master on Sep 23, 2024
10 checks passed