
[ util ] Implement softmax calculation function in util #2479

Merged (5 commits merged into nnstreamer:main on Mar 5, 2024)

Conversation

@skykongkong8 (Member) commented on Feb 20, 2024

  • Current activation functions are implemented as function templates and compute entirely in the template parameter's precision, unless NEON intrinsics with intermediate fp32 values are used explicitly.
  • According to recent papers, computing softmax in fp32 is quite critical for the safe convergence of mixed-precision training.
  • This PR proposes a SIMD version of the softmax calculation that temporarily uses higher precision when operating on half-precision data.
  • For numerical stability, the inputs are shifted (so only non-positive values are fed into the exponential function) to avoid overflow; a rough scalar sketch of the idea follows below.
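
For illustration, a minimal scalar sketch of the stabilized softmax described above (plain fp32, no SIMD, not the PR's actual code; the fp16 path additionally keeps its accumulator in fp32):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Subtract the running maximum before exponentiating, so every exponent is
// <= 0 and exp() stays within (0, 1] instead of overflowing.
std::vector<float> softmax_ref(const std::vector<float> &x) {
  const float max_x = *std::max_element(x.begin(), x.end());
  std::vector<float> y(x.size());
  float sum = 0.f;
  for (std::size_t i = 0; i < x.size(); ++i) {
    y[i] = std::exp(x[i] - max_x);
    sum += y[i];
  }
  for (float &v : y)
    v /= sum;
  return y;
}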

@taos-ci (Collaborator) commented on Feb 20, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2479. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments from reviewers quickly. Your PR must pass all verification processes of cibot before the review process by reviewers can start. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.

@taos-ci (Collaborator) left a comment:

@skykongkong8, 💯 All CI checkers are successfully verified. Thanks.

@djeong20 (Contributor) left a comment:

LGTM

float max(const unsigned int N, float *X) {
#ifdef USE_NEON
return nntrainer::neon::max(N, X);
#else

@myungjoo (Member) commented on Feb 23, 2024:

This is for your future reference.

Use STL properly

std::vector<float> v (X, X+N);
return *std::max_element(v.begin(), v.end());

And if you compile it properly, you may get x64/SIMD (AVX/SSE) for free:

https://stackoverflow.com/questions/59373900/why-is-there-no-simd-functionality-in-the-c-standard-library
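
For reference, std::max_element from <algorithm> also accepts a raw pointer range directly, so the temporary vector is optional (assuming X points to N contiguous floats):

return *std::max_element(X, X + N);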

@skykongkong8 (Member, Author) replied:

Will apply this right away

The PR's 5 commit messages:

- Current softmax implementation does not consider fp32 use in half-precision softmax
- Implement raw and NEON-SIMD versions of the softmax function, in fp32 and in fp16 with fp32 accumulation

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
- Unlike the isamax function of BLAS, this function returns the maximum 'value', not the index
- Note that this function is applicable only when the input data is contiguous

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
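
For illustration, the distinction from BLAS isamax in a rough scalar sketch (isamax_ref and max_value_ref are hypothetical names, not the PR's API):

#include <cmath>

// isamax-style: returns the INDEX of the element with the largest magnitude.
unsigned int isamax_ref(const unsigned int N, const float *X) {
  unsigned int idx = 0;
  for (unsigned int i = 1; i < N; ++i)
    if (std::fabs(X[i]) > std::fabs(X[idx]))
      idx = i;
  return idx;
}

// The new max: returns the largest VALUE itself, read from a contiguous buffer.
float max_value_ref(const unsigned int N, const float *X) {
  float m = X[0];
  for (unsigned int i = 1; i < N; ++i)
    m = X[i] > m ? X[i] : m;
  return m;
}
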
- For numerical stability, it is recommended to feed only non-positive values into the exponential function (so each output lies in the range 0 to 1)
- Subtract the maximum value before computing the exponential vectors

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
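
For reference, the shift is exact because the common factor exp(-m) cancels: with m = max_j x_j,

    exp(x_i - m) / sum_j exp(x_j - m) = exp(x_i) / sum_j exp(x_j)

and with this choice of m every exponent is <= 0, so each exp() result lies in (0, 1] and cannot overflow.
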
- Add an in-place exponential function

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
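
A minimal sketch of an in-place exponential in the non-SIMD case (exp_i_ref is a hypothetical name, assuming a contiguous fp32 buffer):

#include <cmath>

// Overwrites each element with its exponential, in place.
void exp_i_ref(const unsigned int N, float *X) {
  for (unsigned int i = 0; i < N; ++i)
    X[i] = std::exp(X[i]);
}
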
- For cleaner code, use std::max_element instead of a for-loop

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>

@taos-ci (Collaborator) left a comment:

@skykongkong8, 💯 All CI checkers are successfully verified. Thanks.

@baek2sm (Contributor) left a comment:

LGTM

@jijoongmoon (Collaborator) left a comment:

LGTM except minor comments.

unsigned int i = 0;
float sum = 0.f;
float max_x = max(N, X);
while (i < N) {

@jijoongmoon (Collaborator) commented on Mar 5, 2024:

It might depend on the size N of the matrix, but you could also optimize further by using OpenMP and a temporary buffer to hold X[i] - max_x. You can do that later, since we also have to consider the non-NEON case.
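
A rough sketch of that suggestion (hypothetical code, assuming OpenMP is enabled and Y is the output buffer):

#include <cmath>
#include <vector>

// Stage X[i] - max_x in a temporary buffer, then let OpenMP split the
// exponentiation, the sum reduction, and the final normalization across threads.
void softmax_omp_sketch(const unsigned int N, const float *X, float *Y,
                        const float max_x) {
  std::vector<float> tmp(N);
#pragma omp parallel for
  for (unsigned int i = 0; i < N; ++i)
    tmp[i] = X[i] - max_x;

  float sum = 0.f;
#pragma omp parallel for reduction(+ : sum)
  for (unsigned int i = 0; i < N; ++i) {
    Y[i] = std::exp(tmp[i]);
    sum += Y[i];
  }

#pragma omp parallel for
  for (unsigned int i = 0; i < N; ++i)
    Y[i] /= sum;
}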

@jijoongmoon merged commit d718ede into nnstreamer:main on Mar 5, 2024
36 checks passed