[ Tensor ] Remove CBLAS params from Tensor related files. #2704
Conversation
📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2704. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before the review process by reviewers can start. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress of your PR in more detail, visit http://ci.nnstreamer.ai/.
This PR is related to the known issue raised in #2682.
Force-pushed from 6b51401 to ff8acfb
cibot: @skykongkong8, a builder check could not be completed because one of the checkers did not finish. To find out the reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2704-202408121359440.67014408111572-ff8acfbbcb564ab957d91ca5670e01165424da05/.
Force-pushed from ff8acfb to 997beb0
- Remove CBLAS params from tensor-related files, since nntrainer is no longer fully dependent on cblas.
- Letting tensors be aware of CBLAS-related parameters was nonsense in the first place.
- CBLAS params will be declared only when functions from cblas are called.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
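To illustrate the direction described above, here is a minimal sketch (not the actual nntrainer code; the `USE_BLAS` guard, the `order == 0` mapping, and the exact signature are assumptions inferred from the diffs below) of how CBLAS parameters can stay confined to the point where a cblas function is actually called:

```cpp
#ifdef USE_BLAS
#include <cblas.h> // CBLAS types are needed only in the BLAS-backed branch
#endif

// Hypothetical wrapper: the public signature takes a plain storage-order
// value and a bool transpose flag; CBLAS enums are constructed only where
// cblas_sgemv is actually invoked.
void sgemv(unsigned int order, bool trans, unsigned int M, unsigned int N,
           float alpha, const float *A, unsigned int lda, const float *X,
           int incX, float beta, float *Y, int incY) {
#ifdef USE_BLAS
  // Assumption: 0 denotes row-major in the tensor's storage-order encoding.
  CBLAS_ORDER cblas_order = (order == 0) ? CblasRowMajor : CblasColMajor;
  CBLAS_TRANSPOSE cblas_trans = trans ? CblasTrans : CblasNoTrans;
  cblas_sgemv(cblas_order, cblas_trans, M, N, alpha, A, lda, X, incX, beta,
              Y, incY);
#else
  // non-BLAS fallback loop elided for brevity
#endif
}
```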
Force-pushed from 997beb0 to c952f2f
@s-debadri
@skykongkong8, 💯 All CI checkers are successfully verified. Thanks.
Great work! Please take a look at the comments :)
```diff
@@ -93,8 +99,7 @@ static inline void transpose_fallback(unsigned int M, unsigned int N,
 static void saxpy_FP16(const unsigned int N, const float alpha, const _FP16 *X,
                        const int incX, _FP16 *Y, const int incY) {
   if (incX < 0 or incY < 0)
-    throw std::invalid_argument(
-        "Error: negative inc not supported without cblas");
+    throw std::invalid_argument("Error: negative inc not supported");
```
Q1) Is a negative increment always unsupported?
Q2) What happens when the increment is zero?
`incX` and `incY` are indices, thus should always be positive. I think this would answer both questions!
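For context on both questions, a plain fallback loop (a sketch below, not the exact nntrainer implementation; the function name is illustrative) indexes forward from the base pointers, which is why only positive increments make sense here. Reference BLAS supports negative increments by starting from the far end of the vector, a convention this kind of fallback does not implement; a zero increment would make every iteration touch the same element, so rejecting it is also reasonable:

```cpp
#include <stdexcept>

// Minimal sketch of a saxpy fallback: Y[i*incY] += alpha * X[i*incX].
// Only forward (positive) strides work with this indexing; a zero incY
// would accumulate N times into Y[0].
static void saxpy_fallback(unsigned int N, float alpha, const float *X,
                           int incX, float *Y, int incY) {
  if (incX <= 0 || incY <= 0)
    throw std::invalid_argument("Error: non-positive inc not supported");
  for (unsigned int i = 0; i < N; ++i)
    Y[i * incY] += alpha * X[i * incX];
}
```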
```diff
-  cublasOperation_t transB =
-      (TransB == CblasTrans) ? CUBLAS_OP_T : CUBLAS_OP_N;
+  cublasOperation_t transA = (TransA) ? CUBLAS_OP_T : CUBLAS_OP_N;
+  cublasOperation_t transB = (TransB) ? CUBLAS_OP_T : CUBLAS_OP_N;
   cublasSgemm(handle, transA, transB, N, M, K, &alpha, d_B, N, d_A, K, &beta,
```
It looks like cuBLAS interprets matrices as column-major. We should preprocess (e.g., transpose) to correctly use cublasSgemm. For now, let's mark it as a ToDo.
Never knew it 😮 Thanks for pointing this out!
Indeed, it does take matrices in column-major storage order!
https://stackoverflow.com/questions/56043539/cublassgemm-row-major-multiplication
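For reference, the standard workaround (which the swapped `d_B, d_A` argument order in the hunk above already appears to exploit) relies on the identity Cᵀ = Bᵀ·Aᵀ: reading a row-major buffer as column-major yields its transpose for free, so asking cuBLAS for the column-major product of B and A writes out exactly the row-major C. A sketch for the non-transposed case (the function name is illustrative):

```cpp
#include <cublas_v2.h>

// C = A * B for row-major A (MxK), B (KxN), C (MxN), using column-major
// cuBLAS: compute C^T (NxM) = B^T (NxK) * A^T (KxM), where each "transpose"
// is just the column-major reinterpretation of the row-major buffer.
void sgemm_row_major(cublasHandle_t handle, const float *d_A, const float *d_B,
                     float *d_C, int M, int N, int K, float alpha, float beta) {
  cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
              /*m=*/N, /*n=*/M, /*k=*/K, &alpha,
              d_B, /*lda=*/N,  // B^T in the column-major view
              d_A, /*ldb=*/K,  // A^T in the column-major view
              &beta, d_C, /*ldc=*/N);
}
```

Handling TransA/TransB on top of this swap requires flipping the corresponding op flags as well, which is presumably what the ToDo above covers.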
```diff
@@ -493,8 +493,8 @@ void FloatTensor::sum_by_batch(Tensor &output) const {
   Tensor ones(1, 1, 1, feat_len, this->getFormat());
   ones.setValue(1.0);
-  sgemv(CblasRowMajor, CblasNoTrans, batch, feat_len, 1, data, feat_len,
-        ones.getData<float>(), 1, 0.0, out_data, 1);
+  sgemv((unsigned int)dim.getStorageOrder(), false, batch, feat_len, 1, data,
```
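For intuition about this call: sum_by_batch reduces each row of the (batch × feat_len) data matrix to a scalar, which is exactly a matrix-vector product with a ones vector. A plain reference version (illustrative only, not the nntrainer code path) of what the sgemv computes:

```cpp
#include <vector>

// out[b] = sum_j data[b * feat_len + j], i.e. a (batch x feat_len)
// row-major matrix multiplied by a vector of ones.
std::vector<float> sum_by_batch_ref(const float *data, unsigned int batch,
                                    unsigned int feat_len) {
  std::vector<float> out(batch, 0.0f);
  for (unsigned int b = 0; b < batch; ++b)
    for (unsigned int j = 0; j < feat_len; ++j)
      out[b] += data[b * feat_len + j];
  return out;
}
```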
This is just a suggestion! How about having a fixed value for the storage order, like we do for transpose? Although there is no difference in the result, I think it would make the code easier to understand and debug.
I don't really get it... Could you elaborate a little bit more for me?
I think the current implementation is quite similar to the transpose cases.
With my understanding of your suggestion, do you mean we should have functions like:
sgemv_rowMaj(...)
sgemv_colMaj(...)
?
What I meant by having a fixed value is as follows:

```diff
-  sgemv((unsigned int)dim.getStorageOrder(), false, batch, feat_len, 1, data,
+  sgemv(TStorageOrder::ROW_MAJOR, false, batch, feat_len, 1, data,
```

Same as how we pass the transpose flag with true/false!
That's a good one!
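A minimal sketch of the agreed-upon idea (the name `TStorageOrder` comes from the suggestion above; its exact members and underlying type are assumptions):

```cpp
// A named storage-order value documents intent at the call site, unlike a
// casted unsigned int. Members and underlying type are assumed here.
enum TStorageOrder : unsigned int {
  ROW_MAJOR = 0,
  COL_MAJOR = 1,
};

// Before: sgemv((unsigned int)dim.getStorageOrder(), false, batch, feat_len, ...);
// After:  sgemv(TStorageOrder::ROW_MAJOR, false, batch, feat_len, ...);
```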
This PR resolves the build error after #2704 when enable_fp16 is true. This fixes:

```
blas_interface.cpp:141:9: error: 'order' was not declared in this scope
  141 |   sgemv(order, TransA, M, N, alpha, A_, lda, X_, incX, beta, Y_, incY);
      |         ^~~~~
```

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [ ]Passed [X]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <[email protected]>
TStorageOrder : Tensor Storage Order