
[ Tensor ] Remove CBLAS params from Tensor related files. #2704

Merged · 1 commit merged into nnstreamer:main on Aug 22, 2024

Conversation

skykongkong8 (Member) commented Aug 12, 2024

  • Remove CBLAS params from Tensor-related files, since nntrainer no longer fully depends on CBLAS.
  • Making Tensors aware of CBLAS-related parameters was nonsensical in the first place.
  • CBLAS params will be declared only where CBLAS functions are actually called.
  • fyi) TStorageOrder: Tensor Storage Order (see the sketch below)
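For illustration, the storage-order type the fyi refers to might look like the following minimal sketch (only the TStorageOrder name and the ROW_MAJOR value appear in this thread; the rest is assumed rather than taken from the actual nntrainer sources):

```cpp
// Tensor-level storage-order type replacing CBLAS_ORDER in Tensor files.
// COL_MAJOR and the underlying type are assumptions for this sketch.
enum TStorageOrder : unsigned int {
  ROW_MAJOR = 0,
  COL_MAJOR = 1,
};
```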

Self evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

taos-ci (Collaborator) commented Aug 12, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2704. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments from reviewers quickly. Your PR must pass all verification processes of cibot before the review process by reviewers can start. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.

skykongkong8 (Member, Author) commented Aug 12, 2024

This PR is related to a known issue raised in #2682.

@skykongkong8 skykongkong8 linked an issue Aug 12, 2024 that may be closed by this pull request
@skykongkong8 skykongkong8 force-pushed the pr/tensor/rm_cblas_params branch 2 times, most recently from 6b51401 to ff8acfb Compare August 12, 2024 04:59
taos-ci (Collaborator) commented Aug 12, 2024

:octocat: cibot: @skykongkong8, the build checker could not be completed because one of the checkers did not finish. To find out the reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2704-202408121359440.67014408111572-ff8acfbbcb564ab957d91ca5670e01165424da05/.

- Remove CBLAS params from Tensor-related files, since nntrainer no longer fully depends on CBLAS.
- Making Tensors aware of CBLAS-related parameters was nonsensical in the first place.
- CBLAS params will be declared only where CBLAS functions are actually called.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
skykongkong8 (Member, Author) commented:

@s-debadri
Please do not let CBLAS params reside outside of the actual CBLAS function interfaces.
I noticed some in the OpenCL-related work.

taos-ci (Collaborator) left a comment:

@skykongkong8, 💯 All CI checkers are successfully verified. Thanks.

djeong20 (Contributor) left a comment:

Great work! Please take a look at the comments :)

```diff
@@ -93,8 +99,7 @@ static inline void transpose_fallback(unsigned int M, unsigned int N,
 static void saxpy_FP16(const unsigned int N, const float alpha, const _FP16 *X,
                        const int incX, _FP16 *Y, const int incY) {
   if (incX < 0 or incY < 0)
-    throw std::invalid_argument(
-      "Error: negative inc not supported without cblas");
+    throw std::invalid_argument("Error: negative inc not supported");
```
djeong20 (Contributor):

Q1) Is a negative increment always unsupported?
Q2) What happens when the increment is zero?

skykongkong8 (Member, Author):

incX and incY are indices, so they should always be positive. I think this answers both questions!
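For context, here is a minimal sketch of the strided loop such an axpy fallback performs (illustrative only: the real saxpy_FP16 body is not shown in the diff, and float stands in for _FP16 to keep the sketch self-contained):

```cpp
// Y[i * incY] += alpha * X[i * incX] for i in [0, N).
// A zero increment would make every iteration touch the same element, and a
// negative one would index out of bounds here, hence the guard in the diff.
static void axpy_fallback(int N, float alpha, const float *X, int incX,
                          float *Y, int incY) {
  for (int i = 0; i < N; ++i)
    Y[i * incY] += alpha * X[i * incX];
}
```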

```diff
-  cublasOperation_t transB =
-    (TransB == CblasTrans) ? CUBLAS_OP_T : CUBLAS_OP_N;
+  cublasOperation_t transA = (TransA) ? CUBLAS_OP_T : CUBLAS_OP_N;
+  cublasOperation_t transB = (TransB) ? CUBLAS_OP_T : CUBLAS_OP_N;
   cublasSgemm(handle, transA, transB, N, M, K, &alpha, d_B, N, d_A, K, &beta,
```
djeong20 (Contributor):

It looks like cuBLAS interprets matrices as column-major. We should preprocess (e.g., transpose) to use cublasSgemm correctly. For now, let's mark it as a ToDo.

skykongkong8 (Member, Author):

Never knew it 😮 Thanks for pointing this out!

skykongkong8 (Member, Author):

Indeed, it does take matrices in column-major storage order!
https://stackoverflow.com/questions/56043539/cublassgemm-row-major-multiplication
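For reference, the usual way to run a row-major GEMM through column-major cublasSgemm without any copying is the identity C^T = B^T * A^T, which is why the call in the diff above passes d_B before d_A. A sketch for the no-transpose case, with the pointer names assumed from the diff:

```cpp
#include <cublas_v2.h>

// Row-major C (MxN) = alpha * A (MxK) * B (KxN) + beta * C.
// A row-major buffer read as column-major is its transpose, so computing
// C^T = B^T * A^T in column-major order leaves a row-major C in d_C.
void sgemm_row_major(cublasHandle_t handle, int M, int N, int K, float alpha,
                     const float *d_A, const float *d_B, float beta,
                     float *d_C) {
  cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
              N, M, K,  // C^T is N x M, with shared dimension K
              &alpha,
              d_B, N,   // left operand B^T, leading dimension N
              d_A, K,   // right operand A^T, leading dimension K
              &beta,
              d_C, N);  // output C^T, leading dimension N
}
```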

```diff
@@ -493,8 +493,8 @@ void FloatTensor::sum_by_batch(Tensor &output) const {
 
   Tensor ones(1, 1, 1, feat_len, this->getFormat());
   ones.setValue(1.0);
-  sgemv(CblasRowMajor, CblasNoTrans, batch, feat_len, 1, data, feat_len,
-        ones.getData<float>(), 1, 0.0, out_data, 1);
+  sgemv((unsigned int)dim.getStorageOrder(), false, batch, feat_len, 1, data,
```
djeong20 (Contributor):

This is just a suggestion! How about using a fixed value for storage orders, like we do for transpose?
Although there's no difference in the result, I think it would make the code easier to understand and debug.

skykongkong8 (Member, Author):

I don't really get it... Could you elaborate a little more for me?
I think the current implementation is quite similar to the transpose cases.
If I understand your suggestion correctly, do you mean we should have functions like:

sgemv_rowMaj(...)
...
sgemv_colMaj(...)
...

?

djeong20 (Contributor):

What I meant by having a fixed value is as follows.

Suggested change:

```diff
-  sgemv((unsigned int)dim.getStorageOrder(), false, batch, feat_len, 1, data,
+  sgemv(TStorageOrder::ROW_MAJOR, false, batch, feat_len, 1, data,
```

The same way we pass the transpose flag as true/false!
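As a sketch of how this fits the PR's stated design (CBLAS params declared only where a CBLAS function is actually called), the wrapper could translate the tensor-level flags at the last moment. The parameter list below is abbreviated and the wrapper name is hypothetical; it reuses the TStorageOrder sketch from the description above:

```cpp
#include <cblas.h>

// Hypothetical wrapper: tensor-level flags come in, CBLAS enums are created
// only at the point of the actual CBLAS call.
void sgemv_wrapper(TStorageOrder order, bool trans /*, ...BLAS params... */) {
  CBLAS_ORDER cblas_order =
    (order == TStorageOrder::ROW_MAJOR) ? CblasRowMajor : CblasColMajor;
  CBLAS_TRANSPOSE cblas_trans = trans ? CblasTrans : CblasNoTrans;
  // cblas_sgemv(cblas_order, cblas_trans, M, N, alpha, A, lda,
  //             X, incX, beta, Y, incY);
  (void)cblas_order; // unused here only because the call above is elided
  (void)cblas_trans;
}
```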

skykongkong8 (Member, Author):

That's a good one!

@jijoongmoon jijoongmoon merged commit 6623e30 into nnstreamer:main Aug 22, 2024
48 checks passed
djeong20 added a commit to djeong20/nntrainer that referenced this pull request Aug 23, 2024
This PR resolves the build error after nnstreamer#2704 when enable_fp16 is true.

This fixes:

```
blas_interface.cpp:141:9: error: ‘order’ was not declared in this scope
  141 |   sgemv(order, TransA, M, N, alpha, A_, lda, X_, incX, beta, Y_, incY);
      |         ^~~~~
```

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [ ]Passed [X]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <[email protected]>
myungjoo pushed a commit that referenced this pull request Aug 24, 2024
This PR resolves the build error after #2704 when enable_fp16 is true.

This fixes:

```
blas_interface.cpp:141:9: error: ‘order’ was not declared in this scope
  141 |   sgemv(order, TransA, M, N, alpha, A_, lda, X_, incX, beta, Y_, incY);
      |         ^~~~~
```

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [ ]Passed [X]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <[email protected]>
@skykongkong8 skykongkong8 deleted the pr/tensor/rm_cblas_params branch October 2, 2024 02:19
Development

Successfully merging this pull request may close these issues.

Remove Cblas params from tensor related files
6 participants