
[Fix][Dev] Typo fix for our workflow and enhance lop3 decode to support scaling #125

Merged — 134 commits merged into microsoft:main on Aug 5, 2024

Conversation

LeiWang1999
Contributor

This pull request primarily focuses on enhancing the GPU intrinsic functions and fixing the workflow configuration. The key changes are new decoding functions with scaling and offset capabilities, a corrected job-dependency key in the benchmark workflow, and an updated submodule reference.

Enhancements to GPU Intrinsic Functions:

  • Added new decoding functions with scaling and offset capabilities in bitblas/gpu/intrin/lop3.py, including decode_i4_to_f16_scale_offset, decode_i4_to_f16_scale_zeros_original_offset, decode_i4_to_f16_scale_zeros_rescale_offset, and decode_i2_to_f16_scale_zeros_original_offset (a minimal sketch of the underlying lop3 decode pattern follows this list).
  • Introduced a get_func_arguments helper function to streamline the arguments passed to external functions.
  • Updated the fast_decode_impl function to use the new helper and added offset factors for the buffers.
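
For context, the lop3-based decode trick these functions build on can be sketched as follows. This is a minimal, hypothetical CUDA sketch of the scale-only path — the function name decode_i4u_to_f16_scale_sketch, the unsigned-only handling, and the plain subtract-then-multiply are illustrative assumptions, not the exact code this PR adds. Each lop3.b32 instruction masks two 4-bit fields out of the packed word and ORs in the fp16 exponent bits for 1024.0, so a pair of values lands directly in a half2 register; subtracting 1024 and multiplying by the per-group scale then yields the dequantized halves. The zeros/offset variants named above presumably extend the same pattern with a zero-point correction.

```cuda
#include <cstdint>
#include <cuda_fp16.h>

// Hypothetical sketch of a lop3-based uint4 -> fp16 decode with scaling.
// Decodes 8 packed 4-bit values from one 32-bit word into 8 halves.
// Note: outputs come out interleaved (out[2*i] = nibble i, out[2*i+1] = nibble i+4),
// the kind of weight layout such kernels assume.
__device__ void decode_i4u_to_f16_scale_sketch(const uint32_t *packed,
                                               half *out,
                                               const half *scale) {
  // View the 8 output halves as 4 half2 registers so each lop3 fills one pair.
  uint32_t *h = reinterpret_cast<uint32_t *>(out);
  const uint32_t i4s = *packed;

  static constexpr uint32_t immLut = (0xf0 & 0xcc) | 0xaa;  // (a & b) | c
  static constexpr uint32_t BOTTOM_MASK = 0x000f000f;       // one nibble per half
  static constexpr uint32_t FP16_TOP_MAGIC = 0x64006400;    // half2(1024.0, 1024.0)

  const half2 bias = __float2half2_rn(1024.0f);
  const half2 s = __half2half2(*scale);

#pragma unroll
  for (int i = 0; i < 4; ++i) {
    // One lop3 selects nibble i (low half) and nibble i+4 (high half) and ORs
    // in the 1024.0 exponent bits, giving half2(1024 + n_i, 1024 + n_{i+4}).
    asm volatile("lop3.b32 %0, %1, %2, %3, %4;\n"
                 : "=r"(h[i])
                 : "r"(i4s >> (4 * i)), "n"(BOTTOM_MASK), "n"(FP16_TOP_MAGIC),
                   "n"(immLut));
    // Strip the 1024 bias, then apply the per-group scale.
    half2 v = __hsub2(*reinterpret_cast<half2 *>(&h[i]), bias);
    *reinterpret_cast<half2 *>(&h[i]) = __hmul2(v, s);
  }
}
```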

Workflow Configuration:

  • Changed depends-on to needs in the .github/workflows/benchmark.yml file; GitHub Actions uses the needs keyword to declare job dependencies, so this fixes the dependency between jobs.

Submodule Update:

  • Updated the submodule reference for 3rdparty/tvm to a new commit.

@LeiWang1999 LeiWang1999 merged commit fa0f7b1 into microsoft:main Aug 5, 2024
6 checks passed