
[Fix][Dev] Typo fix for our workflow and enhance lop3 decode to support scaling #125

Merged — 134 commits merged into microsoft:main on Aug 5, 2024

Conversation

LeiWang1999
Contributor

This pull request primarily focuses on enhancing the GPU intrinsic functions and fixing the workflow configuration. The key changes are new decoding functions with scaling and offset capabilities, a corrected job-dependency key in the benchmark workflow, and an updated submodule reference.

Enhancements to GPU Intrinsic Functions:

  • Added new decoding functions with scaling and offset capabilities in bitblas/gpu/intrin/lop3.py, including decode_i4_to_f16_scale_offset, decode_i4_to_f16_scale_zeros_original_offset, decode_i4_to_f16_scale_zeros_rescale_offset, and decode_i2_to_f16_scale_zeros_original_offset (a minimal sketch of the underlying lop3 decode pattern follows this list).
  • Introduced a get_func_arguments helper function to streamline the arguments passed to external functions.
  • Updated the fast_decode_impl function to use the new helper and added offset factors for the buffers.
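
For context, the lop3-based decode trick these functions build on can be sketched as follows. This is a minimal, hypothetical CUDA sketch of the scale-only path — the function name decode_i4u_to_f16_scale_sketch, the unsigned-only handling, and the plain subtract-then-multiply are illustrative assumptions, not the exact code this PR adds. Each lop3.b32 instruction masks two 4-bit fields out of the packed word and ORs in the fp16 exponent bits for 1024.0, so a pair of values lands directly in a half2 register; subtracting 1024 and multiplying by the per-group scale then yields the dequantized halves. The zeros/offset variants named above presumably extend the same pattern with a zero-point correction.

```cuda
#include <cstdint>
#include <cuda_fp16.h>

// Hypothetical sketch of a lop3-based uint4 -> fp16 decode with scaling.
// Decodes 8 packed 4-bit values from one 32-bit word into 8 halves.
// Note: outputs come out interleaved (out[2*i] = nibble i, out[2*i+1] = nibble i+4),
// the kind of weight layout such kernels assume.
__device__ void decode_i4u_to_f16_scale_sketch(const uint32_t *packed,
                                               half *out,
                                               const half *scale) {
  // View the 8 output halves as 4 half2 registers so each lop3 fills one pair.
  uint32_t *h = reinterpret_cast<uint32_t *>(out);
  const uint32_t i4s = *packed;

  static constexpr uint32_t immLut = (0xf0 & 0xcc) | 0xaa;  // (a & b) | c
  static constexpr uint32_t BOTTOM_MASK = 0x000f000f;       // one nibble per half
  static constexpr uint32_t FP16_TOP_MAGIC = 0x64006400;    // half2(1024.0, 1024.0)

  const half2 bias = __float2half2_rn(1024.0f);
  const half2 s = __half2half2(*scale);

#pragma unroll
  for (int i = 0; i < 4; ++i) {
    // One lop3 selects nibble i (low half) and nibble i+4 (high half) and ORs
    // in the 1024.0 exponent bits, giving half2(1024 + n_i, 1024 + n_{i+4}).
    asm volatile("lop3.b32 %0, %1, %2, %3, %4;\n"
                 : "=r"(h[i])
                 : "r"(i4s >> (4 * i)), "n"(BOTTOM_MASK), "n"(FP16_TOP_MAGIC),
                   "n"(immLut));
    // Strip the 1024 bias, then apply the per-group scale.
    half2 v = __hsub2(*reinterpret_cast<half2 *>(&h[i]), bias);
    *reinterpret_cast<half2 *>(&h[i]) = __hmul2(v, s);
  }
}
```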

Workflow Configuration:

  • Changed depends-on to needs in the .github/workflows/benchmark.yml file; GitHub Actions uses the needs keyword to declare job dependencies, so this fixes the dependency between jobs.

Submodule Update:

  • Updated the submodule reference for 3rdparty/tvm to a new commit.

@LeiWang1999 LeiWang1999 merged commit fa0f7b1 into microsoft:main Aug 5, 2024
6 checks passed