Optim - added quantization code. #968

Open · wants to merge 70 commits into main

Commits (70)
236b696
Add inference benchmark, delete extra files
gtamer2 Dec 13, 2023
ed186f5
more deletions
gtamer2 Dec 13, 2023
64ab5a8
Merge pull request #1 from gtamer2/add_benchmark
gtamer2 Dec 13, 2023
62f242d
Changes
gtamer2 Dec 13, 2023
257962f
import fire
gtamer2 Dec 13, 2023
a89488a
git ignore
gtamer2 Dec 13, 2023
5a84145
fixes
gtamer2 Dec 13, 2023
cf2b652
Push changes
gtamer2 Dec 13, 2023
47400da
comment out model.to
gtamer2 Dec 13, 2023
4c4d439
print x for debugging
gtamer2 Dec 13, 2023
2071539
inference_benchmark
gtamer2 Dec 13, 2023
58eee14
inference_benchmark
gtamer2 Dec 13, 2023
86ca1c7
Try large batch
gtamer2 Dec 13, 2023
f146895
Get working for 1 batch
gtamer2 Dec 13, 2023
fd8a6cf
Get working for 1 batch
gtamer2 Dec 13, 2023
04cb363
Add torch profiler
gtamer2 Dec 14, 2023
40abad2
Indent
gtamer2 Dec 14, 2023
d6336fc
Indent
gtamer2 Dec 14, 2023
f80cf22
Profile cpu and cuda
gtamer2 Dec 14, 2023
8436d96
move profiler down
gtamer2 Dec 14, 2023
60bc868
Try using simplified llama
gtamer2 Dec 14, 2023
84416a4
revert
gtamer2 Dec 14, 2023
7dfe9e9
Move outside
gtamer2 Dec 14, 2023
e5f0408
one more yolo
gtamer2 Dec 14, 2023
d7da6e1
rerecord mem
gtamer2 Dec 14, 2023
b988f0e
Merge pull request #2 from gtamer2/benchmark2
gtamer2 Dec 14, 2023
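The benchmarking and profiler commits merged above can be sketched roughly as follows. This is a minimal illustration only: the matmul workload and the `profile_inference` helper are stand-ins for the repository's actual inference benchmark, not code from this PR.

```python
import torch
from torch.profiler import profile, ProfilerActivity

def profile_inference(run_inference):
    # Profile on CPU, and on CUDA too when a GPU is available, in the spirit
    # of the "Profile cpu and cuda" commit.
    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(ProfilerActivity.CUDA)
    with profile(activities=activities, profile_memory=True) as prof:
        run_inference()
    # Summarize the hottest ops; CUDA columns appear only when a GPU was used.
    return prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10)

# Toy stand-in for a real model inference call.
print(profile_inference(lambda: torch.randn(256, 256) @ torch.randn(256, 256)))
```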
431ebab
added torch.jit.script to model in generation.py
Dec 14, 2023
67d5a14
Script to run benchmarks
gtamer2 Dec 14, 2023
dcda8a1
Fix missing param
gtamer2 Dec 14, 2023
816b895
tried changing .jit.script to be around the llama object call in infe…
Dec 14, 2023
078aa9c
Empty cuda cache
gtamer2 Dec 14, 2023
47aaa29
changed to trace from script
Dec 14, 2023
e8f2d14
.trace on generation.py
Dec 14, 2023
c18ee45
get rid of trace entirely to see if it is causing issue
Dec 14, 2023
a79d31a
Add torchscript python script
gtamer2 Dec 14, 2023
680d2a6
reset
gtamer2 Dec 14, 2023
aeab877
remove bard hallucination
gtamer2 Dec 14, 2023
93d16e2
Merge pull request #3 from gtamer2/new_benchmarks
gtamer2 Dec 14, 2023
10188e4
Merge branch 'main' of https://github.com/gtamer2/hpml_llama into tor…
gtamer2 Dec 14, 2023
c588e99
revisions
gtamer2 Dec 14, 2023
714dd19
revisions
gtamer2 Dec 14, 2023
0abe652
revisions
gtamer2 Dec 14, 2023
5534bbf
Add benchmarks
gtamer2 Dec 14, 2023
44feb11
get quantization working
Dec 14, 2023
ae535a2
Merge pull request #5 from gtamer2/torchscript
gtamer2 Dec 14, 2023
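The torch.jit experiments above (scripting the model in generation.py, then switching to tracing, then removing it) can be illustrated on a toy module. The `Toy` class below is an assumption standing in for the Llama model, which is far harder to script because of its control flow and caching:

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) + 1.0

model = Toy().eval()

# torch.jit.script compiles the module from its Python source,
# preserving control flow.
scripted = torch.jit.script(model)

# torch.jit.trace instead records the ops executed for one example input;
# data-dependent branches are baked in, which is why tracing a full
# generation loop is fragile.
traced = torch.jit.trace(model, torch.randn(2, 3))

x = torch.randn(2, 3)
assert torch.equal(scripted(x), traced(x))
```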
8e34266
add some more quantization lines
Dec 14, 2023
94c70fb
got rid of fuse_model()
Dec 14, 2023
3920ee5
adding convert to quantization model
Dec 14, 2023
adb0190
quantize script
gtamer2 Dec 14, 2023
d71ffa0
fire launch
gtamer2 Dec 15, 2023
748bc32
Inplace
gtamer2 Dec 15, 2023
775475a
access the transformer
gtamer2 Dec 15, 2023
165cd54
New quant logic
gtamer2 Dec 15, 2023
005d8c5
Merge branch 'main' of https://github.com/gtamer2/hpml_llama into optim
gtamer2 Dec 15, 2023
c995e43
fix quant sample inf
gtamer2 Dec 15, 2023
575c197
Fix args
gtamer2 Dec 15, 2023
787efd5
move h=quant(h) to after first layer. It was operating on tokens whi…
gtamer2 Dec 15, 2023
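The quantization commits above work toward post-training quantization of the model. A hedged sketch using PyTorch's dynamic quantization is shown below; the small `Sequential` model is a stand-in for the Llama Transformer, and this is one plausible approach rather than the PR's exact method:

```python
import torch
import torch.nn as nn

# Toy stand-in for the Llama Transformer.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8)).eval()

# Dynamic quantization stores nn.Linear weights as int8 and quantizes
# activations on the fly at inference time; no calibration pass or
# fuse_model() step is needed (cf. the "got rid of fuse_model()" commit).
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = quantized(torch.randn(1, 16))
print(out.shape)
```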
d0b8b39
added a sample for pruning model
Dec 15, 2023
fab7b44
maybe it prunes now?
Dec 15, 2023
1a85f36
added attention
Dec 15, 2023
0bf73c4
add torch.nn.parameter to this
Dec 15, 2023
ac4fb51
added some changes for including torch.nn.Parameter into the pruning,…
Dec 15, 2023
caf7482
get rid of quantization modifications in model.py
Dec 15, 2023
b790b37
checking sparsity
Dec 15, 2023
1c13854
trying to find sparsity again
Dec 15, 2023
bf0ebe0
we got rid of sparsity
Dec 15, 2023
cb43151
giving sparsity another try
Dec 15, 2023
6779933
trying newer things with prune_model.py
Dec 15, 2023
660e231
got rid of print statements, left them until after it was done
Dec 15, 2023
651f8fb
trying to make sure layer is being modified in place
Dec 15, 2023
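The pruning commits above (pruning `torch.nn.Parameter` weights, checking sparsity, and making sure the layer is modified in place) can be sketched with `torch.nn.utils.prune` on a toy layer. The `Linear` below stands in for the attention weights the commits mention; this is an illustrative sketch, not the PR's prune_model.py:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(64, 64)

# L1 unstructured pruning zeroes the 30% of weights with smallest magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.3)

# prune.remove makes the pruning permanent: the mask is baked into
# layer.weight and the reparametrization hooks are dropped, so the layer
# really is modified in place.
prune.remove(layer, "weight")

# Sparsity check, as in the "checking sparsity" commits.
sparsity = float((layer.weight == 0).sum()) / layer.weight.numel()
print(f"sparsity: {sparsity:.2f}")
```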
38 changes: 0 additions & 38 deletions .github/ISSUE_TEMPLATE/bug_report.md

This file was deleted.

4 changes: 4 additions & 0 deletions .gitignore
@@ -1,3 +1,7 @@
llama-2-7b/
tokenizer.model
tokenizer_checklist.chk

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
80 changes: 0 additions & 80 deletions CODE_OF_CONDUCT.md

This file was deleted.

31 changes: 0 additions & 31 deletions CONTRIBUTING.md

This file was deleted.

126 changes: 0 additions & 126 deletions LICENSE

This file was deleted.
