Hidet v0.4.1

Released by @vadiklyutiy on 30 Jul 02:40, commit a412bdf

What's Changed

  • [Fix] Fixing an error triggered by the operator any (#369) by Bolin Sun 6a4c2e5
  • [Fix] Added torch.t for the mobilebert-uncased model (#353) by zhumakhan 95d95a4
  • [CI] Use the same image for tests and publishing test execution (#463) by c-fteixeira 49fd332
  • [BUG] Fix a bug in the disallow-in-graph logic (#464) by Vadim Gimpelson d84f2c5
  • [CI] Move Publish workflow to internal ARC runners (#461) by c-fteixeira b5d6aaf
  • [CI] Update container for CI (#460) by Vadim Gimpelson b973591
  • [Bug] Rename test_arithmetic.py -> test_arithmetic2.py (#459) by Vadim Gimpelson 6aa6cf8
  • Update requirements-dev.txt to use pytorch version >= 2.3.0 (#458) by Vadim Gimpelson 6b32295
  • [CI] Repeat start_instance (#361) by vadiklyutiy cf5cadd
  • [Operators] Adding leaky_relu support (#360) by Bolin Sun 7401ccc
  • [Fix] Fixing an error triggered while compiling the torch.nn.Upsample module with align_corners=True (#344) by Bolin Sun 2c34cfc
  • [PERF] Remove workaround for loops in add_hints_pass (#356) by vadiklyutiy 3195be5
  • [Operators] Registering tensor methods whose PyTorch function equivalents are supported by Hidet (#347) by Bolin Sun 44ab5ad
  • [PERF] Introduce add_hint_pass (#355) by vadiklyutiy c014dab
  • [CI] Promote nvidia docker container to version 24.4 (#354) by vadiklyutiy cb809b9
  • [Fix] Type casting for attention mask from fp32 to fp16 (#323) by zhumakhan 9a10dc0
  • [Fix] Added missing torch.multiply and torch.nn.functional.unfold ops for the conv-bert-base model (#351) by zhumakhan 18842ee
  • [Fix] Fixing a bug in register_methods (#331) by Bolin Sun c87c515
  • [Fix] Handling special cases in setitem regarding dtype and device (#332) by Bolin Sun ff9445e
  • [BUG] Fixed search_space bug in bench_op.py (#348) by vadiklyutiy 29e4c0e
  • [OPS] Disallow unsupported functions in fxgraph (#317) by vadiklyutiy 984cf75
  • [OPTIONS] Remove dynamo_config['search_space'] (#342) by vadiklyutiy 0814bd8
  • [Operator] Adding support for torch.Tensor.view_as (#334) by Bolin Sun 5f19dd0
  • [Operators] Adding support for torch.nn.TransformerEncoder (#327) by Bolin Sun d625146
  • [OPTIONS] Inherit options from torch.compile() (#260) by vadiklyutiy 3638a0b (see the usage sketch after this list)
  • [Operator] Adding __ge__ method for the Tensor class (#330) by Bolin Sun ed5feff
  • [Fix] Fixing an error triggered by ClampOp (#329) by Bolin Sun 05984cb
  • [Fix] Handling hidet errors caused by device difference in getitem (#322) by Bolin Sun 5a90820
  • [Fix] Fixing a RuntimeError triggered by tensor_reshape function in register_functions.py (#328) by Bolin Sun 0cd2f83
  • [Operators] Adding PyTorch operators encountered while compiling DALLE2_pytorch (#319) by Bolin Sun ecb99b1
  • [Fix] Fix the bug in tensor_expand caused by attempting to modify immutable_list (#320) by Bolin Sun bb89e22
  • [Chore] replace copyrights with citations (#315) by xiaocenxiaocen 3fba091
  • [Operator] Extending the functionality support for einsum (#312) by Bolin Sun 703e92a
  • Handle dtype and device in hidet.ones_like op (#316) by zhumakhan f031eb3
  • [PERF] Reduce fixed overhead for model run (#310) by vadiklyutiy fadf67d
  • Increase batch size for bert to decrease fluctuations (#236) by vadiklyutiy a8db40c
  • Setitem with tensor values, and boolean type promotion (#290) by zhumakhan 60e75ca
  • [BUG] Fix device_from_torch to return 'cpu' by default when device is None (#311) by zhumakhan d047440
  • [Graph][Ops] fp32 accumulation for cute matmul (#292) by xiaocenxiaocen a813605
  • [Perf] Support vectorized epilogue fusion (#220) by xiaocenxiaocen ddacf36
  • Removing constant tensors that are not needed after subgraph rewrite pass (#252) by zhumakhan db49f68
  • [Fix] Handling Tensor.to(..., device=...) on symbolic tensors (#284) by Bolin Sun 6357880
  • [Operator] torch.any (#287) by zhumakhan 8a42a65
  • [Graph][Ops] fp32 accumulation for matmul_f16 (#268) by xiaocenxiaocen 5bf255a
  • Adding support for torch.any (#277) by zhumakhan 2c4c672
  • fix: handle race condition on parallel config directory creation (#285) by c-fteixeira b465dd3
  • [SCRIPTS] Adapt our scripts to use mode from torch.compile (#274) by vadiklyutiy 0f825b3
  • [Fix] Handling getitem special case (#281) by Bolin Sun 564561e
  • [Operator] Added advanced tensor indexing (#251) by zhumakhan 018ca2c
  • [Operator] Adding support for repeat_interleave and more (#270) by Bolin Sun b52bc88
  • [PERF] Increase accuracy of picking the best candidate (#269) by vadiklyutiy 3834643
  • [Operator] Registering torch.Tensor.copy_ (#259) by Bolin Sun af5c893
  • [OPTIONS] Use Attention by default (#261) by vadiklyutiy 33ad85b
  • [Operator] Registering torch.sigmoid_ (#258) by Bolin Sun c9fb801
  • [Operator] Adding support for torch.Tensor.div (#249) by Bolin Sun c8d4663
  • [Operator] Adding torch.Tensor.expand_as support (#250) by Bolin Sun 923f078
  • [Operator] Adding support for the operators torch.Tensor.max and torch.Tensor.new_full (#238) by Bolin Sun c5912a4
  • Delete options use_fp16 and use_fp16_reduction (#239) by vadiklyutiy e7fe23b
  • Inherit mode argument from torch.compile and set corresponding options (#237) by vadiklyutiy 91f666e
  • [Operators] Registering torch.as_tensor (#235) by Bolin Sun 540367b
  • [Operator] Registering torch.Tensor.argmax (#234) by Bolin Sun bdd7acd
  • [Ir][CuTE] Lower cute dialect (#109) (#230) by xiaocenxiaocen 783a549
  • Expose more ldst instructions (#216) by xiaocenxiaocen 8f03f9e
  • Fixes for the steal_weight option and the mistral model (#209) by zhumakhan 9728c21
  • Fix issues related to mistral model (#213) by zhumakhan 68e801b
  • [BENCHs] Refactor transformers tests. Add llama2, mistral, gemma, gpt2 to script (#210) by vadiklyutiy 59028d8
  • [BUGFIX] Init CUDA info before forking processes for IR generation (#208) by vadiklyutiy 3012546
  • [Ir] add utilities for CuTe (#107) by xiaocenxiaocen 423e112
  • [BUG] Clear _job_queue in parallel_imap for tests (#204) by vadiklyutiy bf39bd6
  • [OPTIONS] Don't create hidet config if it doesn't exist (#203) by vadiklyutiy 294d261
  • feat: parallel job execution for tests (#147) by c-fteixeira db588f9
  • __getitem__ with an N-dimensional index tensor (#185) by zhumakhan f46a184
  • [Fix] Remove YOLOv7 from tests/benchmarks/run_configs.json (#187) by Bolin Sun 5fc4271
  • [Operator] Adding meshgrid operator support (#183) by Bolin Sun d8158a9
  • [Bug] Fix number of groups in certain cases (#181) by Max Hu 8a6cbfd
  • [COMPTIME] Reduce the number of forks in multithreading.Pool (#180) by vadiklyutiy 9e576dc
  • [COMPTIME] Add chunksize arg to pool.imap (#178) by vadiklyutiy 7c50af6
  • Optimize grouping method (#174) by Max Hu 9b9a22b
  • [App] SyncLLM + AsyncLLM interface (#166) by Jack Lee e51f0c0
  • [Ir][Primitives] add hopper instructions (#83) by xiaocenxiaocen 4225298
  • [OPS] Add torch.Tensor.sin, torch.Tensor.cos and torch._C._nn.pad (#175) by vadiklyutiy 90a6231
  • [App] ResNet Compiled App (2/2) - Pipeline (#165) by Kevin Tong d308f8f
  • Revive dynamic shape support with torch.compile (#162) by vadiklyutiy cf343ab
  • [Models] Gemma implementation (#132) by Jack Lee 3a84820
  • Support Transpose2D (#77) by zhiwei-fang dd2e9d2
  • [App] Cleanup SD Implementation (#143) by Kevin Tong 359763e
  • [Fixbug] Set _is_exiting correctly (#163) by Jack Lee 1c8b31f
  • [App] Fix LLM app tracing (#158) by Jack Lee f618977
  • [Operator] triu + tril operators (#146) by Jack Lee 70894fa
  • Gemma + torch.compile fixes (autocast, rtruediv) (#159) by vadiklyutiy 710ac50
  • [IR] [Primitives] Add thread cluster on sm_90 (#145) by Kevin Tong ccc28d6
  • [App] Minor bugfixes for LLM app (#157) by Jack Lee 179f058
  • [COMPTIME] Specialize Constant._binary() for compilation speedup (#148) by vadiklyutiy 8a1eab4
  • [Operator] Fix symbolic broadcasting (#131) by Jack Lee 1252220
  • [Operator] Register missing math primitives (#134) by Jack Lee 61b0052
  • [Ir][Primitives] fix __shfl_xor_sync (#155) by xiaocenxiaocen 37c75a6
  • [COMPTIME] Parallelize apply_prologue_epilog (fusion) and IR generation (implement*) (#127) by vadiklyutiy 9e96c45
  • [Graph] Enhance forward debug instrument (#130) by Jack Lee 4267686
  • Stable Diffusion App Infra (#103) by Kevin Tong 8f03f9e
  • [LLM App] LLM Application initial support (#121) by Yaoyao Ding fc61f48
  • [Models] Support for tokenizers in C++ runtime (#69) by Jack Lee c14de4e
  • [Graph] Add major UNet building components (#97) by Kevin Tong 364ba9c
  • [CI] Add clang-format script/action (#120) by Jack Lee cdff99a
  • [Graph] Stable Diffusion Rope Module (#95) by Kevin Tong 6fa5803
  • [App] Complete UNet Definition (#99) by Kevin Tong 805620e
  • [FFI] Refactor CompiledFunction interface with ctypes (#79) by Jack Lee a8c9d94
  • [STYLE] Format cpp/h files (#454) by vadiklyutiy 1f1b011
  • [cuDNN] Add cudnn conv2d (#453) by vadiklyutiy bc5a6df
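
Many of the changes above extend Hidet's torch.compile frontend: new operator registrations, and inheritance of optimization options from the mode argument of torch.compile (#237, #260). The sketch below ties these together; it is illustrative only, assuming hidet and a CUDA build of PyTorch >= 2.3.0 are installed, and the model and shapes are placeholders, not part of this release.

    import torch
    import hidet  # importing hidet makes the 'hidet' backend available to torch.compile

    # A placeholder fp16 model; LeakyReLU exercises the leaky_relu support added in #360.
    model = torch.nn.Sequential(
        torch.nn.Linear(64, 64),
        torch.nn.LeakyReLU(),
    ).cuda().half().eval()
    x = torch.randn(8, 64, device='cuda', dtype=torch.float16)

    # Since #237/#260, hidet derives its optimization options from the `mode`
    # argument of torch.compile rather than from separate dynamo_config flags.
    compiled = torch.compile(model, backend='hidet', mode='max-autotune')
    with torch.no_grad():
        y = compiled(x)  # the first call triggers hidet compilation; later calls reuse it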

Contributors: Vadim Gimpelson (@vadiklyutiy), Bolin Sun, zhumakhan, c-fteixeira, xiaocenxiaocen, Max Hu, Jack Lee, Kevin Tong, Yaoyao Ding, zhiwei-fang

Full Changelog: v0.3.1...v0.4.0