Hidet v0.4.1

Released by @vadiklyutiy on 30 Jul 02:40, commit a412bdf

What's Changed

  • [Fix] Fixing an error triggered by the operator any (#369) by Bolin Sun 6a4c2e5
  • [Fix] Added torch.t for the mobilebert-uncased model (#353) by zhumakhan 95d95a4
  • [CI] Use the same image for tests and publishing test execution (#463) by c-fteixeira 49fd332
  • [BUG] Fix a bug in the disallow-in-graph logic (#464) by Vadim Gimpelson d84f2c5
  • [CI] Move Publish workflow to internal ARC runners (#461) by c-fteixeira b5d6aaf
  • [CI] Update container for CI (#460) by Vadim Gimpelson b973591
  • [Bug] Rename test_arithmetic.py -> test_arithmetic2.py (#459) by Vadim Gimpelson 6aa6cf8
  • Update requirements-dev.txt to use pytorch version >= 2.3.0 (#458) by Vadim Gimpelson 6b32295
  • [CI] Repeat start_instance (#361) by vadiklyutiy cf5cadd
  • [Operators] Adding leaky_relu support (#360) by Bolin Sun 7401ccc
  • [Fix] Fixing an error triggered while compiling the torch.nn.Upsample module with align_corners=True (#344) by Bolin Sun 2c34cfc
  • [PERF] Remove workaround for loops in add_hints_pass (#356) by vadiklyutiy 3195be5
  • [Operators] Registering tensor methods whose PyTorch function equivalents are supported by Hidet (#347) by Bolin Sun 44ab5ad
  • [PERF] Introduce add_hint_pass (#355) by vadiklyutiy c014dab
  • [CI] Promote nvidia docker container to version 24.4 (#354) by vadiklyutiy cb809b9
  • [Fix] Type casting for attention mask from fp32 to fp16 (#323) by zhumakhan 9a10dc0
  • [Fix] Added missing torch.multiply and torch.nn.functional.unfold ops for the conv-bert-base model (#351) by zhumakhan 18842ee
  • [Fix] Fixing a bug in register_methods (#331) by Bolin Sun c87c515
  • [Fix] Handling special cases in setitem regarding dtype and device (#332) by Bolin Sun ff9445e
  • [BUG] Fixed search_space bug in bench_op.py (#348) by vadiklyutiy 29e4c0e
  • [OPS] Disallow unsupported functions in fxgraph (#317) by vadiklyutiy 984cf75
  • [OPTIONS] Remove dynamo_config['search_space'] (#342) by vadiklyutiy 0814bd8
  • [Operator] Adding support for torch.Tensor.view_as (#334) by Bolin Sun 5f19dd0
  • [Operators] Adding support for torch.nn.TransformerEncoder (#327) by Bolin Sun d625146
  • [OPTIONS] Inherit options from torch.compile() (#260) by vadiklyutiy 3638a0b (see the usage sketch after this list)
  • [Operator] Adding __ge__ method for the Tensor class (#330) by Bolin Sun ed5feff
  • [Fix] Fixing an error triggered by ClampOp (#329) by Bolin Sun 05984cb
  • [Fix] Handling hidet errors caused by device difference in getitem (#322) by Bolin Sun 5a90820
  • [Fix] Fixing a RuntimeError triggered by tensor_reshape function in register_functions.py (#328) by Bolin Sun 0cd2f83
  • [Operators] Adding PyTorch operators encountered while compiling DALLE2_pytorch (#319) by Bolin Sun ecb99b1
  • [Fix] Fix the bug in tensor_expand caused by attempting to modify immutable_list (#320) by Bolin Sun bb89e22
  • [Chore] replace copyrights with citations (#315) by xiaocenxiaocen 3fba091
  • [Operator] Extending the functionality support for einsum (#312) by Bolin Sun 703e92a
  • Handle dtype and device in hidet.ones_like op (#316) by zhumakhan f031eb3
  • [PERF] Reduce fixed overhead for model run (#310) by vadiklyutiy fadf67d
  • Increase batch size for bert to decrease fluctuations (#236) by vadiklyutiy a8db40c
  • Setitem with tensor values, and boolean type promotion (#290) by zhumakhan 60e75ca
  • [BUG] Fix device_from_torch to return 'cpu' by default when device is None (#311) by zhumakhan d047440
  • [Graph][Ops] fp32 accumulation for cute matmul (#292) by xiaocenxiaocen a813605
  • [Perf] Support vectorized epilogue fusion (#220) by xiaocenxiaocen ddacf36
  • Removing constant tensors that are not needed after subgraph rewrite pass (#252) by zhumakhan db49f68
  • [Fix] Handling Tensor.to(..., device=...) on symbolic tensors (#284) by Bolin Sun 6357880
  • [Operator] torch.any (#287) by zhumakhan 8a42a65
  • [Graph][Ops] fp32 accumulation for matmul_f16 (#268) by xiaocenxiaocen 5bf255a
  • Adding support for torch.any (#277) by zhumakhan 2c4c672
  • fix: handle race condition on parallel config directory creation (#285) by c-fteixeira b465dd3
  • [SCRIPTS] Adapt our scripts to use mode from torch.compile (#274) by vadiklyutiy 0f825b3
  • [Fix] Handling getitem special case (#281) by Bolin Sun 564561e
  • [Operator] Added advanced tensor indexing (#251) by zhumakhan 018ca2c
  • [Operator] Adding support for repeat_interleave and more (#270) by Bolin Sun b52bc88
  • [PERF] Increase accuracy of picking the best candidate (#269) by vadiklyutiy 3834643
  • [Operator] Registering torch.Tensor.copy_ (#259) by Bolin Sun af5c893
  • [OPTIONS] Use Attention by default (#261) by vadiklyutiy 33ad85b
  • [Operator] Registering torch.sigmoid_ (#258) by Bolin Sun c9fb801
  • [Operator] Adding support for torch.Tensor.div (#249) by Bolin Sun c8d4663
  • [Operator] Adding torch.Tensor.expand_as support (#250) by Bolin Sun 923f078
  • [Operator] Adding support for the operators torch.Tensor.max and torch.Tensor.new_full (#238) by Bolin Sun c5912a4
  • Delete options use_fp16 and use_fp16_reduction (#239) by vadiklyutiy e7fe23b
  • Inherit mode argument from torch.compile and set corresponding options (#237) by vadiklyutiy 91f666e
  • [Operators] Registering torch.as_tensor (#235) by Bolin Sun 540367b
  • [Operator] Registering torch.Tensor.argmax (#234) by Bolin Sun bdd7acd
  • [Ir][CuTE] Lower cute dialect (#109) (#230) by xiaocenxiaocen 783a549
  • Expose more ldst instructions (#216) by xiaocenxiaocen 8f03f9e
  • Fixes for the steal_weight option and the mistral model (#209) by zhumakhan 9728c21
  • Fix issues related to mistral model (#213) by zhumakhan 68e801b
  • [BENCHs] Refactor transformers tests. Add llama2, mistral, gemma, gpt2 to script (#210) by vadiklyutiy 59028d8
  • [BUGFIX] Init CUDA info before forking processes for IR generation (#208) by vadiklyutiy 3012546
  • [Ir] add utilities for CuTe (#107) by xiaocenxiaocen 423e112
  • [BUG] Clear _job_queue in parallel_imap for tests (#204) by vadiklyutiy bf39bd6
  • [OPTIONS] Don't create hidet config if it doesn't exist (#203) by vadiklyutiy 294d261
  • feat: parallel job execution for tests (#147) by c-fteixeira db588f9
  • __getitem__ with an N-dimensional index tensor (#185) by zhumakhan f46a184
  • [Fix] Remove YOLOv7 from tests/benchmarks/run_configs.json (#187) by Bolin Sun 5fc4271
  • [Operator] Adding meshgrid operator support (#183) by Bolin Sun d8158a9
  • [Bug] Fix number of groups in certain cases (#181) by Max Hu 8a6cbfd
  • [COMPTIME] Reduce the number of forks in multithreading.Pool (#180) by vadiklyutiy 9e576dc
  • [COMPTIME] Add chunksize arg to pool.imap (#178) by vadiklyutiy 7c50af6
  • Optimize grouping method (#174) by Max Hu 9b9a22b
  • [App] SyncLLM + AsyncLLM interface (#166) by Jack Lee e51f0c0
  • [Ir][Primitives] add hopper instructions (#83) by xiaocenxiaocen 4225298
  • [OPS] Add torch.Tensor.sin, torch.Tensor.cos and torch._C._nn.pad (#175) by vadiklyutiy 90a6231
  • [App] ResNet Compiled App (2/2) - Pipeline (#165) by Kevin Tong d308f8f
  • Revive dynamic shape support with torch.compile (#162) by vadiklyutiy cf343ab
  • [Models] Gemma implementation (#132) by Jack Lee 3a84820
  • Support Transpose2D (#77) by zhiwei-fang dd2e9d2
  • [App] Cleanup SD Implementation (#143) by Kevin Tong 359763e
  • [Fixbug] Set _is_exiting correctly (#163) by Jack Lee 1c8b31f
  • [App] Fix LLM app tracing (#158) by Jack Lee f618977
  • [Operator] triu + tril operators (#146) by Jack Lee 70894fa
  • Gemma + torch.compile fixes (autocast, rtruediv) (#159) by vadiklyutiy 710ac50
  • [IR] [Primitives] Add thread cluster on sm_90 (#145) by Kevin Tong ccc28d6
  • [App] Minor bugfixes for LLM app (#157) by Jack Lee 179f058
  • [COMPTIME] Specialize Constant._binary() for compilation speedup (#148) by vadiklyutiy 8a1eab4
  • [Operator] Fix symbolic broadcasting (#131) by Jack Lee 1252220
  • [Operator] Register missing math primitives (#134) by Jack Lee 61b0052
  • [Ir][Primitives] fix __shfl_xor_sync (#155) by xiaocenxiaocen 37c75a6
  • [COMPTIME] Parallelize apply_prologue_epilog (fusion) and IR generation (implement*) (#127) by vadiklyutiy 9e96c45
  • [Graph] Enhance forward debug instrument (#130) by Jack Lee 4267686
  • Stable Diffusion App Infra (#103) by Kevin Tong 8f03f9e
  • [LLM App] LLM Application initial support (#121) by Yaoyao Ding fc61f48
  • [Models] Support for tokenizers in C++ runtime (#69) by Jack Lee c14de4e
  • [Graph] Add major UNet building components (#97) by Kevin Tong 364ba9c
  • [CI] Add clang-format script/action (#120) by Jack Lee cdff99a
  • [Graph] Stable Diffusion Rope Module (#95) by Kevin Tong 6fa5803
  • [App] Complete UNet Definition (#99) by Kevin Tong 805620e
  • [FFI] Refactor CompiledFunction interface with ctypes (#79) by Jack Lee a8c9d94
  • [STYLE] Format cpp/h files (#454) by vadiklyutiy 1f1b011
  • [cuDNN] Add cudnn conv2d (#453) by vadiklyutiy bc5a6df
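
Many of the changes above extend Hidet's torch.compile frontend: new operator registrations, and inheritance of optimization options from the mode argument of torch.compile (#237, #260). The sketch below ties these together; it is illustrative only, assuming hidet and a CUDA build of PyTorch >= 2.3.0 are installed, and the model and shapes are placeholders, not part of this release.

    import torch
    import hidet  # importing hidet makes the 'hidet' backend available to torch.compile

    # A placeholder fp16 model; LeakyReLU exercises the leaky_relu support added in #360.
    model = torch.nn.Sequential(
        torch.nn.Linear(64, 64),
        torch.nn.LeakyReLU(),
    ).cuda().half().eval()
    x = torch.randn(8, 64, device='cuda', dtype=torch.float16)

    # Since #237/#260, hidet derives its optimization options from the `mode`
    # argument of torch.compile rather than from separate dynamo_config flags.
    compiled = torch.compile(model, backend='hidet', mode='max-autotune')
    with torch.no_grad():
        y = compiled(x)  # the first call triggers hidet compilation; later calls reuse it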

Contributors: Vadim Gimpelson (@vadiklyutiy), Bolin Sun, zhumakhan, c-fteixeira, xiaocenxiaocen, Max Hu, Jack Lee, Kevin Tong, Yaoyao Ding, zhiwei-fang

Full Changelog: v0.3.1...v0.4.0