{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":235860204,"defaultBranch":"master","name":"DeepSpeed","ownerLogin":"microsoft","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2020-01-23T18:35:18.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/6154722?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1727465378.0","currentOid":""},"activityList":{"items":[{"before":"cd98914e10be6f0bbe8bc0b4e1b53b18999734f9","after":"6daa6e2088ca0164a6d584b4e3d9c46bcff5d2f5","ref":"refs/heads/loadams/transformers-fixes","pushedAt":"2024-09-27T20:45:17.000Z","pushType":"push","commitsCount":32,"pusher":{"login":"loadams","name":"Logan Adams","path":"/loadams","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/114770087?s=80&v=4"},"commit":{"message":"Merge branch 'master' into loadams/transformers-fixes","shortMessageHtmlLink":"Merge branch 'master' into loadams/transformers-fixes"}},{"before":"828ddfbbda2482412fffc89f5fcd3b0d0eba9a62","after":"8cded575a94e296fee751072e862304676c95316","ref":"refs/heads/master","pushedAt":"2024-09-27T20:32:48.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"loadams","name":"Logan Adams","path":"/loadams","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/114770087?s=80&v=4"},"commit":{"message":"Fix torch include in `op_builder/mlu/fused_adam.py` and update no-torch workflow triggers (#6584)\n\nChanges from #6472 caused the no-torch workflow (an example of\r\nhow we build the DeepSpeed release package) to fail (so we caught this\r\nbefore a release; see more in #6402). 
These changes also copy the style\r\nused to include torch in other accelerator op_builder implementations,\r\nsuch as npu\r\n[here](https://github.com/microsoft/DeepSpeed/blob/master/op_builder/npu/fused_adam.py#L8)\r\nand hpu\r\n[here](https://github.com/microsoft/DeepSpeed/blob/828ddfbbda2482412fffc89f5fcd3b0d0eba9a62/op_builder/hpu/fused_adam.py#L15).\r\n\r\nThis also updates the no-torch workflow to run on all changes to the\r\nop_builder directory. The test runs quickly and shouldn't add any\r\nadditional testing burden there.\r\n\r\nResolves: #6576","shortMessageHtmlLink":"Fix torch include in op_builder/mlu/fused_adam.py and update no-tor…"}},{"before":"338e4e0d55c18a047f37494da314fe3ace9fcdea","after":"7102193083f270a522b517ec492b70cc6f5fd933","ref":"refs/heads/tohtana/get_offload_state_api","pushedAt":"2024-09-27T20:04:59.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"tohtana","name":"Masahiro Tanaka","path":"/tohtana","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/81312776?s=80&v=4"},"commit":{"message":"fix document","shortMessageHtmlLink":"fix document"}},{"before":"5ecc2c9f0adda0e0b8cddd4a58babbba231253dc","after":"338e4e0d55c18a047f37494da314fe3ace9fcdea","ref":"refs/heads/tohtana/get_offload_state_api","pushedAt":"2024-09-27T20:01:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"tohtana","name":"Masahiro Tanaka","path":"/tohtana","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/81312776?s=80&v=4"},"commit":{"message":"fix import and add license","shortMessageHtmlLink":"fix import and add license"}},{"before":null,"after":"5ecc2c9f0adda0e0b8cddd4a58babbba231253dc","ref":"refs/heads/tohtana/get_offload_state_api","pushedAt":"2024-09-27T19:29:38.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"tohtana","name":"Masahiro Tanaka","path":"/tohtana","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/81312776?s=80&v=4"},"commit":{"message":"Merge branch 
'master'","shortMessageHtmlLink":"Merge branch 'master'"}},{"before":"1d638cd5538bff8ad95db0d69c928ce51bef9ed8","after":"1602ad02e75ca8334ce42897f6aa2ca81efe8a33","ref":"refs/heads/jomayeri/lr-step-move","pushedAt":"2024-09-27T17:51:30.000Z","pushType":"push","commitsCount":6,"pusher":{"login":"jomayeri","name":"Joe Mayer","path":"/jomayeri","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/114769929?s=80&v=4"},"commit":{"message":"Merge branch 'master' into jomayeri/lr-step-move","shortMessageHtmlLink":"Merge branch 'master' into jomayeri/lr-step-move"}},{"before":"3c5089f432423f09e5ce378f17f75ba0463b3420","after":"45fea6e37ada420529f19c51d9f403b2fb5acd24","ref":"refs/heads/tohtana/clean_up_prefetch_param","pushedAt":"2024-09-27T17:29:31.000Z","pushType":"push","commitsCount":5,"pusher":{"login":"tohtana","name":"Masahiro Tanaka","path":"/tohtana","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/81312776?s=80&v=4"},"commit":{"message":"Merge branch 'master' into tohtana/clean_up_prefetch_param","shortMessageHtmlLink":"Merge branch 'master' into tohtana/clean_up_prefetch_param"}},{"before":"f57b2aee32e5f023f77be5e78db3676a0704a8c2","after":"a257e50a3ac18136ed029d1109d643d5ba3778d8","ref":"refs/heads/loadams/fix-triggers-no-torch-workflow","pushedAt":"2024-09-27T16:58:29.000Z","pushType":"push","commitsCount":4,"pusher":{"login":"loadams","name":"Logan Adams","path":"/loadams","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/114770087?s=80&v=4"},"commit":{"message":"Merge branch 'loadams/fix-no-torch-failure-mlu' into loadams/fix-triggers-no-torch-workflow","shortMessageHtmlLink":"Merge branch 'loadams/fix-no-torch-failure-mlu' into loadams/fix-trig…"}},{"before":"5ea5ab08d3f7913e00fc102a7a7e01bd7d3ec4d2","after":"54ce016dbc500dfa624c6e69e7eaefd1e33f50dc","ref":"refs/heads/loadams/fix-no-torch-failure-mlu","pushedAt":"2024-09-27T16:57:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"loadams","name":"Logan 
Adams","path":"/loadams","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/114770087?s=80&v=4"},"commit":{"message":"Fix syntax","shortMessageHtmlLink":"Fix syntax"}},{"before":"1891fedd49c647d8e2379422d5b790a45bcd0951","after":"5ea5ab08d3f7913e00fc102a7a7e01bd7d3ec4d2","ref":"refs/heads/loadams/fix-no-torch-failure-mlu","pushedAt":"2024-09-27T16:56:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"loadams","name":"Logan Adams","path":"/loadams","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/114770087?s=80&v=4"},"commit":{"message":"Formatting/syntax error","shortMessageHtmlLink":"Formatting/syntax error"}},{"before":null,"after":"f57b2aee32e5f023f77be5e78db3676a0704a8c2","ref":"refs/heads/loadams/fix-triggers-no-torch-workflow","pushedAt":"2024-09-27T16:47:20.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"loadams","name":"Logan Adams","path":"/loadams","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/114770087?s=80&v=4"},"commit":{"message":"Update triggers for no-torch test","shortMessageHtmlLink":"Update triggers for no-torch test"}},{"before":null,"after":"1891fedd49c647d8e2379422d5b790a45bcd0951","ref":"refs/heads/loadams/fix-no-torch-failure-mlu","pushedAt":"2024-09-27T16:45:07.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"loadams","name":"Logan Adams","path":"/loadams","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/114770087?s=80&v=4"},"commit":{"message":"Fix MLU op builder torch install","shortMessageHtmlLink":"Fix MLU op builder torch install"}},{"before":"0b1e54a163e2d2cc87d4aac872ee65ce8d62a3e7","after":null,"ref":"refs/heads/loadams/no-torch-workflow","pushedAt":"2024-09-27T16:41:47.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"loadams","name":"Logan 
Adams","path":"/loadams","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/114770087?s=80&v=4"}},{"before":"3b05a9cf315d4c74ae5bff2725bb293a800894dc","after":null,"ref":"refs/heads/loadams/accelerate-fixes","pushedAt":"2024-09-27T16:22:16.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"loadams","name":"Logan Adams","path":"/loadams","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/114770087?s=80&v=4"}},{"before":"d4e189507659aca7970185d33b84115fbb11b490","after":"828ddfbbda2482412fffc89f5fcd3b0d0eba9a62","ref":"refs/heads/master","pushedAt":"2024-09-27T16:22:13.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"loadams","name":"Logan Adams","path":"/loadams","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/114770087?s=80&v=4"},"commit":{"message":"Fixes on the accelerate side mean we do not need to skip this test (#6583)\n\nHF accelerate implemented fixes here:\r\nhttps://github.com/huggingface/accelerate/pull/3131\r\n\r\nThis means we can revert the changes from #6574","shortMessageHtmlLink":"Fixes on the accelerate side mean we do not need to skip this test 
(#…"}},{"before":"d4e189507659aca7970185d33b84115fbb11b490","after":null,"ref":"refs/heads/gh-readonly-queue/master/pr-6570-1caf6e8107689f5ea9611ac2d6bbbf3a3e6e9731","pushedAt":"2024-09-27T16:03:37.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"github-merge-queue[bot]","name":null,"path":"/apps/github-merge-queue","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9919?s=80&v=4"}},{"before":"1caf6e8107689f5ea9611ac2d6bbbf3a3e6e9731","after":null,"ref":"refs/heads/gh-readonly-queue/master/pr-6528-047bcf6af6a3721cfac31a13a1ab07c6b5482fb9","pushedAt":"2024-09-27T16:03:37.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"github-merge-queue[bot]","name":null,"path":"/apps/github-merge-queue","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9919?s=80&v=4"}},{"before":"047bcf6af6a3721cfac31a13a1ab07c6b5482fb9","after":"d4e189507659aca7970185d33b84115fbb11b490","ref":"refs/heads/master","pushedAt":"2024-09-27T16:03:35.000Z","pushType":"merge_queue_merge","commitsCount":2,"pusher":{"login":"github-merge-queue[bot]","name":null,"path":"/apps/github-merge-queue","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9919?s=80&v=4"},"commit":{"message":"[COMPILE] workflow for deepspeed + torch.compile (#6570)\n\nWe use a simple model + deepspeed zero 3 + torch.compile and count graph\nbreak numbers to demonstrate the current status of combining deepspeed +\ntorch.compile.\n\n---------\n\nCo-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>","shortMessageHtmlLink":"[COMPILE] workflow for deepspeed + torch.compile (#6570)"}},{"before":null,"after":"3b05a9cf315d4c74ae5bff2725bb293a800894dc","ref":"refs/heads/loadams/accelerate-fixes","pushedAt":"2024-09-27T15:08:02.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"loadams","name":"Logan Adams","path":"/loadams","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/114770087?s=80&v=4"},"commit":{"message":"Fixes on the 
accelerate side mean we do not need to skip this test","shortMessageHtmlLink":"Fixes on the accelerate side mean we do not need to skip this test"}},{"before":"047bcf6af6a3721cfac31a13a1ab07c6b5482fb9","after":null,"ref":"refs/heads/gh-readonly-queue/master/pr-6011-d45cfd34551537ce6f8317504bd520d7a2a1a588","pushedAt":"2024-09-27T12:59:08.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"github-merge-queue[bot]","name":null,"path":"/apps/github-merge-queue","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9919?s=80&v=4"}},{"before":"d45cfd34551537ce6f8317504bd520d7a2a1a588","after":"047bcf6af6a3721cfac31a13a1ab07c6b5482fb9","ref":"refs/heads/master","pushedAt":"2024-09-27T12:59:06.000Z","pushType":"merge_queue_merge","commitsCount":1,"pusher":{"login":"github-merge-queue[bot]","name":null,"path":"/apps/github-merge-queue","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9919?s=80&v=4"},"commit":{"message":"Add APIs to offload states of model, optimizer, and engine (#6011)\n\nThis PR adds the following APIs to offload model, optimizer, and engine\nstates.\n\n```python\ndef offload_states(self,\n include: Container[OffloadStateTypeEnum] = None,\n device: OffloadDeviceEnum = OffloadDeviceEnum.cpu,\n pin_memory: bool = True,\n non_blocking: bool = False) -> None:\n \"\"\"Move the ZeRO optimizer buffers to the specified device.\n\n Arguments:\n include: Optional. The set of states to offload. If not provided, all states are offloaded.\n device: Optional. The device to move the ZeRO optimizer buffers to.\n pin_memory: Optional. Whether to pin the memory of the offloaded states.\n non_blocking: Optional. 
Whether to offload the states asynchronously.\n...\ndef offload_states_back(self, non_blocking: bool = False) -> None:\n```\n\nHere is the typical usage.\n```python\n# Offload after forward, backward, and step\nmodel.offload_states()\n# Do something requiring a lot of device memory\n...\n# Load states back to device memory\nmodel.offload_states_back()\n```\n\nYou can selectively offload states to balance the offloading overhead\nand memory saving.\n```python\nmodel.offload_states(include=set([OffloadStateTypeEnum.hp_params, OffloadStateTypeEnum.opt_states]), device=OffloadDeviceEnum.cpu)\n```\n\nPerformance (4.3B parameters / 4x A100)\n- Environment (4x A100, [benchmark\nscript](https://gist.github.com/tohtana/05d5faba5068cf839abfc7b1e38b85e4))\n- Average Device to Host transfer time: 2.45 GB/s, aggregated: 9.79 GB/s\n - Average Host to Device transfer: 11.05 GB/s, aggregated: 44.19 GB/s\n- Mem (allocated by PyTorch)\n - Before offload 18.2GB\n - After offloading 17.7MB\n- Time ([benchmark\nscript](https://github.com/microsoft/DeepSpeedExamples/tree/tohtana/offload_states/training/offload_states),\noffloading time/loading time)\n\npython output_table.py \n| |pin_memory=0 non_blocking=0|pin_memory=0 non_blocking=1|pin_memory=1\nnon_blocking=0|pin_memory=1 non_blocking=1|\n\n|--:|---------------------------|---------------------------|---------------------------|---------------------------|\n| 1|4.34 / 3.42 |4.99 / 2.37 |6.5 / 2.42 |6.0 / 2.39 |\n| 2|9.9 / 3.28 |5.1 / 2.34 |6.21 / 2.42 |6.25 / 2.45 |\n| 3|9.92 / 3.19 |6.71 / 2.35 |6.33 / 2.38 |5.93 / 2.42 |\n| 4|9.55 / 2.82 |7.11 / 2.39 |6.9 / 2.38 |6.5 / 2.43 |\n| 5|4.4 / 3.35 |6.04 / 2.41 |6.26 / 2.41 |6.32 / 2.47 |\n| 6|4.4 / 3.57 |6.58 / 2.42 |6.88 / 2.4 |6.35 / 2.43 |\n| 7|9.51 / 3.12 |6.9 / 2.39 |6.9 / 2.39 |6.46 / 2.4 |\n| 8|4.77 / 3.64 |6.69 / 2.39 |7.39 / 2.42 |6.56 / 2.46 |\n| 9|9.5 / 3.07 |7.18 / 2.42 |6.67 / 2.39 |7.38 / 2.46 |\n\nTODO:\n- Enable offloading to a NVMe storage -> NVMe support is non-trivial. 
I\nsuggest adding the support in another PR\n- [DONE] Discard buffer (and recreate it) instead of offloading. We\ndon't need to restore the contiguous buffer for reduce.\n- [DONE] Check pin_memory improves performance or not\n\n---------\n\nCo-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>\nCo-authored-by: Olatunji Ruwase ","shortMessageHtmlLink":"Add APIs to offload states of model, optimizer, and engine (#6011)"}},{"before":"d45cfd34551537ce6f8317504bd520d7a2a1a588","after":"d094b3a251ce32c7e8b7e8a91566a8a03271366b","ref":"refs/heads/tohtana/debug_semaphore_leak","pushedAt":"2024-09-27T07:17:04.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"tohtana","name":"Masahiro Tanaka","path":"/tohtana","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/81312776?s=80&v=4"},"commit":{"message":"add logging of multiprocessing","shortMessageHtmlLink":"add logging of multiprocessing"}},{"before":null,"after":"d45cfd34551537ce6f8317504bd520d7a2a1a588","ref":"refs/heads/tohtana/debug_semaphore_leak","pushedAt":"2024-09-27T07:11:24.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"tohtana","name":"Masahiro Tanaka","path":"/tohtana","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/81312776?s=80&v=4"},"commit":{"message":"[XPU] Support DeepNVMe new code structure (#6532)\n\nIn the DeepNVMe GDS update, many functions are changed into a more abstract\nway. Also added some files. These changes break zero-infinity on XPU. To\nbring this feature back, we have this PR:\n1. modify the aio opbuilder for new files.\n2. Add custom cpu_op_desc_t for xpu users. 
(XPU doesn't handle buffer\nalignment here)\n\n---------\n\nCo-authored-by: Olatunji Ruwase \nCo-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>","shortMessageHtmlLink":"[XPU] Support DeepNVMe new code structure (#6532)"}},{"before":null,"after":"d4e189507659aca7970185d33b84115fbb11b490","ref":"refs/heads/gh-readonly-queue/master/pr-6570-1caf6e8107689f5ea9611ac2d6bbbf3a3e6e9731","pushedAt":"2024-09-27T06:45:57.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"github-merge-queue[bot]","name":null,"path":"/apps/github-merge-queue","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9919?s=80&v=4"},"commit":{"message":"[COMPILE] workflow for deepspeed + torch.compile (#6570)\n\nWe use a simple model + deepspeed zero 3 + torch.compile and count graph\nbreak numbers to demonstrate the current status of combining deepspeed +\ntorch.compile.\n\n---------\n\nCo-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>","shortMessageHtmlLink":"[COMPILE] workflow for deepspeed + torch.compile (#6570)"}},{"before":null,"after":"1caf6e8107689f5ea9611ac2d6bbbf3a3e6e9731","ref":"refs/heads/gh-readonly-queue/master/pr-6528-047bcf6af6a3721cfac31a13a1ab07c6b5482fb9","pushedAt":"2024-09-27T06:11:22.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"github-merge-queue[bot]","name":null,"path":"/apps/github-merge-queue","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9919?s=80&v=4"},"commit":{"message":"add bfloat16 to inference support dtypes (#6528)\n\nto allow running inference tasks using bfloat16\n\n---------\n\nCo-authored-by: Olatunji Ruwase \nCo-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>\nCo-authored-by: Logan Adams ","shortMessageHtmlLink":"add bfloat16 to inference support dtypes 
(#6528)"}},{"before":null,"after":"047bcf6af6a3721cfac31a13a1ab07c6b5482fb9","ref":"refs/heads/gh-readonly-queue/master/pr-6011-d45cfd34551537ce6f8317504bd520d7a2a1a588","pushedAt":"2024-09-27T05:37:47.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"github-merge-queue[bot]","name":null,"path":"/apps/github-merge-queue","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9919?s=80&v=4"},"commit":{"message":"Add APIs to offload states of model, optimizer, and engine (#6011)\n\nThis PR adds the following APIs to offload model, optimizer, and engine\nstates.\n\n```python\ndef offload_states(self,\n include: Container[OffloadStateTypeEnum] = None,\n device: OffloadDeviceEnum = OffloadDeviceEnum.cpu,\n pin_memory: bool = True,\n non_blocking: bool = False) -> None:\n \"\"\"Move the ZeRO optimizer buffers to the specified device.\n\n Arguments:\n include: Optional. The set of states to offload. If not provided, all states are offloaded.\n device: Optional. The device to move the ZeRO optimizer buffers to.\n pin_memory: Optional. Whether to pin the memory of the offloaded states.\n non_blocking: Optional. 
Whether to offload the states asynchronously.\n...\ndef offload_states_back(self, non_blocking: bool = False) -> None:\n```\n\nHere is the typical usage.\n```python\n# Offload after forward, backward, and step\nmodel.offload_states()\n# Do something requiring a lot of device memory\n...\n# Load states back to device memory\nmodel.offload_states_back()\n```\n\nYou can selectively offload states to balance the offloading overhead\nand memory saving.\n```python\nmodel.offload_states(include=set([OffloadStateTypeEnum.hp_params, OffloadStateTypeEnum.opt_states]), device=OffloadDeviceEnum.cpu)\n```\n\nPerformance (4.3B parameters / 4x A100)\n- Environment (4x A100, [benchmark\nscript](https://gist.github.com/tohtana/05d5faba5068cf839abfc7b1e38b85e4))\n- Average Device to Host transfer time: 2.45 GB/s, aggregated: 9.79 GB/s\n - Average Host to Device transfer: 11.05 GB/s, aggregated: 44.19 GB/s\n- Mem (allocated by PyTorch)\n - Before offload 18.2GB\n - After offloading 17.7MB\n- Time ([benchmark\nscript](https://github.com/microsoft/DeepSpeedExamples/tree/tohtana/offload_states/training/offload_states),\noffloading time/loading time)\n\npython output_table.py \n| |pin_memory=0 non_blocking=0|pin_memory=0 non_blocking=1|pin_memory=1\nnon_blocking=0|pin_memory=1 non_blocking=1|\n\n|--:|---------------------------|---------------------------|---------------------------|---------------------------|\n| 1|4.34 / 3.42 |4.99 / 2.37 |6.5 / 2.42 |6.0 / 2.39 |\n| 2|9.9 / 3.28 |5.1 / 2.34 |6.21 / 2.42 |6.25 / 2.45 |\n| 3|9.92 / 3.19 |6.71 / 2.35 |6.33 / 2.38 |5.93 / 2.42 |\n| 4|9.55 / 2.82 |7.11 / 2.39 |6.9 / 2.38 |6.5 / 2.43 |\n| 5|4.4 / 3.35 |6.04 / 2.41 |6.26 / 2.41 |6.32 / 2.47 |\n| 6|4.4 / 3.57 |6.58 / 2.42 |6.88 / 2.4 |6.35 / 2.43 |\n| 7|9.51 / 3.12 |6.9 / 2.39 |6.9 / 2.39 |6.46 / 2.4 |\n| 8|4.77 / 3.64 |6.69 / 2.39 |7.39 / 2.42 |6.56 / 2.46 |\n| 9|9.5 / 3.07 |7.18 / 2.42 |6.67 / 2.39 |7.38 / 2.46 |\n\nTODO:\n- Enable offloading to a NVMe storage -> NVMe support is non-trivial. 
I\nsuggest adding the support in another PR\n- [DONE] Discard buffer (and recreate it) instead of offloading. We\ndon't need to restore the contiguous buffer for reduce.\n- [DONE] Check pin_memory improves performance or not\n\n---------\n\nCo-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>\nCo-authored-by: Olatunji Ruwase ","shortMessageHtmlLink":"Add APIs to offload states of model, optimizer, and engine (#6011)"}},{"before":"958dfc16b7c6c5d2fcf68f13316dcb92b20c911c","after":"6e05d2c6f729a1d0b8e1a8d8aecd21a7e89bed17","ref":"refs/heads/tohtana/consistent_zero_grad","pushedAt":"2024-09-27T05:30:10.000Z","pushType":"push","commitsCount":8,"pusher":{"login":"tohtana","name":"Masahiro Tanaka","path":"/tohtana","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/81312776?s=80&v=4"},"commit":{"message":"Merge branch 'master' into tohtana/consistent_zero_grad","shortMessageHtmlLink":"Merge branch 'master' into tohtana/consistent_zero_grad"}},{"before":"7bc3c66875deb252861ed13d476d35670c933764","after":"3c5089f432423f09e5ce378f17f75ba0463b3420","ref":"refs/heads/tohtana/clean_up_prefetch_param","pushedAt":"2024-09-27T05:29:27.000Z","pushType":"push","commitsCount":9,"pusher":{"login":"tohtana","name":"Masahiro Tanaka","path":"/tohtana","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/81312776?s=80&v=4"},"commit":{"message":"Merge branch 'master' into tohtana/clean_up_prefetch_param","shortMessageHtmlLink":"Merge branch 'master' into 
tohtana/clean_up_prefetch_param"}},{"before":"d45cfd34551537ce6f8317504bd520d7a2a1a588","after":null,"ref":"refs/heads/gh-readonly-queue/master/pr-6532-ba58682a138760ee44b1366165fdbe4d87522323","pushedAt":"2024-09-27T00:22:58.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"github-merge-queue[bot]","name":null,"path":"/apps/github-merge-queue","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9919?s=80&v=4"}},{"before":"ba58682a138760ee44b1366165fdbe4d87522323","after":"d45cfd34551537ce6f8317504bd520d7a2a1a588","ref":"refs/heads/master","pushedAt":"2024-09-27T00:22:57.000Z","pushType":"merge_queue_merge","commitsCount":1,"pusher":{"login":"github-merge-queue[bot]","name":null,"path":"/apps/github-merge-queue","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9919?s=80&v=4"},"commit":{"message":"[XPU] Support DeepNVMe new code structure (#6532)\n\nIn the DeepNVMe GDS update, many functions are changed into a more abstract\nway. Also added some files. These changes break zero-infinity on XPU. To\nbring this feature back, we have this PR:\n1. modify the aio opbuilder for new files.\n2. Add custom cpu_op_desc_t for xpu users. (XPU doesn't handle buffer\nalignment here)\n\n---------\n\nCo-authored-by: Olatunji Ruwase \nCo-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>","shortMessageHtmlLink":"[XPU] Support DeepNVMe new code structure (#6532)"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"startCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0yN1QyMDo0NToxNy4wMDAwMDBazwAAAATDAilk","endCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0yN1QwMDoyMjo1Ny4wMDAwMDBazwAAAATCEWZb"}},"title":"Activity · microsoft/DeepSpeed"}