
[MoE/ZeRO] fix .github conflict with main branch. #5827

Closed
wants to merge 70 commits

Commits on May 30, 2024

  1. [Fix/Example] Fix Llama Inference Loading Data Type (hpcaitech#5763)

    * [fix/example] fix llama inference loading dtype
    
    * revise loading dtype of benchmark llama3
    yuanheng-zhao authored May 30, 2024 · 677cbfa

Commits on May 31, 2024

  1. [release] update version (hpcaitech#5752)

    * [release] update version
    
    * [devops] update compatibility test
    
    * [devops] update compatibility test
    
    * [devops] update compatibility test
    
    * [devops] update compatibility test
    
    * [test] fix ddp plugin test
    
    * [test] fix gptj and rpc test
    
    * [devops] fix cuda ext compatibility
    
    * [inference] fix flash decoding test
    
    * [inference] fix flash decoding test
    ver217 authored May 31, 2024 · 68359ed

Commits on Jun 3, 2024

  1. fix (hpcaitech#5765)

    flybird11111 authored Jun 3, 2024 · 3f2be80
  2. [test] Fix/fix testcase (hpcaitech#5770)

    * [fix] branch for fix testcase;
    
    * [fix] fix test_analyzer & test_auto_parallel;
    
    * [fix] remove local change about moe;
    
    * [fix] rm local change moe;
    duanjunwen authored Jun 3, 2024 · 1b76564
  3. Commit 4064432

Commits on Jun 4, 2024

  1. [CI/tests] simplify some test case to reduce testing time (hpcaitech#5755)
    
    * [ci/tests] simplify some test case to reduce testing time
    
    * [ci/tests] continue to remove test case to reduce ci time cost
    
    * restore some test config
    
    * [ci/tests] continue to reduce ci time cost
    Hz188 authored Jun 4, 2024 · e22b827
  2. [misc] update dockerfile (hpcaitech#5776)

    * [misc] update dockerfile
    
    * [misc] update dockerfile
    ver217 authored Jun 4, 2024 · 32f4187
  3. Commit ee6fd38

Commits on Jun 5, 2024

  1. [Inference]Add Streaming LLM (hpcaitech#5745)

    * Add Streaming LLM
    
    * add some parameters to llama_generation.py
    
    * verify streamingllm config
    
    * add test_streamingllm.py
    
    * modified according to review comments
    
    * add Citation
    
    * change _block_tables tolist
    isky-cd authored Jun 5, 2024 · b45000f
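
The core idea behind Streaming LLM is small enough to sketch. Below is a minimal, illustrative version of the KV-cache eviction policy (keep a few attention-sink tokens plus a sliding window); the class and parameter names are hypothetical, not ColossalAI's actual inference API.

```python
import torch

class StreamingKVCache:
    """Illustrative StreamingLLM policy: always retain the first
    `num_sink_tokens` positions (attention sinks) plus the most recent
    `window_size` positions, evicting everything in between."""

    def __init__(self, num_sink_tokens: int = 4, window_size: int = 1024):
        self.num_sink_tokens = num_sink_tokens
        self.window_size = window_size

    def evict(self, k: torch.Tensor, v: torch.Tensor):
        # k, v: [batch, num_heads, seq_len, head_dim]
        seq_len = k.size(2)
        if seq_len <= self.num_sink_tokens + self.window_size:
            return k, v  # cache still fits, nothing to evict
        sinks = slice(0, self.num_sink_tokens)
        recent = slice(seq_len - self.window_size, seq_len)
        return (torch.cat([k[:, :, sinks], k[:, :, recent]], dim=2),
                torch.cat([v[:, :, sinks], v[:, :, recent]], dim=2))
```
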
  2. Commit 50b4c8e
  3. [misc] Accelerate CI for zero and dist optim (hpcaitech#5758)

    * remove fp16 from lamb
    
    * remove d2h copy in checking states
    
    ---------
    
    Co-authored-by: Edenzzzz <[email protected]>
    Edenzzzz authored Jun 5, 2024 · 79f7a7b
  4. [Test/CI] remove test cases to reduce CI duration (hpcaitech#5753)

    * [test] smaller gpt2 test case
    
    * [test] reduce test cases: tests/test_zero/test_gemini/test_zeroddp_state_dict.py
    
    * [test] reduce test cases: tests/test_zero/test_gemini/test_grad_accum.py
    
    * [test] reduce test cases tests/test_zero/test_gemini/test_optim.py
    
    * Revert "[test] smaller gpt2 test case"
    
    Some tests might depend on the size of the model (number of chunks)
    
    This reverts commit df705a5.
    
    * [test] reduce test cases: tests/test_checkpoint_io/test_gemini_checkpoint_io.py
    
    * [CI] smaller test model for the two modified cases
    
    * [CI] hardcode gpt model for tests/test_zero/test_gemini/test_search.py since we need a fixed answer there
    botbw authored Jun 5, 2024 · 80c3c87
  5. [hotfix] fix testcase in test_fx/test_tracer (hpcaitech#5779)

    * [fix] branch for fix testcase;
    
    * [fix] fix test_analyzer & test_auto_parallel;
    
    * [fix] remove local change about moe;
    
    * [fix] rm local change moe;
    
    * [fix] fix test_deepfm_model & test_dlrf_model;
    
    * [fix] fix test_hf_albert & test_hf_gpt;
    duanjunwen authored Jun 5, 2024 · 10a19e2
  6. [gemini] optimize reduce scatter d2h copy (hpcaitech#5760)

    * [gemini] optimize reduce scatter d2h copy
    
    * [fix] fix missing reduce variable
    
    * [refactor] remove legacy async reduce scatter code
    
    * [gemini] missing sync
    
    * Revert "[refactor] remove legacy async reduce scatter code"
    
    This reverts commit 58ad76d.
    
    * [gemini] further optimize with async all reduce
    
    * [fix] pass flag from manager to chunk
    botbw authored Jun 5, 2024 · 3f7e313
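
The general technique behind this optimization, sketched with plain torch.distributed primitives rather than Gemini's internal chunk code: launch the reduce-scatter asynchronously, then move the result into a pinned host buffer with a non-blocking copy so the D2H transfer overlaps other work. A sketch under those assumptions:

```python
import torch
import torch.distributed as dist

def reduce_scatter_grad_to_host(full_grad: torch.Tensor, group=None) -> torch.Tensor:
    """Sketch: async reduce-scatter followed by a non-blocking D2H copy.
    Assumes an initialized process group and full_grad.numel() divisible
    by the world size."""
    world_size = dist.get_world_size(group)
    shard = torch.empty(full_grad.numel() // world_size,
                        dtype=full_grad.dtype, device=full_grad.device)
    # Launch the collective without blocking the host thread.
    handle = dist.reduce_scatter_tensor(shard, full_grad, group=group, async_op=True)
    # Pinned host memory is required for a truly asynchronous D2H copy.
    host_shard = torch.empty(shard.shape, dtype=shard.dtype,
                             device="cpu", pin_memory=True)
    handle.wait()  # the collective must complete before shard is read
    host_shard.copy_(shard, non_blocking=True)  # async D2H into pinned buffer
    # Caller must synchronize the CUDA stream before consuming host_shard.
    return host_shard
```
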
  7. Allow building cuda extension without a device. (hpcaitech#5535)

    Added FORCE_CUDA environment variable support to enable building extensions when no GPU device is present but the CUDA libraries are.
    ccoulombe authored Jun 5, 2024 · c46e097
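
FORCE_CUDA is a convention shared by several PyTorch extension projects; a minimal sketch of what such a check typically looks like in a setup script (the actual ColossalAI setup code may differ):

```python
import os
import torch

def should_build_cuda_ext() -> bool:
    # Build CUDA kernels when a GPU is visible, or when the user forces it,
    # e.g. compiling on a CPU-only build node that has the CUDA toolkit.
    if os.environ.get("FORCE_CUDA", "0") == "1":
        return True
    return torch.cuda.is_available()
```

With a check like this, `FORCE_CUDA=1 pip install .` builds the extensions on a machine without a GPU.
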
  8. Commit b9d646f

Commits on Jun 6, 2024

  1. [install]fix setup (hpcaitech#5786)

    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    flybird11111 and pre-commit-ci[bot] authored Jun 6, 2024 · a1e39f4
  2. Commit 5ead00f
  3. Commit 73e88a5

Commits on Jun 7, 2024

  1. Commit 7a7e869
  2. upgrade PPO/DPO/RM script

    YeAnbang committed Jun 7, 2024 · 929e1e3
  3. run pre-commit

    YeAnbang committed Jun 7, 2024 · 7e65b71
  4. Commit 0b4a335
  5. fix training script

    YeAnbang committed Jun 7, 2024 · 7ae87b3
  6. fix ci

    YeAnbang committed Jun 7, 2024 · b1031f7
  7. [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci
    pre-commit-ci[bot] authored and YeAnbang committed Jun 7, 2024 · 1b880ce
  8. fix transformers version

    YeAnbang committed Jun 7, 2024 · b8b5cac
  9. remove duplicated test

    YeAnbang committed Jun 7, 2024 · 62eb28b
  10. fix datasets version

    YeAnbang committed Jun 7, 2024 · 0bbac15
  11. Commit bf57b13
  12. remove local data path

    YeAnbang committed Jun 7, 2024 · 45195ac
  13. update ci

    YeAnbang committed Jun 7, 2024 · e16ccc2
  14. Commit ac1520c
  15. merge

    YeAnbang committed Jun 7, 2024 · 790e136
  16. Refactor modeling by adding attention backend

    Signed-off-by: char-1ee <[email protected]>
    char-1ee committed Jun 7, 2024 · 04386d9
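
A minimal sketch of the backend abstraction this refactor introduces. The names below (AttentionBackend, attend, get_backend) are hypothetical and only illustrate the pattern of hiding flash-attention versus vanilla SDPA behind a single interface:

```python
from abc import ABC, abstractmethod

import torch
import torch.nn.functional as F

class AttentionBackend(ABC):
    """Hypothetical interface: each backend provides one attention call."""

    @abstractmethod
    def attend(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
               causal: bool = True) -> torch.Tensor: ...

class SDPABackend(AttentionBackend):
    def attend(self, q, k, v, causal=True):
        return F.scaled_dot_product_attention(q, k, v, is_causal=causal)

def get_backend(use_flash_attn: bool) -> AttentionBackend:
    # Real code would return a flash-attn backend when requested and
    # available; this sketch always falls back to PyTorch SDPA.
    return SDPABackend()
```
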
  17. Fix tests and naming

    Signed-off-by: char-1ee <[email protected]>
    char-1ee committed Jun 7, 2024 · eec77e5
  18. Pass inference model shard configs for module init

    Signed-off-by: char-1ee <[email protected]>
    char-1ee committed Jun 7, 2024 · 5f398fc
  19. Clean up

    Signed-off-by: char-1ee <[email protected]>
    char-1ee committed Jun 7, 2024 · ceba662
  20. Commit 0d7ff10
  21. Commit 77db216
  22. Remove flash attention backend

    Signed-off-by: char-1ee <[email protected]>
    char-1ee committed Jun 7, 2024 · f5981e8

Commits on Jun 10, 2024

  1. fix readme

    YeAnbang committed Jun 10, 2024 · 2abdede
  2. Fix test import

    Signed-off-by: char-1ee <[email protected]>
    char-1ee committed Jun 10, 2024 · b303976
  3. Merge pull request hpcaitech#5771 from char-1ee/refactor/modeling

    [Inference] Refactor modeling attention layer by abstracting attention backends
    char-1ee authored Jun 10, 2024 · 77a219a

Commits on Jun 11, 2024

  1. update SFT training script

    YeAnbang committed Jun 11, 2024 · 84eab13
  2. [Inference] refactor baichuan (hpcaitech#5791)

    * refactor baichuan
    
    * remove unused code and add TODO for lazyinit
    LRY89757 authored Jun 11, 2024 · c0948af
  3. Merge pull request hpcaitech#5759 from hpcaitech/colossalchat_upgrade

    [ColossalChat] Colossalchat upgrade
    YeAnbang authored Jun 11, 2024 · 74f4a29
  4. Commit 587bbf4
  5. Commit aa125bc

Commits on Jun 12, 2024

  1. Commit aac941e
  2. Commit b6ea9e7
  3. sync with upstream

    Hz188 committed Jun 12, 2024 · 79d63ec
  4. [Inference] Fix flash-attn import and add model test (hpcaitech#5794)

    * Fix torch int32 dtype
    
    Signed-off-by: char-1ee <[email protected]>
    
    * Fix flash-attn import
    
    Signed-off-by: char-1ee <[email protected]>
    
    * Add generalized model test
    
    Signed-off-by: char-1ee <[email protected]>
    
    * Remove exposed path to model
    
    Signed-off-by: char-1ee <[email protected]>
    
    * Add default value for use_flash_attn
    
    Signed-off-by: char-1ee <[email protected]>
    
    * Rename model test
    
    Signed-off-by: char-1ee <[email protected]>
    
    ---------
    
    Signed-off-by: char-1ee <[email protected]>
    char-1ee authored Jun 12, 2024 · 8554585
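
The import fix follows the usual guarded-import pattern for an optional flash-attn dependency; a sketch of that pattern (module paths inside ColossalAI may differ):

```python
try:
    from flash_attn import flash_attn_func  # optional dependency
    HAS_FLASH_ATTN = True
except ImportError:
    flash_attn_func = None
    HAS_FLASH_ATTN = False

def default_use_flash_attn() -> bool:
    # Mirrors the commit's "default value for use_flash_attn":
    # enable flash attention only when the package imported cleanly.
    return HAS_FLASH_ATTN
```
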
  5. Commit ec99700
  6. [Gemini] Use async stream to prefetch and h2d data moving (hpcaitech#5781)
    
    * use async stream to prefetch and h2d data moving
    
    * Remove redundant code
    Hz188 authored Jun 12, 2024 · d9dddf5
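
The underlying technique, sketched with plain PyTorch CUDA streams rather than Gemini's chunk manager: stage host data in pinned memory and issue the H2D copy on a side stream so it overlaps compute running on the default stream. Names here are illustrative.

```python
import torch

class Prefetcher:
    """Sketch of asynchronous H2D prefetching on a dedicated CUDA stream."""

    def __init__(self):
        self.stream = torch.cuda.Stream()

    def prefetch(self, host_tensor: torch.Tensor) -> torch.Tensor:
        assert host_tensor.is_pinned(), "async H2D requires pinned memory"
        with torch.cuda.stream(self.stream):
            return host_tensor.to("cuda", non_blocking=True)

    def wait(self):
        # Call this just before the prefetched data is first used, so the
        # copy overlaps whatever ran on the default stream in the meantime.
        # (Production code would also call record_stream on the tensor to
        # guard against premature memory reuse by the caching allocator.)
        torch.cuda.current_stream().wait_stream(self.stream)
```
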

Commits on Jun 13, 2024

  1. [gemini] quick fix on possible async operation (hpcaitech#5803)

    * [gemini] quick fix on possible async operation
    
    * [gemini] quick fix on possible async operation
    botbw authored Jun 13, 2024 · 3bcbba9

Commits on Jun 14, 2024

  1. [shardformer] upgrade transformers to 4.39.3 (hpcaitech#5815)

    * [shardformer]upgrade transformers for gpt2/gptj/whisper (hpcaitech#5807)
    
    * [shardformer] fix modeling of gpt2 and gptj
    
    * [shardformer] fix whisper modeling
    
    * [misc] update requirements
    
    ---------
    
    Co-authored-by: ver217 <[email protected]>
    
    * [shardformer]upgrade transformers for mistral (hpcaitech#5808)
    
    * upgrade transformers for mistral
    
    * fix
    
    * fix
    
    * [shardformer]upgrade transformers for llama (hpcaitech#5809)
    
    * update transformers
    
    fix
    
    * fix
    
    * fix
    
    * [inference] upgrade transformers (hpcaitech#5810)
    
    * update transformers
    
    fix
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * [gemini] update transformers for gemini (hpcaitech#5814)
    
    ---------
    
    Co-authored-by: ver217 <[email protected]>
    flybird11111 and ver217 authored Jun 14, 2024 · 2ddf624
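
Because the patched modeling code tracks a specific transformers release, a version guard of this shape is a common companion (illustrative only, not ColossalAI's actual check):

```python
import transformers
from packaging import version

REQUIRED = version.parse("4.39.3")

if version.parse(transformers.__version__) < REQUIRED:
    raise ImportError(
        f"transformers>={REQUIRED} is required for the patched "
        f"gpt2/gptj/whisper/mistral/llama modeling code, "
        f"found {transformers.__version__}"
    )
```
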
  2. Commit be92747
  3. Commit 76aeec3
  4. update moe hybrid parallel plugin with newest version of zero & fix zero working/master params bug

    Hz188 committed Jun 14, 2024 · 64fc0f7
  5. fix zero unit test

    Hz188 committed Jun 14, 2024 · 8b277cc
  6. Commit ed42193
  7. Commit 88b78fa

Commits on Jun 17, 2024

  1. Commit 419d25e
  2. Commit 3364ac9
  3. Commit f7298bc
  4. fix typo

    Hz188 committed Jun 17, 2024 · e6839fb
  5. Commit cc9d0bb
  6. Support 4d parallel + flash attention (hpcaitech#5789)

    * support tp + sp + pp
    
    * remove comments
    
    ---------
    
    Co-authored-by: Edenzzzz <[email protected]>
    Edenzzzz authored Jun 17, 2024 · 8795bb2
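
A sketch of how the 4D combination (TP x PP x SP x ZeRO data parallelism) with flash attention is typically enabled through ColossalAI's HybridParallelPlugin. tp_size, pp_size, zero_stage and enable_flash_attention are long-standing plugin arguments; sp_size, enable_sequence_parallelism and sequence_parallelism_mode are assumptions based on this PR's description and may differ by version:

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

# Hedged sketch: argument names may vary across ColossalAI versions.
plugin = HybridParallelPlugin(
    tp_size=2,                                # tensor parallelism
    pp_size=2,                                # pipeline parallelism
    sp_size=2,                                # sequence parallelism (assumed name)
    enable_sequence_parallelism=True,         # assumed flag
    sequence_parallelism_mode="all_to_all",   # assumed mode name
    enable_flash_attention=True,
    zero_stage=1,                             # data-parallel dimension via ZeRO
)
booster = Booster(plugin=plugin)
```
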

Commits on Jun 18, 2024

  1. Commit 1405cf1