
[DOC]: Environment installation failed #6066

Open
eccct opened this issue Sep 21, 2024 · 16 comments
Labels
documentation Improvements or additions to documentation

Comments

@eccct

eccct commented Sep 21, 2024

📚 The doc issue

I installed the Ubuntu 24.04 subsystem (WSL2) on Windows 11
and followed the guide at https://colossalai.org/zh-Hans/docs/get_started/installation
The exact steps were as follows:
export CUDA_INSTALL_DIR=/usr/local/cuda-12.1
export CUDA_HOME=/usr/local/cuda-12.1
export LD_LIBRARY_PATH=$CUDA_HOME"/lib64:$LD_LIBRARY_PATH"
export PATH=$CUDA_HOME"/bin:$PATH"

conda create -n colo01 python=3.10
conda activate colo01
export PATH=~/miniconda3/envs/colo01/bin:$PATH

sudo apt update
sudo apt install gcc-10 g++-10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 60
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-12 60
sudo update-alternatives --config gcc
gcc --version

wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run
sudo sh cuda_12.1.0_530.30.02_linux.run
Verify the CUDA installation: nvidia-smi
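As an extra sanity check (my addition, not part of the guide), nvcc should also resolve to the toolkit that the exported variables point at:

nvcc --version    # should report release 12.1
which nvcc        # should print /usr/local/cuda-12.1/bin/nvcc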

conda install nvidia/label/cuda-12.1.0::cuda-toolkit
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
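To confirm that this PyTorch build targets CUDA 12.1 and can actually see the GPU, a quick check (my addition, not one of the documented steps):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# expected: 2.x.x 12.1 True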

git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

pip install -r requirements/requirements.txt
CUDA_EXT=1 pip install .

Install the related development libraries:
pip install transformers
pip install xformers
pip install datasets tensorboard

Run the benchmark
Step 1: change into the benchmark directory
cd examples/language/llama/scripts/benchmark_7B
Modify gemini.sh, then run:
bash gemini.sh

Running it produced the following error:
[rank0]: ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
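For context, transformers raises this ImportError when the model is loaded with attn_implementation="flash_attention_2" but the flash_attn package is missing. If flash-attn cannot be installed, a minimal sketch of a fallback (my assumption about how the model is constructed, not a documented benchmark.py option) is to request the plain PyTorch backend when building the model from a config:

python -c "
from transformers import AutoModelForCausalLM, LlamaConfig
cfg = LlamaConfig(num_hidden_layers=1, hidden_size=256, intermediate_size=512, num_attention_heads=4, num_key_value_heads=4)  # tiny config, nothing is downloaded
model = AutoModelForCausalLM.from_config(cfg, attn_implementation='eager')
print(model.config._attn_implementation)  # -> eager
"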

Then I installed FlashAttention-2 successfully:
pip install packaging
pip install ninja
ninja --version
echo $?
conda install -c conda-channel attention2
pip install flash-attn --no-build-isolation
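A minimal check that the wheel actually built and imports against the current torch (flash-attn 2.x exposes __version__):

python -c "import flash_attn; print(flash_attn.__version__)"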

I ran bash gemini.sh again and still got errors. Please advise based on the uploaded log file; it would be best if the installation documentation could also be improved. Thanks!
gcc_nvidia-smi_pytorch_python
log.txt

@eccct eccct added the documentation Improvements or additions to documentation label Sep 21, 2024
@Issues-translate-bot

Bot detected the issue body's language is not English, translate it automatically.


Title: [DOC]: Environment installation failed

@eccct
Author

eccct commented Sep 21, 2024

After running the benchmark again with bash gemini.sh, the system stalled in the compilation stage for a long time.
bash gemini sh
I used the command colossalai check -i to check the version compatibility and the status of the CUDA extensions in the current environment.
colossalai check again
Please help analyze the cause. Thanks!

@Edenzzzz
Contributor

You should use BUILD_EXT=1 pip install . and see if that compiles.
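For reference (my reading of the flag, not an official statement): BUILD_EXT=1 compiles the C++/CUDA kernels ahead of time during pip install, instead of JIT-compiling them on first use, so a typical sequence from the repo root would be:

cd ColossalAI
BUILD_EXT=1 pip install .
colossalai check -i    # shows which CUDA extensions are now available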

@eccct
Author

eccct commented Sep 23, 2024

I tried BUILD_EXT=1 pip install . and it failed to build; please check the log files I uploaded.
Thanks!
pip install -r requirementsrequirements.txt
BUILD_EXT=1 pip install.txt

@wangbluo
Contributor

You should troubleshoot the issue from the following aspects (the log information provided is limited). First, check that there is nothing wrong with your machine, for example by running nvidia-smi to confirm the GPUs are available. Then check environment variables such as CUDA_VISIBLE_DEVICES, and ensure that LD_LIBRARY_PATH and CUDA_HOME point to the correct CUDA version.

@wangbluo
Contributor

wangbluo commented Sep 23, 2024

Oh, I see: it keeps compiling the JIT kernel ops. That really does take some time, and the compilation simply hadn't finished yet.

@eccct
Author

eccct commented Sep 23, 2024

Yesterday I ran "CUDA_EXT=1 pip install ." and it built successfully. Then I ran the benchmark with "bash gemini.sh"; it ran for a long time without responding.
I updated the ticket, and Edenzzzz replied that I should use BUILD_EXT=1 pip install . and see if that compiles.
Then I ran "pip install -r requirements/requirements.txt", which returned successfully.
When I used "BUILD_EXT=1 pip install ." instead of "CUDA_EXT=1 pip install .", it failed to build.
Please check the two uploaded files. Thanks!

@eccct
Author

eccct commented Sep 23, 2024

(colo01) root@DESKTOP-5H0EB03:~/ColossalAI# echo $CUDA_HOME
/usr/local/cuda-12.1

(colo01) root@DESKTOP-5H0EB03:~/ColossalAI# echo $PATH
/root/miniconda3/envs/colo01/bin:/usr/local/cuda-12.1/bin:/root/miniconda3/envs/colo01/bin:/root/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib:/mnt/c/Windows/system32:/mnt/c/Windows:/mnt/c/Windows/System32/Wbem:/mnt/c/Windows/System32/WindowsPowerShell/v1.0:/mnt/c/Windows/System32/OpenSSH:/mnt/c/Program Files (x86)/NVIDIA Corporation/PhysX/Common:/mnt/c/Program Files/NVIDIA Corporation/NVIDIA NvDLISR:/mnt/c/WINDOWS/system32:/mnt/c/WINDOWS:/mnt/c/WINDOWS/System32/Wbem:/mnt/c/WINDOWS/System32/WindowsPowerShell/v1.0:/mnt/c/WINDOWS/System32/OpenSSH:/mnt/c/Program Files/PuTTY:/mnt/c/Program Files/Git/cmd:/mnt/c/Users/eccct/anaconda3/Scripts:/mnt/c/Program Files/OpenSSH-Win64:/mnt/c/Program Files/nodejs:/mnt/c/Program Files/Microsoft SQL Server/130/Tools/Binn:/mnt/c/Program Files/Microsoft SQL Server/Client SDK/ODBC/170/Tools/Binn:/mnt/c/Program Files/Docker/Docker/resources/bin:/mnt/c/Users/eccct/AppData/Local/Microsoft/WindowsApps:/mnt/d/A:/mnt/c/Users/eccct/Anaconda3:/mnt/c/Users/eccct/Anaconda3/Library/mingw-w64/bin:/mnt/c/Users/eccct/Anaconda3/Library/usr/bin:/mnt/c/Users/eccct/Anaconda3/Library/bin:/mnt/c/Users/eccct/Anaconda3/Scripts:/mnt/c/Users/eccct/AppData/Roaming/Aria2/maria2c.exe:/mnt/d/Microsoft VS Code/bin:/mnt/c/Users/eccct/AppData/Roaming/npm:/mnt/c/Users/eccct/.dotnet/tools:/mnt/c/Users/eccct/AppData/Local/Programs/Azure Dev CLI:/mnt/c/Users/eccct/.cache/lm-studio/bin:/snap/bin

(colo01) root@DESKTOP-5H0EB03:~/ColossalAI# echo $LD_LIBRARY_PATH
/usr/local/cuda-12.1/lib64:

@eccct
Author

eccct commented Sep 23, 2024

(colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# bash gemini.sh
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/pipeline/schedule/_utils.py:19: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_register_pytree_node(OrderedDict, _odict_flatten, _odict_unflatten)
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/utils/_pytree.py:332: UserWarning: <class 'collections.OrderedDict'> is already registered as pytree node. Overwriting the previous registration.
warnings.warn(
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/shardformer/layer/normalization.py:45: UserWarning: Please install apex from source (https://github.com/NVIDIA/apex) to use the fused RMSNorm kernel
warnings.warn("Please install apex from source (https://github.com/NVIDIA/apex) to use the fused RMSNorm kernel")
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/legacy/registry/__init__.py:1: DeprecationWarning: TorchScript support for functional optimizers is deprecated and will be removed in a future PyTorch release. Consider using the torch.compile optimizer instead.
import torch.distributed.optim as dist_optim
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/accelerator/cuda_accelerator.py:282: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
return torch.cuda.amp.autocast(enabled=enabled, dtype=dtype, cache_enabled=cache_enabled)
[09/23/24 11:49:40] INFO colossalai - colossalai - INFO: /root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/initialize.py:75 launch
INFO colossalai - colossalai - INFO: Distributed environment is initialized, world size: 1
WARNING colossalai - colossalai - WARNING: /root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/booster/plugin/gemini_plugin.py:382 __init__
WARNING colossalai - colossalai - WARNING: enable_async_reduce sets pin_memory=True to achieve best performance, which is not implicitly set.
Model params: 1.19 B
[extension] Compiling the JIT cpu_adam_x86 kernel during runtime now

@eccct
Author

eccct commented Sep 23, 2024

(colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# bash gemini.sh
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/pipeline/schedule/_utils.py:19: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_register_pytree_node(OrderedDict, _odict_flatten, _odict_unflatten)
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/utils/_pytree.py:332: UserWarning: <class 'collections.OrderedDict'> is already registered as pytree node. Overwriting the previous registration.
warnings.warn(
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/shardformer/layer/normalization.py:45: UserWarning: Please install apex from source (https://github.com/NVIDIA/apex) to use the fused RMSNorm kernel
warnings.warn("Please install apex from source (https://github.com/NVIDIA/apex) to use the fused RMSNorm kernel")
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/legacy/registry/__init__.py:1: DeprecationWarning: TorchScript support for functional optimizers is deprecated and will be removed in a future PyTorch release. Consider using the torch.compile optimizer instead.
import torch.distributed.optim as dist_optim
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/accelerator/cuda_accelerator.py:282: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
return torch.cuda.amp.autocast(enabled=enabled, dtype=dtype, cache_enabled=cache_enabled)
[09/23/24 11:49:40] INFO colossalai - colossalai - INFO: /root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/initialize.py:75 launch
INFO colossalai - colossalai - INFO: Distributed environment is initialized, world size: 1
WARNING colossalai - colossalai - WARNING: /root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/booster/plugin/gemini_plugin.py:382 __init__
WARNING colossalai - colossalai - WARNING: enable_async_reduce sets pin_memory=True to achieve best performance, which is not implicitly set.
Model params: 1.19 B
[extension] Compiling the JIT cpu_adam_x86 kernel during runtime now
[extension] Time taken to compile cpu_adam_x86 op: 26.49249792098999 seconds
[extension] Compiling the JIT fused_optim_cuda kernel during runtime now
[extension] Time taken to compile fused_optim_cuda op: 57.339394330978394 seconds
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/manager.py", line 82, in register_tensor
[rank0]: chunk_group[-1].append_tensor(tensor)
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/chunk.py", line 277, in append_tensor
[rank0]: raise ChunkFullError
[rank0]: colossalai.zero.gemini.chunk.chunk.ChunkFullError

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]: File "/root/ColossalAI/examples/language/llama/benchmark.py", line 364, in
[rank0]: main()
[rank0]: File "/root/ColossalAI/examples/language/llama/benchmark.py", line 308, in main
[rank0]: model, optimizer, _, dataloader, _ = booster.boost(model, optimizer, dataloader=dataloader)
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/booster/booster.py", line 154, in boost
[rank0]: model, optimizer, criterion, dataloader, lr_scheduler = self.plugin.configure(
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/booster/plugin/gemini_plugin.py", line 571, in configure
[rank0]: model = GeminiDDP(
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/gemini_ddp.py", line 183, in init
[rank0]: self._init_chunks(
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/gemini_ddp.py", line 882, in _init_chunks
[rank0]: self.chunk_manager.register_tensor(
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/manager.py", line 90, in register_tensor
[rank0]: self.__close_one_chunk(chunk_group[-1])
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/manager.py", line 250, in __close_one_chunk
[rank0]: chunk.close_chunk()
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/chunk.py", line 314, in close_chunk
[rank0]: self.cpu_shard = torch.empty(self.shard_size, dtype=self.dtype, pin_memory=self.pin_memory)
[rank0]: RuntimeError: CUDA error: out of memory
[rank0]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank0]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[rank0]: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception ignored in: <function GeminiDDP.__del__ at 0x7f2fc9e3f640>
Traceback (most recent call last):
File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/gemini_ddp.py", line 222, in __del__
self.remove_hooks()
File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/gemini_ddp.py", line 213, in remove_hooks
for p in self.module.parameters():
File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1729, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'GeminiDDP' object has no attribute 'module'
[rank0]:[W923 11:51:06.112981702 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
E0923 11:51:08.022000 140421874321216 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 2515) of binary: /root/miniconda3/envs/colo01/bin/python
Traceback (most recent call last):
File "/root/miniconda3/envs/colo01/bin/torchrun", line 33, in
sys.exit(load_entry_point('torch==2.4.0', 'console_scripts', 'torchrun')())
File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 348, in wrapper
return f(*args, **kwargs)
File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/run.py", line 901, in main
run(args)
File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
elastic_launch(
File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

benchmark.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2024-09-23_11:51:07
host : DESKTOP-5H0EB03.
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 2515)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Error: failed to run torchrun --nproc_per_node=1 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=29500 benchmark.py -g -x -b 16 -c 1b -l 256 on 127.0.0.1, is localhost: True, exception: Encountered a bad command exit code!

Command: 'cd /root/ColossalAI/examples/language/llama && export SHELL="/bin/bash" GCC_RANLIB="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gcc-ranlib" WSL2_GUI_APPS_ENABLED="1" CONDA_EXE="/root/miniconda3/bin/conda" WSL_DISTRO_NAME="Ubuntu-24.04" build_alias="x86_64-conda-linux-gnu" CMAKE_ARGS="-DCMAKE_LINKER=/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-strip" GPROF="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gprof" _CONDA_PYTHON_SYSCONFIGDATA_NAME="_sysconfigdata_x86_64_conda_cos7_linux_gnu" STRINGS="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-strings" CPP="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-cpp" NAME="DESKTOP-5H0EB03" PWD="/root/ColossalAI/examples/language/llama" GSETTINGS_SCHEMA_DIR="/root/miniconda3/envs/colo01/share/glib-2.0/schemas" LOGNAME="root" CONDA_PREFIX="/root/miniconda3/envs/colo01" CXX="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-c++" CXXFLAGS="-fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /root/miniconda3/envs/colo01/include" DEBUG_CXXFLAGS="-fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all -fno-plt -Og -g -Wall -Wextra -fvar-tracking-assignments -ffunction-sections -pipe -isystem /root/miniconda3/envs/colo01/include" MOTD_SHOWN="update-motd" LDFLAGS="-Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/root/miniconda3/envs/colo01/lib -Wl,-rpath-link,/root/miniconda3/envs/colo01/lib -L/root/miniconda3/envs/colo01/lib" HOME="/root" LANG="C.UTF-8" WSL_INTEROP="/run/WSL/318_interop" LS_COLORS="rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=00:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.avif=01;35:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.webp=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga=00;36:.opus=00;36:.spx=00;36:.xspf=00;36:~=00;90:#=00;90:.bak=00;90:.crdownload=00;90:.dpkg-dist=00;90:.dpkg-new=00;90:.dpkg-old=00;90:.dpkg-tmp=00;90:.old=00;90:.orig=00;90:.part=00;90:.rej=00;90:.rpmnew=00;90:.rpmorig=00;90:.rpmsave=00;90:.swp=00;90:.tmp=00;90:.ucf-dist=00;90:.ucf-new=00;90:*.ucf-old=00;90:" 
DEBUG_CFLAGS="-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all -fno-plt -Og -g -Wall -Wextra -fvar-tracking-assignments -ffunction-sections -pipe -isystem /root/miniconda3/envs/colo01/include" WAYLAND_DISPLAY="wayland-0" CXX_FOR_BUILD="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-c++" ELFEDIT="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-elfedit" CONDA_PROMPT_MODIFIER="(colo01) " CMAKE_PREFIX_PATH="/root/miniconda3/envs/colo01:/root/miniconda3/envs/colo01/x86_64-conda-linux-gnu/sysroot/usr" CPPFLAGS="-DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /root/miniconda3/envs/colo01/include" LD="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ld" READELF="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-readelf" GXX="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-g++" GCC_AR="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gcc-ar" LESSCLOSE="/usr/bin/lesspipe %s %s" ADDR2LINE="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-addr2line" TERM="xterm-256color" SIZE="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-size" GCC_NM="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gcc-nm" HOST="x86_64-conda-linux-gnu" LESSOPEN="| /usr/bin/lesspipe %s" CC_FOR_BUILD="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-cc" USER="root" CONDA_SHLVL="2" AR="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ar" AS="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-as" DEBUG_CPPFLAGS="-D_DEBUG -D_FORTIFY_SOURCE=2 -Og -isystem /root/miniconda3/envs/colo01/include" host_alias="x86_64-conda-linux-gnu" DISPLAY=":0" SHLVL="2" NM="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-nm" GCC="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gcc" CUDA_INSTALL_DIR="/usr/local/cuda-12.1" LD_GOLD="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ld.gold" CONDA_PYTHON_EXE="/root/miniconda3/bin/python" LD_LIBRARY_PATH="/usr/local/cuda-12.1/lib64:" XDG_RUNTIME_DIR="/run/user/0/" CONDA_DEFAULT_ENV="colo01" OBJCOPY="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-objcopy" OMP_NUM_THREADS="1" STRIP="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-strip" CUDA_HOME="/usr/local/cuda-12.1" XDG_DATA_DIRS="/usr/local/share:/usr/share:/var/lib/snapd/desktop" OBJDUMP="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-objdump" PATH="/root/miniconda3/envs/colo01/bin:/usr/local/cuda-12.1/bin:/root/miniconda3/envs/colo01/bin:/root/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib:/mnt/c/Windows/system32:/mnt/c/Windows:/mnt/c/Windows/System32/Wbem:/mnt/c/Windows/System32/WindowsPowerShell/v1.0:/mnt/c/Windows/System32/OpenSSH:/mnt/c/Program Files (x86)/NVIDIA Corporation/PhysX/Common:/mnt/c/Program Files/NVIDIA Corporation/NVIDIA NvDLISR:/mnt/c/WINDOWS/system32:/mnt/c/WINDOWS:/mnt/c/WINDOWS/System32/Wbem:/mnt/c/WINDOWS/System32/WindowsPowerShell/v1.0:/mnt/c/WINDOWS/System32/OpenSSH:/mnt/c/Program Files/PuTTY:/mnt/c/Program Files/Git/cmd:/mnt/c/Users/eccct/anaconda3/Scripts:/mnt/c/Program Files/OpenSSH-Win64:/mnt/c/Program Files/nodejs:/mnt/c/Program Files/Microsoft SQL Server/130/Tools/Binn:/mnt/c/Program Files/Microsoft SQL Server/Client SDK/ODBC/170/Tools/Binn:/mnt/c/Program 
Files/Docker/Docker/resources/bin:/mnt/c/Users/eccct/AppData/Local/Microsoft/WindowsApps:/mnt/d/A:/mnt/c/Users/eccct/Anaconda3:/mnt/c/Users/eccct/Anaconda3/Library/mingw-w64/bin:/mnt/c/Users/eccct/Anaconda3/Library/usr/bin:/mnt/c/Users/eccct/Anaconda3/Library/bin:/mnt/c/Users/eccct/Anaconda3/Scripts:/mnt/c/Users/eccct/AppData/Roaming/Aria2/maria2c.exe:/mnt/d/Microsoft VS Code/bin:/mnt/c/Users/eccct/AppData/Roaming/npm:/mnt/c/Users/eccct/.dotnet/tools:/mnt/c/Users/eccct/AppData/Local/Programs/Azure Dev CLI:/mnt/c/Users/eccct/.cache/lm-studio/bin:/snap/bin" CC="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-cc" CFLAGS="-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /root/miniconda3/envs/colo01/include" CXXFILT="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-c++filt" DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/0/bus" BUILD="x86_64-conda-linux-gnu" HOSTTYPE="x86_64" CONDA_PREFIX_1="/root/miniconda3" PULSE_SERVER="unix:/mnt/wslg/PulseServer" RANLIB="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ranlib" CONDA_BUILD_SYSROOT="/root/miniconda3/envs/colo01/x86_64-conda-linux-gnu/sysroot" OLDPWD="/root/ColossalAI/examples/language/llama/scripts/benchmark_7B" _="/root/miniconda3/envs/colo01/bin/colossalai" CUDA_DEVICE_MAX_CONNECTIONS="1" && torchrun --nproc_per_node=1 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=29500 benchmark.py -g -x -b 16 -c 1b -l 256'

Exit code: 1

@flybird11111
Contributor

Hi, can you try GCC version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.2)?
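One way to do that, mirroring the update-alternatives pattern already used above (a sketch; it assumes the gcc-9/g++-9 packages are available from your Ubuntu repositories):

sudo apt install gcc-9 g++-9
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 60
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 60
sudo update-alternatives --config gcc    # then select gcc-9
gcc --version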

@eccct
Author

eccct commented Sep 23, 2024

@flybird11111
(colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# gcc --version
gcc (Ubuntu 12.3.0-17ubuntu1) 12.3.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Do you mean to downgrade gcc 12.3 to 9.4?

@flybird11111
Contributor

Do you mean to downgrade gcc 12.3 to 9.4?

yes.

@eccct
Author

eccct commented Sep 23, 2024

@flybird11111
(colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# sudo update-alternatives --config gcc
There are 3 choices for the alternative gcc (providing /usr/bin/gcc).

  Selection    Path             Priority   Status
------------------------------------------------------------
  0            /usr/bin/gcc-12   60        auto mode
  1            /usr/bin/gcc-10   60        manual mode
* 2            /usr/bin/gcc-12   60        manual mode
  3            /usr/bin/gcc-9    60        manual mode

Press <enter> to keep the current choice[*], or type selection number: 3
update-alternatives: using /usr/bin/gcc-9 to provide /usr/bin/gcc (gcc) in manual mode

@flybird11111 (colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# gcc --version
gcc (Ubuntu 9.5.0-6ubuntu2) 9.5.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


(colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# bash gemini.sh
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/pipeline/schedule/_utils.py:19: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_register_pytree_node(OrderedDict, _odict_flatten, _odict_unflatten)
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/utils/_pytree.py:332: UserWarning: <class 'collections.OrderedDict'> is already registered as pytree node. Overwriting the previous registration.
warnings.warn(
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/shardformer/layer/normalization.py:45: UserWarning: Please install apex from source (https://github.com/NVIDIA/apex) to use the fused RMSNorm kernel
warnings.warn("Please install apex from source (https://github.com/NVIDIA/apex) to use the fused RMSNorm kernel")
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/legacy/registry/__init__.py:1: DeprecationWarning: TorchScript support for functional optimizers is deprecated and will be removed in a future PyTorch release. Consider using the torch.compile optimizer instead.
import torch.distributed.optim as dist_optim
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/accelerator/cuda_accelerator.py:282: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
return torch.cuda.amp.autocast(enabled=enabled, dtype=dtype, cache_enabled=cache_enabled)
[09/23/24 17:52:56] INFO colossalai - colossalai - INFO: /root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/initialize.py:75 launch
INFO colossalai - colossalai - INFO: Distributed environment is initialized, world size: 1
WARNING colossalai - colossalai - WARNING: /root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/booster/plugin/gemini_plugin.py:382 __init__
WARNING colossalai - colossalai - WARNING: enable_async_reduce sets pin_memory=True to achieve best performance, which is not implicitly set.
Model params: 615.01 M
[extension] Compiling the JIT cpu_adam_x86 kernel during runtime now
[extension] Time taken to compile cpu_adam_x86 op: 0.08598995208740234 seconds
[extension] Compiling the JIT fused_optim_cuda kernel during runtime now
[extension] Time taken to compile fused_optim_cuda op: 0.09327149391174316 seconds
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/manager.py", line 82, in register_tensor
[rank0]: chunk_group[-1].append_tensor(tensor)
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/chunk.py", line 277, in append_tensor
[rank0]: raise ChunkFullError
[rank0]: colossalai.zero.gemini.chunk.chunk.ChunkFullError

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]: File "/root/ColossalAI/examples/language/llama/benchmark.py", line 364, in
[rank0]: main()
[rank0]: File "/root/ColossalAI/examples/language/llama/benchmark.py", line 308, in main
[rank0]: model, optimizer, _, dataloader, _ = booster.boost(model, optimizer, dataloader=dataloader)
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/booster/booster.py", line 154, in boost
[rank0]: model, optimizer, criterion, dataloader, lr_scheduler = self.plugin.configure(
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/booster/plugin/gemini_plugin.py", line 571, in configure
[rank0]: model = GeminiDDP(
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/gemini_ddp.py", line 183, in init
[rank0]: self._init_chunks(
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/gemini_ddp.py", line 882, in _init_chunks
[rank0]: self.chunk_manager.register_tensor(
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/manager.py", line 90, in register_tensor
[rank0]: self.__close_one_chunk(chunk_group[-1])
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/manager.py", line 250, in __close_one_chunk
[rank0]: chunk.close_chunk()
[rank0]: File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/chunk.py", line 314, in close_chunk
[rank0]: self.cpu_shard = torch.empty(self.shard_size, dtype=self.dtype, pin_memory=self.pin_memory)
[rank0]: RuntimeError: CUDA error: out of memory
[rank0]: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception ignored in: <function GeminiDDP.__del__ at 0x7ff98200b640>
Traceback (most recent call last):
File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/gemini_ddp.py", line 222, in __del__
self.remove_hooks()
File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/gemini_ddp.py", line 213, in remove_hooks
for p in self.module.parameters():
File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1729, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'GeminiDDP' object has no attribute 'module'
[rank0]:[W923 17:52:58.822163966 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
E0923 17:52:59.349000 140205836552000 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 1792) of binary: /root/miniconda3/envs/colo01/bin/python
Traceback (most recent call last):
File "/root/miniconda3/envs/colo01/bin/torchrun", line 33, in
sys.exit(load_entry_point('torch==2.4.0', 'console_scripts', 'torchrun')())
File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 348, in wrapper
return f(*args, **kwargs)
File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/run.py", line 901, in main
run(args)
File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
elastic_launch(
File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

benchmark.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2024-09-23_17:52:59
host : DESKTOP-5H0EB03.
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1792)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Error: failed to run torchrun --nproc_per_node=1 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=29500 benchmark.py -g -x -b 16 -c 1b -l 64 on 127.0.0.1, is localhost: True, exception: Encountered a bad command exit code!

Command: 'cd /root/ColossalAI/examples/language/llama && export SHELL="/bin/bash" GCC_RANLIB="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gcc-ranlib" WSL2_GUI_APPS_ENABLED="1" CONDA_EXE="/root/miniconda3/bin/conda" WSL_DISTRO_NAME="Ubuntu-24.04" build_alias="x86_64-conda-linux-gnu" CMAKE_ARGS="-DCMAKE_LINKER=/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-strip" GPROF="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gprof" _CONDA_PYTHON_SYSCONFIGDATA_NAME="_sysconfigdata_x86_64_conda_cos7_linux_gnu" STRINGS="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-strings" CPP="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-cpp" NAME="DESKTOP-5H0EB03" PWD="/root/ColossalAI/examples/language/llama" GSETTINGS_SCHEMA_DIR="/root/miniconda3/envs/colo01/share/glib-2.0/schemas" LOGNAME="root" CONDA_PREFIX="/root/miniconda3/envs/colo01" CXX="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-c++" CXXFLAGS="-fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /root/miniconda3/envs/colo01/include" DEBUG_CXXFLAGS="-fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all -fno-plt -Og -g -Wall -Wextra -fvar-tracking-assignments -ffunction-sections -pipe -isystem /root/miniconda3/envs/colo01/include" LDFLAGS="-Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/root/miniconda3/envs/colo01/lib -Wl,-rpath-link,/root/miniconda3/envs/colo01/lib -L/root/miniconda3/envs/colo01/lib" HOME="/root" LANG="C.UTF-8" WSL_INTEROP="/run/WSL/1425_interop" LS_COLORS="rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=00:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.avif=01;35:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.webp=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga=00;36:.opus=00;36:.spx=00;36:.xspf=00;36:~=00;90:#=00;90:.bak=00;90:.crdownload=00;90:.dpkg-dist=00;90:.dpkg-new=00;90:.dpkg-old=00;90:.dpkg-tmp=00;90:.old=00;90:.orig=00;90:.part=00;90:.rej=00;90:.rpmnew=00;90:.rpmorig=00;90:.rpmsave=00;90:.swp=00;90:.tmp=00;90:.ucf-dist=00;90:.ucf-new=00;90:*.ucf-old=00;90:" DEBUG_CFLAGS="-march=nocona 
-mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all -fno-plt -Og -g -Wall -Wextra -fvar-tracking-assignments -ffunction-sections -pipe -isystem /root/miniconda3/envs/colo01/include" WAYLAND_DISPLAY="wayland-0" CXX_FOR_BUILD="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-c++" CUDA_LAUNCH_BLOCKING="1" ELFEDIT="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-elfedit" CONDA_PROMPT_MODIFIER="(colo01) " CMAKE_PREFIX_PATH="/root/miniconda3/envs/colo01:/root/miniconda3/envs/colo01/x86_64-conda-linux-gnu/sysroot/usr" CPPFLAGS="-DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /root/miniconda3/envs/colo01/include" LD="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ld" READELF="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-readelf" GXX="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-g++" CUDA_VISIBLE_DEVICES="0" GCC_AR="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gcc-ar" LESSCLOSE="/usr/bin/lesspipe %s %s" ADDR2LINE="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-addr2line" TERM="xterm-256color" SIZE="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-size" GCC_NM="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gcc-nm" HOST="x86_64-conda-linux-gnu" LESSOPEN="| /usr/bin/lesspipe %s" CC_FOR_BUILD="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-cc" USER="root" CONDA_SHLVL="2" AR="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ar" AS="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-as" DEBUG_CPPFLAGS="-D_DEBUG -D_FORTIFY_SOURCE=2 -Og -isystem /root/miniconda3/envs/colo01/include" host_alias="x86_64-conda-linux-gnu" DISPLAY=":0" SHLVL="2" NM="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-nm" GCC="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gcc" CUDA_INSTALL_DIR="/usr/local/cuda-12.1" LD_GOLD="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ld.gold" CONDA_PYTHON_EXE="/root/miniconda3/bin/python" LD_LIBRARY_PATH="/usr/local/cuda-12.1/lib64:" XDG_RUNTIME_DIR="/run/user/0/" CONDA_DEFAULT_ENV="colo01" OBJCOPY="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-objcopy" OMP_NUM_THREADS="1" STRIP="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-strip" CUDA_HOME="/usr/local/cuda-12.1" XDG_DATA_DIRS="/usr/local/share:/usr/share:/var/lib/snapd/desktop" OBJDUMP="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-objdump" PATH="/root/miniconda3/envs/colo01/bin:/usr/local/cuda-12.1/bin:/root/miniconda3/envs/colo01/bin:/root/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib:/mnt/c/Windows/system32:/mnt/c/Windows:/mnt/c/Windows/System32/Wbem:/mnt/c/Windows/System32/WindowsPowerShell/v1.0:/mnt/c/Windows/System32/OpenSSH:/mnt/c/Program Files (x86)/NVIDIA Corporation/PhysX/Common:/mnt/c/Program Files/NVIDIA Corporation/NVIDIA NvDLISR:/mnt/c/WINDOWS/system32:/mnt/c/WINDOWS:/mnt/c/WINDOWS/System32/Wbem:/mnt/c/WINDOWS/System32/WindowsPowerShell/v1.0:/mnt/c/WINDOWS/System32/OpenSSH:/mnt/c/Program Files/PuTTY:/mnt/c/Program Files/Git/cmd:/mnt/c/Users/eccct/anaconda3/Scripts:/mnt/c/Program Files/OpenSSH-Win64:/mnt/c/Program Files/nodejs:/mnt/c/Program Files/Microsoft SQL Server/130/Tools/Binn:/mnt/c/Program Files/Microsoft SQL Server/Client SDK/ODBC/170/Tools/Binn:/mnt/c/Program 
Files/Docker/Docker/resources/bin:/mnt/c/Users/eccct/AppData/Local/Microsoft/WindowsApps:/mnt/d/A:/mnt/c/Users/eccct/Anaconda3:/mnt/c/Users/eccct/Anaconda3/Library/mingw-w64/bin:/mnt/c/Users/eccct/Anaconda3/Library/usr/bin:/mnt/c/Users/eccct/Anaconda3/Library/bin:/mnt/c/Users/eccct/Anaconda3/Scripts:/mnt/c/Users/eccct/AppData/Roaming/Aria2/maria2c.exe:/mnt/d/Microsoft VS Code/bin:/mnt/c/Users/eccct/AppData/Roaming/npm:/mnt/c/Users/eccct/.dotnet/tools:/mnt/c/Users/eccct/AppData/Local/Programs/Azure Dev CLI:/mnt/c/Users/eccct/.cache/lm-studio/bin:/snap/bin" CC="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-cc" CFLAGS="-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /root/miniconda3/envs/colo01/include" CXXFILT="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-c++filt" DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/0/bus" BUILD="x86_64-conda-linux-gnu" HOSTTYPE="x86_64" CONDA_PREFIX_1="/root/miniconda3" PULSE_SERVER="unix:/mnt/wslg/PulseServer" RANLIB="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ranlib" CONDA_BUILD_SYSROOT="/root/miniconda3/envs/colo01/x86_64-conda-linux-gnu/sysroot" OLDPWD="/root/ColossalAI/examples/language/llama/scripts/benchmark_7B" _="/root/miniconda3/envs/colo01/bin/colossalai" CUDA_DEVICE_MAX_CONNECTIONS="1" && torchrun --nproc_per_node=1 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=29500 benchmark.py -g -x -b 16 -c 1b -l 64'

Exit code: 1

@flybird11111
Contributor

"[rank0]: RuntimeError: CUDA error: out of memory" suggests the issue is due to insufficient memory.
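Worth noting: the failing call is torch.empty(..., pin_memory=True), so the allocation that fails is pinned host (CPU) memory, which is capped by how much RAM WSL2 exposes to the distro. One thing to try (an assumption about this WSL2 setup, not a confirmed fix) is raising that limit in %UserProfile%\.wslconfig on the Windows side and then restarting WSL with wsl --shutdown:

[wsl2]
memory=24GB    # adjust to what the host machine can spare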

@eccct
Author

eccct commented Sep 23, 2024

@flybird11111 I changed benchmark.py to downgrade the model to "1b" with num_hidden_layers=1 and set -l to 256.
Since it shows "Model params: 615.01 M", under what condition does this cause out of memory?

colossalai run --nproc_per_node 1 benchmark.py -g -x -b 16 -c 1b -l 256

MODEL_CONFIGS = {
    "100m": LlamaConfig(
        max_position_embeddings=4096,
        num_hidden_layers=4,
        num_attention_heads=32,
        intermediate_size=2048,
        hidden_size=1024,
    ),
    "5b": LlamaConfig(max_position_embeddings=4096, num_key_value_heads=8),
    "7b": LlamaConfig(max_position_embeddings=4096),
    "1b": LlamaConfig(
        hidden_size=5120,
        intermediate_size=13824,
        num_hidden_layers=1,
        num_attention_heads=40,
        max_position_embeddings=4096,
    ),
}
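A rough back-of-the-envelope for why even 615 M parameters can exhaust memory under Gemini (assuming a common mixed-precision Adam accounting of a 2-byte fp16 weight plus a 4-byte fp32 master copy plus 8 bytes of Adam moments, about 14 bytes per parameter; Gemini's exact chunk layout may differ):

python -c "print(round(615.01e6 * 14 / 1024**3, 2), 'GiB for weights and optimizer states alone')"
# ~8.02 GiB before activations (batch size 16), the CUDA context, and the pinned CPU shards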
