Use multiprocess to speed up training and playing. #82
base: master
Conversation
Hi @JernejHabjan, thank you for finding this bug.
Hi @gigayaya Wow, thanks for the quick reply! Thanks for this patch, good job.
Is there any chance I can use this during training?
Yes, I have a multiprocess version to speed up training.
Cool! @gigayaya That would be amazing, since I think self-play is the bottleneck of this training loop.
@51616 Game: 6x6 Othello. Cost time (self-play, 128 games): I did not record the time with 1 process (the original version), because it would take too much time.
I would love to see the implementation, of course. @gigayaya
I committed the multiprocess training version.
Thanks a lot @gigayaya. I will be back on my RL project after my final exam in 2 weeks.
@gigayaya As I understand it, your code is parallel across game play but not during MCTS, right?
@51616
@gigayaya Thanks a lot! I really appreciate your help and explanation. :)
@gigayaya Can I ask you why it has to create a new model for each simulation during self-play?
Use ResNet as the default NN because it costs less VRAM than the CNN.
That's my bad. Ideally, it should create many self-play processes and one model process.
@gigayaya I can implement many self-play processes with one model process using PyTorch multiprocessing instead of the default library. But sometimes when I resume (load) from the latest model, it doesn't use the GPU at all.
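The "one model process, many self-play processes" idea can be sketched with plain `multiprocessing` queues (no PyTorch needed for the illustration). Everything here is a hypothetical stand-in, not code from this PR: `model_server`, `self_play_worker`, and the doubling "evaluation" just show the message flow.

```python
from multiprocessing import get_context

ctx = get_context("fork")  # fork so children inherit the queues directly

def model_server(request_q, reply_qs):
    # The single process that "owns" the model: it answers inference
    # requests so self-play workers never load their own copy.
    while True:
        item = request_q.get()
        if item is None:  # sentinel: shut down
            break
        worker_id, state = item
        # Fake "network evaluation" (stand-in for a real forward pass).
        reply_qs[worker_id].put(state * 2)

def self_play_worker(worker_id, request_q, reply_q, out_q, n_moves):
    # A self-play worker sends every position to the model server
    # instead of evaluating its own private network.
    total = 0
    for state in range(n_moves):
        request_q.put((worker_id, state))
        total += reply_q.get()
    out_q.put((worker_id, total))

request_q = ctx.Queue()
out_q = ctx.Queue()
reply_qs = [ctx.Queue() for _ in range(2)]

server = ctx.Process(target=model_server, args=(request_q, reply_qs))
server.start()
workers = [ctx.Process(target=self_play_worker,
                       args=(i, request_q, reply_qs[i], out_q, 5))
           for i in range(2)]
for w in workers:
    w.start()
results = dict(out_q.get() for _ in workers)
for w in workers:
    w.join()
request_q.put(None)  # stop the model server
server.join()
```

With a real network, the server would additionally batch requests before each forward pass; that is where most of the GPU efficiency comes from.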
Wow, I have had similar needs recently; thank you for your code.
@im0qianqian Normally each Python subprocess needs its own memory, so the model will be copied even if you initialize only a single model in the main process. This is a very inefficient way to do multiprocessing; doing it properly is tricky and requires low-level work for parallelization.
@51616 Thank you for your reply; I think you misunderstood what I meant. I mean we can create a small number of processes (like the number of CPU cores), perform initialization once in each process, and then simulate multiple self-play games in each process.
@im0qianqian I see. If you don't want to create a model each time a new game is started, you can modify the code pretty easily, since there's nothing complex there. But as I said, this implementation is still not efficient; at least it utilizes multiple CPU cores.
@51616 Thanks a lot! I really appreciate your explanation.
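The pattern described above, a fixed-size pool where each worker handles a whole batch of games so per-process setup happens only once, could be sketched like this. `play_one_game` and its fake parity result are placeholders for the project's real self-play call:

```python
import multiprocessing as mp

def play_one_game(game_id):
    # Hypothetical stand-in for one self-play game; the real project
    # would run a full game with MCTS here and return its outcome.
    return game_id % 2  # fake result: 1 = win, 0 = loss/draw

def play_many_games(game_ids):
    # Each worker process handles a whole batch of games, so any
    # expensive per-process setup (e.g. loading a network) would
    # happen once per process, not once per game.
    return [play_one_game(g) for g in game_ids]

def run_self_play(num_games=128, num_workers=4):
    # Split the game ids into one batch per worker.
    batches = [list(range(i, num_games, num_workers))
               for i in range(num_workers)]
    ctx = mp.get_context("fork")
    with ctx.Pool(num_workers) as pool:
        results = pool.map(play_many_games, batches)
    return [r for batch in results for r in batch]

results = run_self_play()
```

With 128 games and 4 workers, each process plays 32 games with a single initialization, which matches the "initialize once per process" idea.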
Each process plays many games, instead of creating many processes that each play one game.
Hi @51616 and @im0qianqian, thanks for your replies. Now each process plays many games instead of one game, so there is no need to initialize a NN into VRAM every time a new process is created.
Also, use numPerProcessAgainst to decide how many games one process will play during the against-play phase.
If we want a very efficient way to speed up the AlphaZero approach, we should parallelize MCTS.
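One common way to parallelize the search is root parallelization: run several independent searches and sum the root visit counts. A toy sketch, where the "rollout" is a fake flat Monte Carlo evaluation (action 2 is artificially biased to be the best move), not the project's actual MCTS:

```python
import random
from collections import Counter
from multiprocessing import get_context

ACTIONS = [0, 1, 2, 3]

def rollout_value(action, rng):
    # Toy stand-in for a game rollout; action 2 is made slightly better.
    return rng.random() + (0.2 if action == 2 else 0.0)

def search(args):
    # One independent search: a flat Monte Carlo sampler that counts
    # which action wins each simulated playout.
    seed, n_sims = args
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(n_sims):
        best = max(ACTIONS, key=lambda a: rollout_value(a, rng))
        visits[best] += 1
    return visits

def root_parallel_search(n_workers=4, n_sims=500):
    ctx = get_context("fork")
    with ctx.Pool(n_workers) as pool:
        partials = pool.map(search, [(s, n_sims) for s in range(n_workers)])
    total = Counter()
    for p in partials:
        total.update(p)  # merge root visit counts from all workers
    return total

visits = root_parallel_search()
best_action = max(visits, key=visits.get)
```

Root parallelization needs no shared tree or locks, which is why it is the easiest variant to bolt onto an existing single-threaded MCTS; tree parallelization with virtual loss is more efficient but much more invasive.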
@gigayaya Thank you for your code.
@im0qianqian I think this is alpha-zero-general's problem. Hope this can help you. :)
I tried this with 16 processes, each playing 4 games. The first process plays normally, but the second process returns after one getNextState in playGame for its first 4 games, throws them away, and then plays 4 normal games. The third process throws away its first 8 games, and so on. This produces pwin/nwin totals that are not a multiple of 16.
I made some changes to pit.py.
In the current version, it costs a lot of time when playing many games (such as 200 games).
So I use multiprocessing to improve the throughput of pit.py.
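The approach could look roughly like the sketch below: split the arena games across worker processes and sum the per-process tallies at the end. `play_games` and the fake `g % 3` outcome are illustrative stand-ins for the project's real `Arena.playGame`; `num_per_process` plays the role of `numPerProcessAgainst`:

```python
from multiprocessing import get_context

def play_games(args):
    # Hypothetical worker: plays `num_per_process` arena games and
    # tallies (player-one wins, player-two wins, draws) locally.
    offset, num_per_process = args
    pwins = nwins = draws = 0
    for g in range(offset, offset + num_per_process):
        outcome = g % 3  # fake result; real code would run a full game
        if outcome == 0:
            pwins += 1
        elif outcome == 1:
            nwins += 1
        else:
            draws += 1
    return pwins, nwins, draws

def pit(num_games=200, num_workers=8):
    per = num_games // num_workers  # like numPerProcessAgainst
    ctx = get_context("fork")
    with ctx.Pool(num_workers) as pool:
        parts = pool.map(play_games, [(i * per, per) for i in range(num_workers)])
    # Aggregate the per-process counts into the final score.
    pwins = sum(p for p, _, _ in parts)
    nwins = sum(n for _, n, _ in parts)
    draws = sum(d for _, _, d in parts)
    return pwins, nwins, draws

pwins, nwins, draws = pit()
```

Aggregating counts only after each worker finishes its whole batch also avoids the miscounting reported above, since no partially played games are mixed into the totals.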