
Low resource utilization with default gomoku_alphazero_sp_mode_config.py #21

Closed · CWHer opened this issue Apr 16, 2023 · 3 comments
Labels: enhancement (New feature or request)


CWHer commented Apr 16, 2023

I followed the Installation and Quick Start sections of the README.

However, when I tried the default gomoku_alphazero_sp_mode_config.py, it ran with extremely low resource utilization.

I wonder if I have done something wrong. 🤔 Could you help explain this?

  • Real-time resource utilization

The machine has 24 logical CPU cores and 1x A100 GPU.

[screenshot: real-time CPU/GPU utilization during training]

  • Launch command
python -u zoo/board_games/gomoku/config/gomoku_alphazero_sp_mode_config.py
  • Content of the config
from easydict import EasyDict

# ==============================================================
# begin of the most frequently changed config specified by the user
# ==============================================================
board_size = 6  # default board size is 15
collector_env_num = 32
n_episode = 32
evaluator_env_num = 5
num_simulations = 100
update_per_collect = 50
batch_size = 256
max_env_step = int(1e6)
prob_random_action_in_bot = 0.5
# ==============================================================
# end of the most frequently changed config specified by the user
# ==============================================================
gomoku_alphazero_config = dict(
    exp_name=
    f'data_az_ptree/gomoku_alphazero_sp-mode_rand{prob_random_action_in_bot}_ns{num_simulations}_upc{update_per_collect}_seed0',
    env=dict(
        board_size=board_size,
        battle_mode='self_play_mode',
        bot_action_type='v0',
        prob_random_action_in_bot=prob_random_action_in_bot,
        channel_last=False,  # NOTE: must match the channel-first observation_shape (3, H, W) below
        collector_env_num=collector_env_num,
        evaluator_env_num=evaluator_env_num,
        n_evaluator_episode=evaluator_env_num,
        manager=dict(shared_memory=False, ),
    ),
    policy=dict(
        model=dict(
            observation_shape=(3, board_size, board_size),
            action_space_size=int(1 * board_size * board_size),
            # representation_network_type='conv_res_blocks',  # options={'conv_res_blocks', 'identity'}
            num_res_blocks=1,
            num_channels=32,
        ),
        cuda=True,
        board_size=board_size,
        lr_piecewise_constant_decay=False,
        update_per_collect=update_per_collect,
        batch_size=batch_size,
        optim_type='AdamW',
        learning_rate=0.003,
        weight_decay=0.0001,
        grad_norm=0.5,
        value_weight=1.0,
        entropy_weight=0.0,
        n_episode=n_episode,
        eval_freq=int(2e3),
        num_simulations=num_simulations,
        collector_env_num=collector_env_num,
        evaluator_env_num=evaluator_env_num,
    ),
)

gomoku_alphazero_config = EasyDict(gomoku_alphazero_config)
main_config = gomoku_alphazero_config

gomoku_alphazero_create_config = dict(
    env=dict(
        type='gomoku',
        import_names=['zoo.board_games.gomoku.envs.gomoku_env'],
    ),
    env_manager=dict(type='subprocess'),
    policy=dict(
        type='alphazero',
        import_names=['lzero.policy.alphazero'],
    ),
    collector=dict(
        type='episode_alphazero',
        get_train_sample=False,
        import_names=['lzero.worker.alphazero_collector'],
    ),
    evaluator=dict(
        type='alphazero',
        import_names=['lzero.worker.alphazero_evaluator'],
    )
)
gomoku_alphazero_create_config = EasyDict(gomoku_alphazero_create_config)
create_config = gomoku_alphazero_create_config

if __name__ == '__main__':
    from lzero.entry import train_alphazero
    train_alphazero([main_config, create_config], seed=0, max_env_step=max_env_step)
PaParaZz1 (Member) commented
This happens because the current AlphaZeroPolicy uses ptree (a Python tree-search implementation) together with a pure-Python env implementation, which accounts for roughly 70–80% of the total runtime and leads to the low utilization metrics you observed. We are polishing new implementations based on Cython and JAX, which should improve efficiency significantly. This feature will be updated in 1–2 weeks.
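If you want to confirm this on your own machine, here is a minimal profiling sketch. Only train_alphazero and the config module are from this repo; the cProfile harness and the 5e3 step budget are illustrative choices, and it assumes you run it from the repository root:

# profile_gomoku_az.py -- illustrative sketch, not from the repo.
import cProfile
import pstats

from lzero.entry import train_alphazero
from zoo.board_games.gomoku.config.gomoku_alphazero_sp_mode_config import (
    create_config,
    main_config,
)

if __name__ == '__main__':
    profiler = cProfile.Profile()
    profiler.enable()
    # A short run is enough to expose the hot path.
    train_alphazero([main_config, create_config], seed=0, max_env_step=int(5e3))
    profiler.disable()
    # cProfile only sees this (main) process: the subprocess env workers are
    # invisible here, but if the Python tree search is the bottleneck, its
    # functions should top the cumulative-time listing.
    pstats.Stats(profiler).sort_stats('cumulative').print_stats(20)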

CWHer (Author) commented Apr 21, 2023

Thanks for the reply. Really looking forward to seeing the new feature.

puyuan1996 added the enhancement (New feature or request) label May 6, 2023
puyuan1996 (Collaborator) commented Aug 28, 2023

Hello, we have provided some analysis of the low GPU utilization issue in Issue 86. Furthermore, Pull Request 65 incorporates an AlphaZero ctree (C++ tree search) implementation; after thorough testing and verification it will be merged into the main branch. We appreciate your patience and attention. Best wishes.
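Once that lands, switching to the compiled search should be a one-line change to the config above. A sketch, assuming the switch is exposed as an mcts_ctree policy flag (the name used by other LightZero configs; confirm the exact key against the merged gomoku_alphazero_sp_mode_config.py):

# Hypothetical sketch: enable the C++ tree search once PR 65 is merged.
# `mcts_ctree` is an assumed flag name, not confirmed in this thread.
main_config.policy.mcts_ctree = True  # False would keep the Python ptree backend

if __name__ == '__main__':
    from lzero.entry import train_alphazero
    train_alphazero([main_config, create_config], seed=0, max_env_step=max_env_step)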

CWHer closed this as completed Aug 29, 2023