Dear developers
It's great that you've provided the examples/imagenet/main.py script; it looks amazing.
I'm looking into how to set up Multi-processing Distributed Data Parallel training, for instance 8 GPUs on a single node, though I could also use multiple nodes with multiple GPUs each. I must say I've never had such great infrastructure before, and I'm discovering it as I go.
Now, I'm used to watching the evolution of the accuracies (Top-1, Top-5, train/val) during training (rather common, isn't it?), but looking at the code (main.py) I do not see the
and similar code used in the train/validate routines like
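(For reference, main.py already accumulates the Top-1/Top-5 numbers with its AverageMeter class; only the writer calls are missing. A minimal sketch of that accumulation, whose running average is exactly what one would feed to TensorBoard:)

```python
class AverageMeter:
    """Tracks the latest value and the running average of a metric
    (same idea as the AverageMeter in examples/imagenet/main.py)."""

    def __init__(self) -> None:
        self.val = 0.0
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val: float, n: int = 1) -> None:
        # n is the batch size: val is the metric averaged over n samples
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


top1 = AverageMeter()
top1.update(50.0, n=32)   # batch 1: 50% Top-1 over 32 images
top1.update(100.0, n=32)  # batch 2: 100% Top-1 over 32 images
# top1.avg is now 75.0; this is the value one would pass to something
# like writer.add_scalar("train/acc1", top1.avg, epoch)
```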
Now, in multi-GPU processing I imagine one has to deal with the question of which GPU among the whole set of GPUs should/must do the job. But I'm pretty sure many experts do such things routinely.
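(On the "which GPU should do the job" point: the usual convention in distributed training is that only global rank 0 writes logs, and every other process skips the TensorBoard calls. A minimal sketch, assuming the launcher, e.g. torchrun, exports the RANK environment variable as it normally does:)

```python
import os


def is_main_process() -> bool:
    # torchrun (and torch.distributed.launch) export RANK for every
    # worker process; by convention only global rank 0 writes logs and
    # checkpoints. A missing RANK means single-process training.
    return int(os.environ.get("RANK", "0")) == 0


# Hypothetical use inside the training loop:
# if is_main_process():
#     writer.add_scalar("train/acc1", top1.avg, epoch)
```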
Is there a planned new version of main.py that would integrate such TensorBoard features for Multi-processing Distributed Data Parallel training? In the meantime, maybe someone can help set up such modifications.
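(Until main.py gains this, a small guard like the following is one way to add it yourself. SummaryWriter is the real torch.utils.tensorboard class; the function name and log_dir value are just placeholders, and the global-rank formula matches how main.py computes it in main_worker:)

```python
def make_writer(rank: int, log_dir: str = "runs/imagenet"):
    """Create a TensorBoard writer on global rank 0 only; other ranks
    get None so their logging calls can be skipped with `if writer:`."""
    if rank != 0:
        return None
    # Imported lazily so non-logging ranks never touch tensorboard.
    from torch.utils.tensorboard import SummaryWriter
    return SummaryWriter(log_dir=log_dir)


# Hypothetical use inside main_worker(gpu, ngpus_per_node, args), where
# main.py computes the global rank as args.rank * ngpus_per_node + gpu:
# writer = make_writer(args.rank * ngpus_per_node + gpu)
# if writer:
#     writer.add_scalar("val/acc1", top1.avg, epoch)
```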