So I don't have access to a GPU, but I do have access to a cluster of 25 Xeon CPU machines. Here are my questions:
If I set the number of nodes in qpub to 24, will PyTorch automatically use all the nodes? Someone earlier said that it relies on Intel MKL, and MKL should be able to detect all the available CPUs.
Also, does setting num_workers take care of automatically distributing the workload?
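For context on that last question: `num_workers` only controls how many subprocesses the `DataLoader` uses to load batches in parallel on a single machine; it does not distribute training across nodes. A minimal sketch (the dataset here is made-up random data just for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# A toy dataset of 100 samples with 4 features each.
dataset = TensorDataset(torch.randn(100, 4), torch.randint(0, 2, (100,)))

# num_workers=2 spawns two local subprocesses that prepare batches
# concurrently with the training loop -- still on ONE machine.
loader = DataLoader(dataset, batch_size=10, num_workers=2)

for x, y in loader:
    pass  # training step would go here

print(len(loader))  # 10 batches of 10 samples each
```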
PyTorch will use all the cores on a single machine.
If you want to use all 25 Xeon machines, then you will have to write some special logic using our torch.distributed functions: http://pytorch.org/docs/master/distributed.html
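A minimal sketch of what that torch.distributed logic looks like, assuming the CPU-friendly `gloo` backend. The two local processes here stand in for separate machines, and the `MASTER_ADDR`/`MASTER_PORT` values are placeholders; on a real cluster each machine would run this script with its own rank and point those variables at one reachable node:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def run(rank, world_size):
    # Every worker joins the same process group...
    dist.init_process_group(backend="gloo", init_method="env://",
                            rank=rank, world_size=world_size)
    # ...and participates in a collective op: sum a tensor across all ranks.
    t = torch.ones(1) * rank
    dist.all_reduce(t, op=dist.ReduceOp.SUM)  # after this, t = 0 + 1 on both ranks
    dist.destroy_process_group()

# Placeholder rendezvous address: in a real cluster, set these to one
# node that all the others can reach over the network.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

world_size = 2  # stand-in for the 25 machines
mp.start_processes(run, args=(world_size,), nprocs=world_size,
                   start_method="fork")
```

In real multi-machine use you would launch one copy of the script per node rather than forking locally, with `world_size=25` and ranks 0 through 24.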
Given that the torch.distributed module is still in beta, does torch.multiprocessing have any performance disadvantage, other than having to run each script separately?
Hi, thank you for asking this question, because I have a similar issue. Were you able to distribute your training over many machines? Could you tell me how you did it, please? I don't have any background in parallel or distributed computing.