PyTorch distributed data parallel (multi-GPU, multi-node)

I have access to 18 nodes, each with a different number of GPUs (every node has at least one). To my understanding, you have to declare all the nodes as having the same number of GPUs.
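To make the setup concrete, here is a rough sketch of the kind of "declaration" I mean, assuming one process per GPU, the NCCL backend, and a single GPUS_PER_NODE value that every node is expected to share (the hostname, port, and toy model are just placeholders):

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

GPUS_PER_NODE = 4          # assumed to be identical on every node
NUM_NODES = 18
WORLD_SIZE = NUM_NODES * GPUS_PER_NODE

def setup(node_rank: int, local_rank: int) -> torch.nn.Module:
    # The global rank is derived from the node rank and the GPU index on that
    # node, which only works if every node contributes the same number of
    # processes.
    global_rank = node_rank * GPUS_PER_NODE + local_rank

    os.environ.setdefault("MASTER_ADDR", "node0.example.com")  # hypothetical hostname
    os.environ.setdefault("MASTER_PORT", "29500")

    dist.init_process_group(
        backend="nccl",
        rank=global_rank,
        world_size=WORLD_SIZE,
    )

    torch.cuda.set_device(local_rank)
    model = torch.nn.Linear(128, 10).cuda(local_rank)  # toy model for illustration
    return DDP(model, device_ids=[local_rank])
```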

First of all, am I correct?

And second, if I am, is there any way around this?
