I have access to a cluster with multiple GPU nodes, each with a single on-board GPU. Can I use multiple nodes to run my code rather than a single GPU? If so, how? We usually select devices through CUDA_VISIBLE_DEVICES,
but I am not sure how to run code on a cluster with multiple GPU nodes.
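For the single-node case mentioned above, the usual pattern is that each process sets CUDA_VISIBLE_DEVICES before any CUDA library is initialized, so that it only sees its assigned GPU. A minimal sketch (the `local_rank` value would normally come from the job scheduler or launcher; it is hard-coded here for illustration):

```python
import os

# Hypothetical sketch: pin this process to one GPU by restricting which
# devices CUDA can see. This must happen before any CUDA library is loaded.
local_rank = 0  # assumed to be provided by the cluster launcher/scheduler

os.environ["CUDA_VISIBLE_DEVICES"] = str(local_rank)

# From this process's point of view, the selected GPU now appears as device 0.
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

For multiple nodes, something (a launcher, scheduler, or the framework itself) still has to coordinate the processes across machines, which is what the distributed support discussed below would provide.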
Not yet. Distributed support is planned for the next major release of PyTorch.
Can you give any indication of what technology the multi-node implementation will use? Will it follow an MPI approach such as https://github.com/pytorch/pytorch/issues/241? (That would be useful for me, though perhaps not for others.) The timescale for that release would also be interesting to know.
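Whatever backend lands, an MPI-style design typically means every process learns its rank and the world size, then works on its own shard of the data. A framework-free sketch of that bookkeeping (in a real job, `rank` and `world_size` would be supplied by the launcher, e.g. mpirun or the scheduler; they are hard-coded here for illustration):

```python
# Minimal sketch of MPI-style data sharding, independent of any framework.
rank = 1        # assumed: this process's index among all processes
world_size = 4  # assumed: total number of processes across all nodes

dataset = list(range(10))  # toy stand-in for the full training set

# Each process takes every world_size-th sample, offset by its rank,
# so the shards partition the dataset with no overlap.
shard = dataset[rank::world_size]
print(shard)  # rank 1 of 4 sees samples 1, 5, 9
```

The missing piece that distributed support would add is the communication layer (e.g. all-reducing gradients across processes), which this sketch deliberately leaves out.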