Is it possible to train a model across multiple remote servers in my department? The servers are not connected to each other, and each has its own IP address. I want to use the GPUs of both servers so that I can train with a larger batch size.
I have seen nn.DistributedDataParallel, but how do I specify the IP addresses of the multiple servers?
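For context, here is a rough sketch of what I assumed a multi-node DDP setup would look like, based on the docs. The IP address, port, and model are placeholders, and I have not verified this actually works across two machines; my understanding is that only one server's IP (the "master") is specified and the other joins by connecting to it, e.g. via torchrun --nnodes=2 --node_rank=0/1 --master_addr=<server-1 IP>:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # These environment variables would be set per process, e.g. by torchrun.
    rank = int(os.environ["RANK"])              # global rank across both servers
    world_size = int(os.environ["WORLD_SIZE"])  # total number of processes
    local_rank = int(os.environ["LOCAL_RANK"])  # GPU index on this server

    # Only the master server's IP/port is given (placeholder values here);
    # I assume the second server does not need to be listed anywhere.
    dist.init_process_group(backend="nccl", init_method="env://",
                            rank=rank, world_size=world_size)

    torch.cuda.set_device(local_rank)
    model = nn.Linear(10, 10).cuda(local_rank)   # stand-in for my real model
    model = DDP(model, device_ids=[local_rank])

    # ... training loop with a DistributedSampler would go here ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Is this the right direction, or does DDP need both servers' IP addresses somewhere?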