Is it possible to do multi-machine, single-card training?

I am a deep-learning beginner running one of PyTorch's demos to learn. I have several desktops and laptops at home, and their hardware varies: some machines only have a CPU, while others have a powerful GPU. I would like all of them to work together to speed up a single training run.

My implementation idea is a client/server (C/S) structure. While the server is feeding inputs to the model, whenever a client requests work, the server sends it some number of samples over a socket and skips the computation for those samples locally. When a client finishes, it sends back its forward results and the server merges them. Finally, when the server reaches the end of an epoch, it blocks until the results from all clients have been returned.
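Roughly, the client side I have in mind looks like the sketch below; the length-prefixed pickle framing and the helper names (`send_obj`, `recv_obj`, `client_loop`) are just placeholders I made up, not working code:

```python
import pickle
import socket
import struct

import torch


def send_obj(sock: socket.socket, obj) -> None:
    # Length-prefixed pickle so the receiver knows how many bytes to read.
    payload = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed")
        buf += chunk
    return buf


def recv_obj(sock: socket.socket):
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return pickle.loads(_recv_exact(sock, length))


def client_loop(server_addr, model: torch.nn.Module) -> None:
    """Request samples from the server, run forward on them, send results back."""
    with socket.create_connection(server_addr) as sock:
        send_obj(sock, "ready")               # ask the server for work
        while True:
            batch = recv_obj(sock)
            if batch is None:                 # assumed end-of-epoch marker
                break
            outputs = model(batch)
            send_obj(sock, outputs.detach())  # detach so the tensor can be pickled
```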

What I don't know is how to merge the clients' forward results back into the server's model. I am not sure whether this approach is even correct …

PS: Alternatively, the client's forward() could stop before the classifier: the intermediate features are sent back, and the server's model runs the classifier. Could the main computation of the forward pass be shared this way?
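For example, with a toy model split into a heavy `features` part and a small `classifier` head (names and sizes are made up, and the socket transport is omitted), the split I mean would look like this:

```python
import torch
import torch.nn as nn


class SplitNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(                  # heavy part, run on a client
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(64, num_classes)    # light part, run on the server


# Client side: stop before the classifier and ship the features back.
def client_forward(model: SplitNet, batch: torch.Tensor) -> torch.Tensor:
    return model.features(batch)


# Server side: finish the forward pass with its own classifier head.
def server_finish(model: SplitNet, feats: torch.Tensor) -> torch.Tensor:
    return model.classifier(feats)


# Local simulation of the round trip (no real network involved).
model = SplitNet()
feats = client_forward(model, torch.randn(8, 3, 32, 32))
logits = server_finish(model, feats)  # shape: (8, 10)
```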

IIUC, what you have in mind is the reverse of the structure in this tutorial. In the tutorial, multiple observers send inputs to the same agent, whereas in your case you would like the server to send inputs to different clients and run the forward pass on the clients?

The problem with the server-client structure is that, if the forward pass runs on a client, the autograd graph and the activations also live on that client, so the server cannot simply merge the outputs and run the backward pass locally.

One possible alternative: instead of sending the forward output from the client to the server, let each client finish its forward and backward passes and send its parameter gradients to the server. The server then collects the gradients from all clients, sums them, uses the summed gradients to update the parameters, and broadcasts the updated model back to all clients.
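A minimal single-process sketch of that scheme, with the clients simulated as local model copies and the transport between machines left out (model, sizes, and data are arbitrary placeholders):

```python
import copy

import torch
import torch.nn as nn

server_model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(server_model.parameters(), lr=0.1)

# Simulated clients: each holds a copy of the current server model and its own data.
clients = [copy.deepcopy(server_model) for _ in range(3)]
client_batches = [(torch.randn(16, 10), torch.randint(0, 2, (16,))) for _ in clients]

# 1. Each client runs forward + backward locally and "sends" its gradients.
client_grads = []
for model, (x, y) in zip(clients, client_batches):
    model.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    client_grads.append([p.grad.clone() for p in model.parameters()])

# 2. The server sums the received gradients and updates its own parameters.
optimizer.zero_grad()
for param, *grads in zip(server_model.parameters(), *client_grads):
    param.grad = torch.stack(grads).sum(dim=0)
optimizer.step()

# 3. Broadcast: every client loads the updated server parameters.
for model in clients:
    model.load_state_dict(server_model.state_dict())
```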

Another alternative is to let each client finish forward-backward-optimizer locally and then send its model parameters to the server. The server then computes a weighted average of all the parameters and broadcasts it back to the clients.
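And a similar sketch of this parameter-averaging variant, again with simulated clients; the weights here are made up, but in practice they would typically be proportional to each client's number of samples:

```python
import copy

import torch
import torch.nn as nn

server_model = nn.Linear(10, 2)
clients = [copy.deepcopy(server_model) for _ in range(3)]
weights = [0.5, 0.3, 0.2]  # e.g. proportional to each client's sample count

# 1. Each client runs forward-backward-optimizer on its own data.
for model in clients:
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(16, 10), torch.randint(0, 2, (16,))
    opt.zero_grad()
    nn.functional.cross_entropy(model(x), y).backward()
    opt.step()

# 2. The server computes the weighted average of the client parameters.
avg_state = {
    name: sum(w * c.state_dict()[name] for w, c in zip(weights, clients))
    for name in server_model.state_dict()
}
server_model.load_state_dict(avg_state)

# 3. Broadcast the averaged parameters back to every client.
for model in clients:
    model.load_state_dict(server_model.state_dict())
```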