Assigning every instance of a Siamese network to a separate GPU

I have a Siamese Network with a triplet loss function at the end. E.g.:

```python
import torch.nn as nn

class Siamese(nn.Module):
    def __init__(self, ae_net):
        super(Siamese, self).__init__()
        self.ae_net = ae_net

    def forward(self, x1, x2, x3, hidden):
        # the same ae_net is applied to the anchor, positive and negative inputs
        a = self.ae_net(x1, hidden)
        b = self.ae_net(x2, hidden)
        c = self.ae_net(x3, hidden)
        return a, b, c
```

The network that is applied repeatedly (i.e., self.ae_net) is an LSTM with inputs of varying lengths, so I’m not sure I can use nn.DataParallel. Is there a way to assign each call to self.ae_net to a different GPU, so that the three branches are computed in parallel?
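Something like the sketch below is roughly what I have in mind (the device ids, the deepcopy step, and the assumption that hidden is a single tensor are just placeholders for illustration):

```python
import copy
import torch.nn as nn

# Rough idea only: one copy of ae_net per GPU, each branch run on its own device,
# and the outputs gathered back onto one device for the triplet loss.
class SiameseMultiGPU(nn.Module):
    def __init__(self, ae_net):
        super(SiameseMultiGPU, self).__init__()
        self.devices = ['cuda:0', 'cuda:1', 'cuda:2']  # placeholder device ids
        self.nets = nn.ModuleList([copy.deepcopy(ae_net).to(d) for d in self.devices])

    def forward(self, x1, x2, x3, hidden):
        outs = []
        for net, x, d in zip(self.nets, (x1, x2, x3), self.devices):
            # assuming hidden is a single tensor; adapt if it is an (h, c) tuple
            outs.append(net(x.to(d), hidden.to(d)))
        # bring everything back to one device for the loss
        a, b, c = [o.to(self.devices[0]) for o in outs]
        return a, b, c
```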

Thanks

Hi,

In your example, the hidden state is used sequentially, one branch after the other. Is that what you want? If so, you cannot really run them in parallel on different GPUs, as each one will need to wait for the previous one to finish.

Hi, thanks for your answer! The hidden state used for a, b and c is the same one, i.e., it’s just an initialized tensor that is used three times separately (isn’t it?), so I don’t think it’s a problem to run them in parallel.

Oh right, it is, my bad.

It is a bit tricky for a Siamese network, as you need to accumulate the gradients from all three runs.
One simple way to do this is to make three copies of your network and send each copy to its own device.
After each backward pass, you will need to accumulate the gradients by hand and then share the updated parameter values by hand.
This is going to be tricky to do very efficiently, and you might not get a large improvement from using multiple GPUs because of the synchronisation needed between devices.
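A rough sketch of what that could look like is below. It is not a full implementation: the device list, the optimizer, the use of nn.TripletMarginLoss, and the assumption that hidden is a single tensor are all placeholders, and ae_net stands for the encoder from your code above.

```python
import copy
import torch
import torch.nn as nn

devices = ['cuda:0', 'cuda:1', 'cuda:2']                    # placeholder device ids
replicas = [copy.deepcopy(ae_net).to(d) for d in devices]   # one copy per GPU
optimizer = torch.optim.Adam(replicas[0].parameters())      # only the first copy is stepped
criterion = nn.TripletMarginLoss()                          # placeholder triplet loss

def train_step(x1, x2, x3, hidden):
    for net in replicas:
        net.zero_grad()

    # Run each branch on its own device; CUDA kernel launches are asynchronous,
    # so the three forward passes can overlap.
    outs = []
    for net, x, d in zip(replicas, (x1, x2, x3), devices):
        # assuming hidden is a single tensor; adapt if it is an (h, c) tuple
        outs.append(net(x.to(d), hidden.to(d)))
    a, b, c = [o.to(devices[0]) for o in outs]

    loss = criterion(a, b, c)
    loss.backward()

    with torch.no_grad():
        # Accumulate the gradients of the other copies into the first one by hand...
        for params in zip(*(net.parameters() for net in replicas)):
            for p in params[1:]:
                params[0].grad += p.grad.to(devices[0])
        optimizer.step()
        # ...then share the new parameter values with the other copies by hand.
        for params in zip(*(net.parameters() for net in replicas)):
            for p, d in zip(params[1:], devices[1:]):
                p.copy_(params[0].to(d))
    return loss.item()
```

The two loops after backward are the manual synchronisation mentioned above: the first sums the gradients of the other copies into the first one, the second copies the updated parameters back out to the other devices.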