Large model on multiple GPUs issue

Hello all

I have a question about fitting one large model (taking around 18 GB of memory) onto multiple GPUs, like 4 TITAN X 12 GB cards.

Currently, I am not setting anything. The model itself contains one inception_v3 module, a stacked ConvLSTM, and an FC classifier. The pseudocode is like below:

import torch.nn as nn
from torchvision.models import inception_v3

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = inception_v3().cuda()
        self.convlstm = ConvLSTM().cuda()                  # custom stacked ConvLSTM
        self.discriminator = fully_connect_layer().cuda()  # custom FC classifier

    def forward(self, input):
        x = self.enc(input)
        x = self.convlstm(x)
        x = self.discriminator(x)
        return x

The error I get is:
Expected tensor for argument #1 'input' to have the same device as tensor for argument #2

It seems like PyTorch would automatically allocate tensors that exceed the memory limit of GPU #1 to the other GPUs. However, it seems that operating on tensors from different GPUs is an issue? Please let me know how to deal with this.

You could use something like this:

class Network(nn.Module):
    def __init__(self, inception_device, lstm_device, discr_device):
        super().__init__()
        # create each submodule directly on its own device
        self.enc = inception_v3().to(inception_device)
        self.convlstm = ConvLSTM().to(lstm_device)
        self.discriminator = fully_connect_layer().to(discr_device)
        self.inception_device = inception_device
        self.lstm_device = lstm_device
        self.discr_device = discr_device

    def forward(self, input):
        # move the activations to each submodule's device before applying it
        x = self.enc(input.to(self.inception_device))
        x = self.convlstm(x.to(self.lstm_device))
        x = self.discriminator(x.to(self.discr_device))
        return x

This would make your training a bit slower, since PyTorch needs to push the tensors to the new devices at each iteration.
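For illustration, here is a minimal sketch of one training step under that setup (the loader, criterion, and optimizer are placeholders, not part of the original code); note that the target has to be pushed to the device of the network output as well:

# hypothetical training step; loader, criterion, and optimizer are assumed to exist
for input, target in loader:
    output = net(input)  # forward() moves the activations between devices
    # the target must live on the output's device for the loss computation
    loss = criterion(output, target.to(net.discr_device))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()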

If the tensor is already on the correct device, the .to() becomes a no-op.
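You can verify this yourself (assuming at least two GPUs are available):

import torch

x = torch.randn(2, 3, device="cuda:0")
y = x.to(torch.device("cuda:0"))  # same device: returns x itself, no copy
print(y is x)                     # True
z = x.to(torch.device("cuda:1"))  # different device: allocates and copies
print(z is x)                     # False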

To instantiate the network, you would do something like:

net = Network(
    torch.device("cuda:0"),
    torch.device("cuda:1"),
    torch.device("cuda:2"),
)

The integers correspond to the GPU indices. This also makes the code somewhat device-agnostic, as you could either use other indices or create a mixed GPU/CPU network, as shown in the sketch below.
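For example, here is a sketch of such a mixed setup, assuming only two GPUs are available and the ConvLSTM fits into CPU memory:

net = Network(
    torch.device("cuda:0"),  # encoder on the first GPU
    torch.device("cpu"),     # ConvLSTM on the CPU
    torch.device("cuda:1"),  # classifier on the second GPU
)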

Thanks a lot. It works after I modified the code following your suggestion.

However, correct me if I am wrong, but I think PyTorch should come up with a clever way to handle this GPU memory control problem, for the good of the platform.

I don’t agree with you on this, because it would mean that PyTorch has to know the expected behavior. This would violate PyTorch’s concept of allowing the user to create highly flexible networks. One could enforce the input of a layer to be pushed to the correct device, but that would hide other (potentially more critical) errors, since wrong devices are usually an indicator of a wrong graph definition.

Hmm… I still cannot equate the concept of ‘flexible networks’ with ‘memory efficient’.

Thanks again.

PyTorch is memory efficient, but it does not handle the devices for you by default. This is done on purpose: it allows the user to run the same code on different devices while specifying the devices to use.
The feature you requested is more likely to be implemented in a high-level wrapper (although I must admit that I do not know whether any of the wrappers implement this particular feature).