Embedding on GPU

I’m building an LSTM model and I want to use the output of nn.Embedding() as its input.
I transfer my model to the GPU using to('cuda'), but when I train it, PyTorch complains that the embedding weights are on the CPU while the input tensor is on the GPU.

To test whether there’s something wrong with my model, I simply tried to embed a tensor using the following code:

import torch
import torch.nn as nn

emb = nn.Embedding(5, 11)
t = torch.tensor([1, 2, 3])

If I now do emb(t), it works properly. But if I do t = t.cuda() and try again, it says:

RuntimeError: Expected object of backend CPU but got backend CUDA for argument #3 'index'

The whole thing works if I also put the embedding on the GPU.
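For example, this runs without errors:

emb = emb.cuda()  # move the embedding weights to the same device as the indices
out = emb(t)      # t is already a CUDA tensor, so the lookup now succeeds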
A few questions then:

  1. According to this, I should not put the embedding on the GPU since it might be big. But how do I embed my input then? To be fair, that discussion refers to PyTorch 0.4, while I’m using PyTorch 1.1: did this change in the meantime?
  2. If my embedding layer is inside the class defining my model and I put the whole model on the GPU, shouldn’t it work properly?

  1. I’m not seeing the advice to leave the embedding on the CPU in the linked issue. However, if you would like to do that, you could just call model.embedding.cpu(), where .embedding refers to the attribute name of your nn.Embedding layer. In the forward method you would then pass a CPU tensor to the embedding, push its output to the GPU, and pass it to the next layer (which should have its parameters on the GPU); see the sketch after this list.

  2. Yes, that should work. Could you post your model definition and a code snippet which reproduces this error, so that we can have a look?
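A minimal sketch of that pattern (the class, attribute names, and sizes here are just placeholders):

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.embedding = nn.Embedding(10000, 300)  # potentially large, kept on the CPU
        self.fc = nn.Linear(300, 2)                # runs on the GPU

    def forward(self, x):
        out = self.embedding(x.cpu())  # lookup happens on the CPU
        out = out.cuda()               # push the result to the GPU
        return self.fc(out)            # remaining layers use GPU parameters

model = MyModel().cuda()  # move all parameters to the GPU ...
model.embedding.cpu()     # ... then move only the embedding back to the CPU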

Hi Ptrblck,

I pass the output of the embedding to the other layers in this way, and it does not give me any error. Does the embedding layer work correctly here? The points in question are Out1.cpu() and Out3.cuda().

import torch
import torch.nn as nn

class Discriminator4layer113D(nn.Module):
    def __init__(self, ngpu, ndf):
        super(Discriminator4layer113D, self).__init__()

        ## -- embed each of the 401 possible labels into a 10-dim vector
        self.embedding = nn.Embedding(401, 10)

        self.ngpu = ngpu
        self.ndf = ndf
        self.l1 = nn.Sequential(nn.Conv3d(2, self.ndf, 3, 1, 0, bias=False),
                                nn.LeakyReLU(0.2, inplace=True))
        self.l2 = nn.Sequential(nn.Conv3d(self.ndf, self.ndf * 2, 3, 1, 0, bias=False),
                                nn.BatchNorm3d(ndf * 2),
                                nn.LeakyReLU(0.2, inplace=True))
        self.drop_out2 = nn.Dropout(0.5)
        self.l3 = nn.Sequential(nn.Conv3d(self.ndf * 2, self.ndf * 4, 3, 2, 0, bias=False),
                                nn.BatchNorm3d(ndf * 4),
                                nn.LeakyReLU(0.2, inplace=True))
        self.drop_out3 = nn.Dropout(0.5)
        self.l4 = nn.Sequential(nn.Conv3d(self.ndf * 4, 1, 3, 1, 0, bias=False),
                                nn.Sigmoid())

    def forward(self, x, Labels):
        Labels = Labels.squeeze(1).squeeze(1).squeeze(1)
        Out1 = self.embedding(Labels)

        ## apply a linear layer to map the embedded labels to the input size
        Out2 = nn.Linear(10, x.shape[2] * x.shape[3] * x.shape[4])(Out1.cpu())

        ## ---- reshape the labels to the input size for concatenation
        Out3 = Out2.view(-1, 11, 11, 11).unsqueeze(1)

        ## ---- concatenate labels and inputs
        Out4 = torch.cat((x, Out3.cuda()), 1)

        out = self.l1(Out4)
        out = self.l2(out)
        out = self.drop_out2(out)
        out = self.l3(out)
        out = self.drop_out3(out)
        out = self.l4(out)

        return out
                       

You are recreating the nn.Linear layer in each forward pass with random parameters, so it won’t be trained.
Create the layer in the __init__ method, the same way the other layers were initialized, and use it in the forward method.

I would also recommend checking the input and output shape of the embedding layer to make sure it’s working as expected.
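For example, a quick standalone check using the sizes from your model (401 labels, and a batch of 64 samples):

import torch
import torch.nn as nn

emb = nn.Embedding(401, 10)
labels = torch.randint(0, 401, (64,))  # one label index per sample
out = emb(labels)
print(out.shape)  # torch.Size([64, 10]): one 10-dim vector per sample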


I changed it in this way. The output of the embedding is 64x10, which is correct since my batch size is 64 and the embedding maps each label to a 10-dimensional vector.

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, ngpu, ndf):
        super(Discriminator, self).__init__()

        ## -- embed each of the 401 possible labels into a 10-dim vector
        self.embedding = nn.Embedding(401, 10)
        self.ngpu = ngpu
        self.ndf = ndf

        self.l = nn.Linear(10, 1331)  # 1331 = 11 * 11 * 11, the input volume
        self.l1 = nn.Sequential(nn.Conv3d(2, self.ndf, 3, 1, 0, bias=False),
                                nn.LeakyReLU(0.2, inplace=True))
        self.l2 = nn.Sequential(nn.Conv3d(self.ndf, self.ndf * 2, 3, 1, 0, bias=False),
                                nn.BatchNorm3d(ndf * 2),
                                nn.LeakyReLU(0.2, inplace=True))
        self.drop_out2 = nn.Dropout(0.5)
        self.l3 = nn.Sequential(nn.Conv3d(self.ndf * 2, self.ndf * 4, 3, 2, 0, bias=False),
                                nn.BatchNorm3d(ndf * 4),
                                nn.LeakyReLU(0.2, inplace=True))
        self.drop_out3 = nn.Dropout(0.5)
        self.l4 = nn.Sequential(nn.Conv3d(self.ndf * 4, 1, 3, 1, 0, bias=False),
                                nn.Sigmoid())

    def forward(self, x, Labels):
        Labels = Labels.squeeze(1).squeeze(1).squeeze(1)
        Out1 = self.embedding(Labels)

        Out2 = self.l(Out1)

        ## ---- reshape the labels to the input size for concatenation
        Out3 = Out2.view(-1, 11, 11, 11).unsqueeze(1)

        ## ---- concatenate labels and inputs
        Out4 = torch.cat((x, Out3), 1)

        out = self.l1(Out4)
        out = self.l2(out)
        out = self.drop_out2(out)
        out = self.l3(out)
        out = self.drop_out3(out)
        out = self.l4(out)

        return out
                       
class Generator(nn.Module):
    def __init__(self, ngpu, nz, ngf):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.nz = nz
        self.ngf = ngf
        self.embedding = nn.Embedding(401, 10)

        self.l1 = nn.Sequential(nn.ConvTranspose3d(self.nz + 10, self.ngf * 8, 3, 1, 0, bias=False),
                                nn.BatchNorm3d(self.ngf * 8),
                                nn.ReLU(True))

        self.l2 = nn.Sequential(nn.ConvTranspose3d(self.ngf * 8, self.ngf * 4, 3, 1, 0, bias=False),
                                nn.BatchNorm3d(self.ngf * 4),
                                nn.ReLU(True))

        self.l3 = nn.Sequential(nn.ConvTranspose3d(self.ngf * 4, self.ngf * 2, 3, 1, 0, bias=False),
                                nn.BatchNorm3d(self.ngf * 2),
                                nn.ReLU(True))

        self.l4 = nn.Sequential(nn.ConvTranspose3d(self.ngf * 2, 1, 3, 1, 0, bias=False),
                                nn.Sigmoid())

    def forward(self, input, Labels, Sigmad):
        Labels = Labels.squeeze(1).squeeze(1).squeeze(1)

        Out1 = self.embedding(Labels)
        ## ---- concatenate labels and noise along the channel dimension
        Out1 = Out1.unsqueeze(2).unsqueeze(3).unsqueeze(4)
        Out2 = torch.cat((Out1, input), 1)
        out = self.l1(Out2)
        out = self.l2(out)
        out = self.l3(out)
        out = self.l4(out) * Sigmad

        return out
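As a quick smoke test that all parameters and inputs end up on the same device, something along these lines should run end to end (the ngpu/ndf/nz/ngf values and the dummy tensor shapes are assumptions chosen to match the 11x11x11 volumes above):

import torch

device = torch.device('cuda')
netD = Discriminator(ngpu=1, ndf=8).to(device)    # the embedding moves together with the model
netG = Generator(ngpu=1, nz=100, ngf=8).to(device)

x = torch.randn(64, 1, 11, 11, 11, device=device)             # dummy input volumes
labels = torch.randint(0, 401, (64, 1, 1, 1), device=device)  # dummy label indices
noise = torch.randn(64, 100, 1, 1, 1, device=device)          # dummy latent vectors
sigmad = torch.ones(1, device=device)

print(netD(x, labels).shape)              # torch.Size([64, 1, 1, 1, 1])
print(netG(noise, labels, sigmad).shape)  # torch.Size([64, 1, 9, 9, 9])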