RuntimeError: Expected object of device type cuda but got device type cpu for argument #2 'mat1' in call to _th_addmm

chaslie · April 7, 2020, 11:03am

Hi,

I have just received this error when trying to feed a sample into a CVAE. I’m not sure what it means.
The section that generates the error is below; test_out=model.sample(test_input) seems to be the line of interest.

class model(nn.Module):
    def __init__(self,lat_dim=50 ):
        super(model, self).__init__()
        self.Encoder=Encoder(lat_dim)
        self.Decoder=Decoder(lat_dim)
        self.cuda()
        self.lat_dim=lat_dim
    
    def sample(self, epsilon=None):
        if epsilon is None:
            epsilon = torch.randn(100, self.lat_dim)
            print("epsilon=", epsilon)
        return self.decode(epsilon, apply_sigmoid=True)

def gen_example(model,epoch,test_input):
    x_axis=np.arange(1,1024,1)
    test_out=model.cuda()
    test_out=model.sample(test_input) ## this is the bit it is failing on. 
    fig=plt.figure(figsize=(5,5))
    for i in range(test_out.shape[0]):
        plt.subplot(5, 5, i+1)
        plt.plot(test_out[i, :, 0], x_axis)
    plt.show()

The testing loop is initiated here:

    if epoch % 1 == 0:
        for i , (test_x, target) in enumerate( test_loader2):
            test_x=test_x.unsqueeze(1)
            test_x=test_x.cuda()
            loss,out_t,logqz_x,logpz,logpx_z,z2=loss_fn(model,test_x)
            loss=torch.mean(loss)
        elbo=-loss
        print("epoch=",epoch,"elbo=",elbo)
        gen_example(model,epoch,random_vector_for_gen)

regards,

Chaslie

ptrblck · April 8, 2020, 7:40am

In your sample function you are creating a new tensor on the CPU:

epsilon = torch.randn(100, self.lat_dim)

while the submodules seem to be on the GPU.
You could push this tensor to the appropriate device by checking the .device attribute of a known parameter of e.g. self.Decoder:

epsilon = torch.randn(100, self.lat_dim, device=self.Decoder.layer.param.device)

Let me know if that helps or if another line of code is raising this issue.
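
As a minimal sketch of that idea (TinyDecoder and its fc layer are hypothetical stand-ins, not the Decoder from the question), the device can be read from the module's own parameters, so the new tensor is created wherever the weights already live:

```python
import torch
import torch.nn as nn


class TinyDecoder(nn.Module):
    """Hypothetical stand-in for the decoder in the question."""

    def __init__(self, lat_dim=50, out_dim=8):
        super().__init__()
        self.lat_dim = lat_dim
        self.fc = nn.Linear(lat_dim, out_dim)

    def sample(self, n=100):
        # Read the device from a registered parameter instead of hard-coding it.
        device = next(self.parameters()).device
        # Create the noise directly on that device (no CPU tensor involved).
        epsilon = torch.randn(n, self.lat_dim, device=device)
        return torch.sigmoid(self.fc(epsilon))


# Falls back to the CPU when no GPU is available, so the sketch runs anywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
decoder = TinyDecoder().to(device)
out = decoder.sample()
```

This assumes the module has at least one parameter and that all of its parameters sit on the same device.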

chaslie · April 8, 2020, 8:28am

Hi Ptrblck,

After much head scratching, I think the following has solved it. However, I am intrigued by the difference between this and your solution: which would be more stable?

test_input=test_input.cuda()

Chaslie

ptrblck · April 8, 2020, 8:30am

The two approaches don’t seem to target the same issue, and my approach was just a guess.
If your code works, then you’ve found and got rid of the error.
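
To illustrate the difference: in the code above, gen_example passes test_input into model.sample, so the branch that calls torch.randn is never reached and it is the tensor passed in that has to be moved. A minimal sketch of a device-agnostic way to do that follows; move_to_model_device is a hypothetical helper and the nn.Linear is only a stand-in for the real model:

```python
import torch
import torch.nn as nn


def move_to_model_device(model, tensor):
    """Hypothetical helper: move a tensor to the device of the model's parameters."""
    device = next(model.parameters()).device
    return tensor.to(device)


# Stand-in model; it is moved to the GPU only if one is available.
model = nn.Linear(50, 8)
if torch.cuda.is_available():
    model = model.cuda()

test_input = torch.randn(16, 50)  # created on the CPU by default
test_input = move_to_model_device(model, test_input)
out = model(test_input)  # input and weights are now on the same device
```

Compared with calling test_input.cuda() directly, this also works on a machine without a GPU, since it follows the model rather than assuming CUDA.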

chaslie · April 8, 2020, 8:36am

Hi Ptrblck,

Thanks for the help, I will need to buy you a beer one day.

I’ll let you know when I get the next problem debugged and it runs through.

chaslie

ptrblck · April 8, 2020, 8:37am

Haha, I didn’t do anything and you figured it out, but we can surely have a beer someday.

mathematics · September 28, 2020, 12:07pm

I ran into the same kind of error, even though I believe I have implemented everything correctly.
I was implementing a Siamese network, and here is my model:

class Siamese(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim
        self.hidden_dim = hidden_dim
        
        self.fc = nn.Sequential(
            nn.Linear(self.hidden_dim, self.hidden_dim)
        )
        
    def forward(self, x1, x2):
        #n1
        lstm_out_1, _ = self.build_net(x1)
        lstm_out_1 = lstm_out_1.contiguous().view(-1, self.hidden_dim)
        l1 = self.fc(lstm_out_1)
        #n2
        lstm_out_2, _ = self.build_net(x2)
        lstm_out_2 = lstm_out_2.contiguous().view(-1, self.hidden_dim)
        l2 = self.fc(lstm_out_2)
        
        return f.normalize(l1), f.normalize(l2)
    
    def build_net(self, x):
        net = self.build_model(self.vocab_size,self.embed_dim, self.hidden_dim)(x)
        return net
        
    def build_model(self, vocab_size, embed_dim, hidden_dim):
        layers = []
        layers.append(nn.Embedding(vocab_size, embed_dim))
        layers.append(nn.LSTM(embed_dim, hidden_dim))
        return  nn.Sequential(*layers)

I wanted to check whether it works with arbitrary values, so I moved everything to CUDA with

device = torch.device('cuda')

built a test tensor and passed it to the model:

input = torch.LongTensor([[2,4,12,13]]).to(device)
Siamese(200,128, 32).to(device).forward(input,input)

and got the same kind of error:

RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_index_select

But when I remove all the CUDA calls, it works perfectly:

input = torch.LongTensor([[2,4,12,13]])
Siamese(200,128, 128).forward(input,input)

Also, please don’t mind, but my custom normalize function from yesterday could not be used here: it gave an error saying LongTensor is not supported for torch.div, so I stuck to the standard f.normalize. The Embedding layer, in turn, does not accept a FloatTensor as input.

mathematics · September 28, 2020, 12:44pm

And suddenly it is solved. Although it did not look like a mistake, the build_model function was the cause.
I defined those layers manually in __init__ instead, and that solved it:

class Siamese(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim
        self.hidden_dim = hidden_dim
        self.fc = nn.Sequential(
            nn.Linear(self.hidden_dim, self.hidden_dim)
        )
        
        self.em = nn.Sequential(
            nn.Embedding(self.vocab_size, self.embed_dim),
            nn.LSTM(self.embed_dim, self.hidden_dim)
        )
        
    def forward(self, x1, x2):
        #n1
        lstm_out_1, _ = self.build_net(x1)
        print(lstm_out_1)
        lstm_out_1 = lstm_out_1.contiguous().view(-1, self.hidden_dim)
        l1 = self.fc(lstm_out_1)
        #n2
        lstm_out_2, _ = self.build_net(x2)
        lstm_out_2 = lstm_out_2.contiguous().view(-1, self.hidden_dim)
        l2 = self.fc(lstm_out_2)
        
        return f.normalize(l1), f.normalize(l2)
    
    def build_net(self, x):
        net = self.em(x)
        return net

But why was it not working in the previous code?

ptrblck · September 28, 2020, 8:13pm

If you are creating modules in the forward method, they are not registered when the model is created and are thus never pushed to the device via model.cuda() or model.to().
Besides that, the parameters of these modules are most likely not passed to the optimizer and are recreated with random values in each iteration.

In your last code snippet you are creating the additional layers in the __init__, which is the right approach to register them properly.
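
A minimal sketch of the registration difference is below. BuiltInForward and BuiltInInit are hypothetical toy modules, not the Siamese model above; listing named_parameters() shows which layers the model actually knows about, and therefore which ones model.to() and the optimizer would see:

```python
import torch
import torch.nn as nn


class BuiltInForward(nn.Module):
    """Hypothetical example: the embedding is created inside forward."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        # A brand-new layer on every call: unregistered, randomly initialised,
        # and always created on the CPU regardless of where the model lives.
        emb = nn.Embedding(10, 4)
        return self.fc(emb(x))


class BuiltInInit(nn.Module):
    """Hypothetical example: the embedding is created once in __init__."""

    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(10, 4)  # registered as a submodule
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        return self.fc(self.emb(x))


unregistered = [name for name, _ in BuiltInForward().named_parameters()]
registered = [name for name, _ in BuiltInInit().named_parameters()]
print(unregistered)  # ['fc.weight', 'fc.bias']
print(registered)    # ['emb.weight', 'fc.weight', 'fc.bias']
```

Only the second variant exposes the embedding weights, which is why moving the first one to the GPU leaves its embedding on the CPU.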

mathematics · September 29, 2020, 1:27am

Yes, thanks. My previous question is also solved: there was no need to pass a LongTensor to the normalize function, because the output of the model is a FloatTensor, which works in my custom normalize function. A silly mistake on my part.
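
For anyone reading later, a minimal sketch of the dtype flow (the sizes are taken from the example above, and F.normalize stands in for the custom normalize function, which is not shown in this thread): the embedding takes integer indices and returns floating-point vectors, so normalization only ever sees floats.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

emb = nn.Embedding(200, 128)           # vocab_size=200, embed_dim=128

idx = torch.tensor([[2, 4, 12, 13]])   # integer token ids, int64 by default
out = emb(idx)                         # floating-point embeddings

# Normalization is applied to the float output, not to the integer ids.
normed = F.normalize(out, dim=-1)

print(idx.dtype, out.dtype)            # torch.int64 torch.float32
```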