Not on cuda error

smu226 · March 25, 2019, 4:55pm

Hello! I have this line of code:
error_threshold = rmse_loss(torch.log(model(factors)), product)
where model is trained to predict exp(product) given factors. model, factors and product are all on CUDA. However, when I run the code I get this error:
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #3 'other'
Can someone tell me what this means? What is argument number 3? The definition of rmse_loss is this:

import torch
import torch.nn.functional as F

def rmse_loss(pred, targ):
    denom = targ**2
    denom = torch.sqrt(denom.sum()/len(denom))
    return torch.sqrt(F.mse_loss(pred, targ))/denom

The NN used for the model is this:

import torch.nn as nn
import torch.nn.functional as F

class SimpleNet(nn.Module):
    def __init__(self, ni):
        super().__init__()
        self.linear1 = nn.Linear(ni, 128)
        self.linear2 = nn.Linear(128, 128)
        self.linear3 = nn.Linear(128, 128)
        self.linear4 = nn.Linear(128, 64)
        self.linear5 = nn.Linear(64,64)
        self.linear6 = nn.Linear(64,64)
        self.linear7 = nn.Linear(64,1)

    def forward(self, x):
        x = F.softplus(self.linear1(x))
        x = F.softplus(self.linear2(x))
        x = F.softplus(self.linear3(x))
        x = F.softplus(self.linear4(x))
        x = F.softplus(self.linear5(x))
        x = F.softplus(self.linear6(x))
        x = self.linear7(x)
        return x

Thank you!
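For reference, here is a self-contained sketch of the setup described above. It puts the model and the data on one device and evaluates the same expression. `SimpleNet` is condensed into a loop, but it keeps the same layer sizes and the same softplus activations. The sizes (5 input features, 16 samples) and the random data are hypothetical stand-ins for the real `factors` and `product`. An untrained network can output negative values, and then `torch.log` yields NaN. For that reason the sketch only checks where the result lives:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleNet(nn.Module):
    # Condensed version of the network above: same layer sizes, softplus between layers.
    def __init__(self, ni):
        super().__init__()
        sizes = [ni, 128, 128, 128, 64, 64, 64, 1]
        self.layers = nn.ModuleList([nn.Linear(a, b) for a, b in zip(sizes[:-1], sizes[1:])])

    def forward(self, x):
        for layer in self.layers[:-1]:
            x = F.softplus(layer(x))
        return self.layers[-1](x)  # last layer has no activation, as in the original

def rmse_loss(pred, targ):
    denom = torch.sqrt((targ ** 2).sum() / len(targ))
    return torch.sqrt(F.mse_loss(pred, targ)) / denom

# Use the GPU when available so the sketch also runs on a CPU-only machine.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = SimpleNet(ni=5).to(device)            # hypothetical: 5 input features
factors = torch.randn(16, 5, device=device)   # hypothetical batch of 16 samples
product = torch.randn(16, 1, device=device)   # shaped like the model output

with torch.no_grad():
    error_threshold = rmse_loss(torch.log(model(factors)), product)
print(error_threshold.device)
```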

vmirly1 · March 25, 2019, 5:00pm

Based on the error message, I think the error is not coming from rmse_loss; something in the forward function of the model is causing it. The reason is that the error message refers to argument #3, but this line has only 2 arguments.

Also, you can check the device attribute of both factors and product to make sure that they are in fact on CUDA device.
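Here is a minimal sketch of that check. The helper name `report_devices` is hypothetical. `factors` and `product` are random stand-ins, created on the GPU when one is available:

```python
import torch

def report_devices(**tensors):
    """Return a dict mapping each tensor's name to its device (hypothetical helper)."""
    return {name: t.device for name, t in tensors.items()}

# Use the GPU when available so the sketch also runs on a CPU-only machine.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
factors = torch.randn(4, 3, device=device)  # stand-in for the real inputs
product = torch.randn(4, device=device)     # stand-in for the real targets

devices = report_devices(factors=factors, product=product)
print(devices)
# All tensors that enter the same operation should report the same device.
first = devices["factors"]
assert all(d == first for d in devices.values())
```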

smu226 · March 25, 2019, 5:16pm

Thank you for the reply. Yes, they are on CUDA. I also edited my post to include the NN used for the model.

vmirly1 · March 25, 2019, 5:19pm

I think it’s the line that divides by len(denom) that causes the issue. Can you create a tensor for len(denom) and move it to CUDA, something like this: ... / torch.tensor(len(denom)).to(device)
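The sketch below is one reading of that suggestion. The element count is created as a tensor with the same device and dtype as `targ`, so every operand of the division lives on one device. This is not a confirmed fix for the error in this thread, because PyTorch normally accepts a plain Python `int` divisor as a scalar. The inputs reuse the values from the CUDA example later in the thread:

```python
import torch
import torch.nn.functional as F

def rmse_loss(pred, targ):
    # Build the count on targ's device and dtype instead of using a Python int.
    n = torch.tensor(len(targ), device=targ.device, dtype=targ.dtype)
    denom = torch.sqrt((targ ** 2).sum() / n)
    return torch.sqrt(F.mse_loss(pred, targ)) / denom

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pred = torch.tensor([1.0, 2.0, 3.0], device=device)
targ = torch.tensor([1.5, 2.2, 3.1], device=device)
loss = rmse_loss(pred, targ)
print(loss)  # a 0-dim tensor (about 0.1340) on the same device as the inputs
```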

smu226 · March 25, 2019, 5:19pm

I actually just noticed that if I do this instead:
error_threshold = rmse_loss(1/model(factors), product)
so replacing torch.log with 1/, the code works just fine.

vmirly1 · March 25, 2019, 5:24pm

It’s weird that it worked. I don’t think the problem comes from torch.log, because if the input to torch.log is on CUDA, the output will also be on CUDA:

>>> x = torch.tensor([1.0, 2.0, 3.0])
>>> device = torch.device('cuda:0')
>>> device
device(type='cuda', index=0)
>>> x.device
device(type='cpu')
>>> x = x.to(device)
>>> x.device
device(type='cuda', index=0)
>>> torch.log(x)
tensor([0.0000, 0.6931, 1.0986], device='cuda:0')

smu226 · March 25, 2019, 5:25pm

I changed that to this:
denom = torch.sqrt(torch.mean(targ**2))
but I am still getting the error. It seems that any transformation that doesn't use a torch. function works fine (for example, squaring), but when I use a torch. function (I tried log, exp, cos and sin) I get the error.
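The rewritten `denom` matches the original whenever the first dimension holds all the elements, for example shape (N,) or (N, 1). This is because `len()` of a tensor is the size of its first dimension only. Here is a quick check with a hypothetical target, plus a counterexample with two columns:

```python
import torch

targ = torch.tensor([1.5, 2.2, 3.1])  # hypothetical 1-D target

denom_sum = torch.sqrt((targ ** 2).sum() / len(targ))
denom_mean = torch.sqrt(torch.mean(targ ** 2))
print(denom_sum.item(), denom_mean.item())  # both about 2.3594

# With more than one column, len() counts only rows, so the two differ.
wide = torch.ones(3, 2)
sum_over_len = ((wide ** 2).sum() / len(wide)).item()  # 6 / 3 = 2.0
mean_value = torch.mean(wide ** 2).item()              # 6 / 6 = 1.0
print(sum_over_len, mean_value)
```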

smu226 · March 25, 2019, 5:28pm

I agree, but the error seems to refer to some third argument, not the first one, where torch.log is applied.

vmirly1 · March 25, 2019, 5:36pm

Actually, the problem is not here at all. I tried the same function with two simple tensors, and I did not get any error:

>>> def rmse_loss(pred, targ):
...     denom = targ**2
...     denom = torch.sqrt(denom.sum()/len(denom))
...     return torch.sqrt(F.mse_loss(pred, targ))/denom
... 
 
>>> x = torch.tensor([1.0, 2.0, 3.0])
>>> x = x.to(device)

>>> y = torch.tensor([1.5, 2.2, 3.1])
>>> y = y.to(device)

>>> x.device
device(type='cuda', index=0)
>>> y.device
device(type='cuda', index=0)
>>> rmse_loss(x, y)
tensor(0.1340, device='cuda:0')

smu226 · March 25, 2019, 6:06pm

So is there something wrong with my model?

vmirly1 · March 25, 2019, 6:07pm

Can you just get the output from your model and see what device the output is on?

output = model(factors)
output.device
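You can check the devices of the model's parameters the same way as its output. The sketch below uses a small hypothetical `nn.Sequential` as a stand-in for `SimpleNet`:

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the model in this thread.
model = nn.Sequential(nn.Linear(3, 8), nn.Softplus(), nn.Linear(8, 1)).to(device)
factors = torch.randn(4, 3, device=device)

output = model(factors)
param_devices = [p.device for p in model.parameters()]
print(output.device, param_devices)
```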

smu226 · March 25, 2019, 6:13pm

The output is: cuda:0

vmirly1 · March 26, 2019, 1:09pm

I see. That doesn’t make sense. If we create two random tensors with the same shapes as the model output and the target, and pass them to rmse_loss, there is no problem.
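Here is a sketch of that experiment. It assumes a batch of 8 samples, with both the model output and the target shaped (8, 1). Those shapes are hypothetical; substitute the real shapes of `model(factors)` and `product`. The fake output is kept positive so that `torch.log` stays finite:

```python
import torch
import torch.nn.functional as F

def rmse_loss(pred, targ):
    denom = targ**2
    denom = torch.sqrt(denom.sum()/len(denom))
    return torch.sqrt(F.mse_loss(pred, targ))/denom

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Random stand-ins for the model output and the target.
fake_output = torch.rand(8, 1, device=device) + 0.1
fake_target = torch.randn(8, 1, device=device)
loss = rmse_loss(torch.log(fake_output), fake_target)
print(loss, loss.device)
```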

I think you can debug it line by line, using print statements to display the device of each tensor after each line of computation. Then maybe we can figure out where the problem is happening.
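Here is a minimal sketch of that line-by-line check applied to `rmse_loss`. It prints the device of each intermediate result, and the same pattern could be added inside `forward`. The `show` helper and the name `rmse_loss_debug` are hypothetical:

```python
import torch
import torch.nn.functional as F

def show(name, value):
    # Print the device for tensors; plain Python numbers have no device.
    print(name, value.device if torch.is_tensor(value) else type(value).__name__)
    return value

def rmse_loss_debug(pred, targ):
    show("pred", pred)
    show("targ", targ)
    denom = show("targ**2", targ**2)
    n = show("len(denom)", len(denom))
    denom = show("denom", torch.sqrt(denom.sum() / n))
    mse = show("mse", F.mse_loss(pred, targ))
    return show("result", torch.sqrt(mse) / denom)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pred = torch.tensor([1.0, 2.0, 3.0], device=device)
targ = torch.tensor([1.5, 2.2, 3.1], device=device)
result = rmse_loss_debug(torch.log(pred), targ)
```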