Linear layer using GPU

My network has two layers: the first is a CNN layer and the second is a linear layer. I am trying to run it on the GPU. The linear layer is defined as follows:

self.fc1 = nn.Linear(rows_num_after_convolution, 1).to(torch.device("cuda:0"), dtype=torch.half, non_blocking=True)

but I receive this error:

Traceback (most recent call last):
  File "", line 576, in <module>
    outputs = net(current_image)
  File "/home/zahra/.local/lib/python2.7/site-packages/torch/nn/modules/", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "", line 281, in forward
    x = torch.atan(self.fc1(self.convZ.output_after_pooling.squeeze(0).squeeze(1)))
  File "/home/zahra/.local/lib/python2.7/site-packages/torch/nn/modules/", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zahra/.local/lib/python2.7/site-packages/torch/nn/modules/", line 92, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/zahra/.local/lib/python2.7/site-packages/torch/nn/", line 1408, in linear
    output = input.matmul(weight.t())
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'mat2'

I would appreciate it if anyone could guide me.

Many thanks in advance.

@zahra Can you share how you are using the input?

model = CNNModel()
input = torch.ones((1,4), device='cuda', dtype=torch.half)
output = model(input)

This works perfectly fine for me with your Model.
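The error says one operand of the matmul lives on the CPU while the weight lives on CUDA, so the usual fix is to put the model and the input on the same device with the same dtype. A minimal sketch of that idea follows; `in_features = 4` is a hypothetical stand-in for `rows_num_after_convolution`, and the CPU/float32 fallback is only there so the snippet also runs on a machine without a GPU:

```python
import torch
import torch.nn as nn

# Use the GPU with half precision when available; otherwise fall back to
# CPU with float32 so the sketch still runs (assumption for portability).
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
dtype = torch.half if device.type == "cuda" else torch.float32

in_features = 4  # hypothetical stand-in for rows_num_after_convolution

# The layer's weight and bias are moved to `device` and cast to `dtype`.
fc1 = nn.Linear(in_features, 1).to(device, dtype=dtype)

# The input must be created on (or moved to) the same device and dtype.
x = torch.ones((1, in_features), device=device, dtype=dtype)

out = torch.atan(fc1(x))
print(out.device, out.dtype, tuple(out.shape))
```

If the input comes from somewhere else (a DataLoader, NumPy, an earlier layer), calling `x = x.to(device, dtype=dtype)` just before the forward pass has the same effect.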


Thanks, you are right, I forgot to set the device of the input to ‘cuda’.
Now I have another problem, and I would appreciate your guidance on it, too.
The backward function of my custom CNN is something like this:

def backward(ctx, grad_output):
    grad_output = grad_output.detach()
    input, filter, bias = ctx.saved_tensors
    grad_bias = torch.sum(grad_output)

and in the main function it is like this:


I receive this error:

Traceback (most recent call last):
  File "", line 585, in <module>
    loss.backward(None)
  File "/home/zahra/.local/lib/python2.7/site-packages/torch/", line 107, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/zahra/.local/lib/python2.7/site-packages/torch/autograd/", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: expected Variable or None (got numpy.ndarray)

I do not know what the problem is. Do you have any idea?
Interestingly, the code ran on the CPU, but I get this error on the GPU.

Many thanks in advance.

Can you post the entire Class? You can follow
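In the meantime, one likely cause of `expected Variable or None (got numpy.ndarray)` is a custom `backward` that returns a NumPy array instead of a tensor; autograd expects every returned gradient to be a tensor or `None`. Below is a minimal sketch of a tensor-only custom function. `AddBias` is a hypothetical example, not your class, and the CPU fallback is only there so it runs without a GPU:

```python
import torch
from torch.autograd import Function


class AddBias(Function):
    """Hypothetical custom op: adds a single-element bias to every entry."""

    @staticmethod
    def forward(ctx, input, bias):
        ctx.save_for_backward(input, bias)
        return input + bias

    @staticmethod
    def backward(ctx, grad_output):
        input, bias = ctx.saved_tensors
        # Gradient w.r.t. the input is grad_output itself for an addition.
        grad_input = grad_output.clone()
        # torch.sum returns a tensor on the same device as grad_output;
        # reshape it to match the bias. No .numpy() call anywhere.
        grad_bias = torch.sum(grad_output).reshape(bias.shape)
        # One gradient per forward argument, each a tensor (or None).
        return grad_input, grad_bias


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x = torch.randn(3, 3, device=device, requires_grad=True)
b = torch.zeros(1, device=device, requires_grad=True)

loss = AddBias.apply(x, b).sum()
loss.backward()
print(x.grad.device, b.grad)
```

If some step really does need NumPy, converting the result back with `torch.from_numpy(...).to(grad_output.device)` before returning it keeps autograd satisfied, at the cost of a device-to-host round trip.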

Thanks, my problem is solved.
I used the ScipyConv2dFunction class on this page:

I wrote my own versions of correlate2d and convolve2d. After that, each run on a single data sample took about 15 seconds on the CPU. To reduce the time, I switched to the GPU and replaced the NumPy arrays with tensors. Unfortunately, the time increased to 3 minutes.
Do you have any idea how to improve the time?

Many thanks in advance.

@zahra I am not sure how you have customised correlate2d and convolve2d, but this is most likely due to copying the tensors back and forth between device and host, which is slow. It is difficult to say more without looking at the code.
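As a rough illustration of staying on the device, here is a sketch that assumes your custom functions can be expressed with the built-in `torch.nn.functional.conv2d`, which may not hold for your code. `F.conv2d` computes a cross-correlation, so flipping the kernel gives a true convolution. The random `image` and `kernel` tensors are placeholders, and the CPU fallback is only there so the snippet runs without a GPU:

```python
import torch
import torch.nn.functional as F

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Placeholder data, created directly on the target device.
# Layout is (batch, channels, height, width).
image = torch.randn(1, 1, 64, 64, device=device)
kernel = torch.randn(1, 1, 3, 3, device=device)

# Cross-correlation without padding ("valid" region only).
# Everything stays on `device`; there is no .cpu() or .numpy() round trip.
correlated = F.conv2d(image, kernel)

# True convolution: flip the kernel along both spatial axes first.
convolved = F.conv2d(image, torch.flip(kernel, dims=(2, 3)))

print(tuple(correlated.shape), correlated.device)
```

Per-element Python loops over GPU tensors and repeated `.cpu()` / `.numpy()` / `.item()` calls are the typical sources of this kind of slowdown, so those are the first places worth checking.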