Hello everyone,
I have a question about using torch to optimize a simple model.
It works when everything is on the CPU (no `.cuda()`), but once I move the tensors to CUDA the computation fails.
I hope someone can help me solve this.

```python
N, D_in, H, D_out = 64, 1000, 100, 10

X = torch.randn(N, D_in).cuda()
Y = torch.randn(N, D_out).cuda()

W1 = torch.randn(D_in, H, requires_grad=True).cuda()
W2 = torch.randn(H, D_out, requires_grad=True).cuda()

learning_rate = 1e-6

for t in range(500):
    # forward propagation
    h = X.mm(W1)            # N * H
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(W2)  # N * D_out

    # loss function
    loss = (y_pred - Y).pow(2).sum()
    print(loss.item())

    # backward propagation
    loss.backward()

    # update weights
```

It fails at the weight update:

```
26     W1 -= learning_rate * W1.grad
27     W2 -= learning_rate * W2.grad

TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'
```

Hi,

When you do `W1 = torch.randn(D_in, H, requires_grad=True).cuda()`, what is returned and stored as `W1` is not a leaf anymore: it is the result of the differentiable op `.cuda()`.
You should do `W1 = torch.randn(D_in, H, device="cuda", requires_grad=True)` instead, to make sure `W1` is a leaf and thus will get a `.grad` field.
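A minimal sketch of the difference (the variable names are just for illustration; on a CPU-only machine the same leaf vs. non-leaf effect shows up with any other differentiable op, e.g. `.double()`, so that is used here for portability):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Non-leaf: .cuda() / .to() / .double() are differentiable ops, so the
# result is an intermediate tensor and its .grad stays None.
w_bad = torch.randn(3, 2, requires_grad=True).double()
print(w_bad.is_leaf)        # False

# Leaf: create the tensor directly on the target device with the flag set.
w_good = torch.randn(3, 2, device=device, requires_grad=True)
print(w_good.is_leaf)       # True

loss = (w_good ** 2).sum() + (w_bad ** 2).sum()
loss.backward()
print(w_good.grad is None)  # False -- leaf tensors accumulate gradients
print(w_bad.grad is None)   # True  -- non-leaf tensors do not
```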

It really works, and thanks for your fast reply! Very thankful.
Best wishes

Hey guys, would you mind telling me some details about the "differentiable op `.cuda()`", or some links that would help me understand the principle?
That would be great, thanks.

It is the same as if you do:

```python
a = torch.rand(10, requires_grad=True)
b = a + 1

b.sum().backward()
```

Then `b.grad` will be `None`, because `b` is not a leaf.
The same thing happens if you replace the `+ 1` op by `.cuda()`: it is handled like any other differentiable operation on a Tensor.
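To extend the example above: if you really do need the gradient of a non-leaf tensor, autograd provides `retain_grad()` for that. A small sketch (variable names are just illustrative):

```python
import torch

a = torch.rand(10, requires_grad=True)
b = a + 1            # non-leaf: result of a differentiable op

b.retain_grad()      # explicitly ask autograd to keep b's gradient
b.sum().backward()

print(a.grad is None)  # False -- a is a leaf, gradient accumulated
print(b.grad is None)  # False -- kept only because of retain_grad()
```

Without the `retain_grad()` call, `b.grad` would be `None` exactly as described above.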