I’m following along with the “What is torch.nn really?” tutorial. There’s a setup part at the beginning where they define:
weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
I thought I might as well just rewrite that as:
weights = torch.randn(784, 10, requires_grad=True) / math.sqrt(784)
assert weights.requires_grad == True
Everything that followed was fine until this loop:
lr = 0.5  # learning rate
epochs = 2  # how many epochs to train for
for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        with torch.no_grad():
            weights -= weights.grad * lr
            bias -= bias.grad * lr
            weights.grad.zero_()
            bias.grad.zero_()
where I got:
TypeError Traceback (most recent call last)
<ipython-input-55-24454d388081> in <cell line: 0>()
21 with torch.no_grad():
22 print((weights.requires_grad, bias.requires_grad))
---> 23 weights -= weights.grad * lr
24 bias -= bias.grad * lr
25 weights.grad.zero_()
TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'
Turns out that when I use the original definition:
weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
this error is avoided, but I can’t figure out why. I know it has something to do with the division by math.sqrt(784), because this also works:
weights = torch.randn(784, 10, requires_grad=True) # No division
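To narrow it down, here is a minimal sketch that seems to reproduce the difference outside the training loop. The names `w_leaf`, `w_div`, and `w_oneline` are my own, not from the tutorial, and the `.sum()` is just a stand-in for a real loss. It looks like dividing a tensor that already requires grad gives back a new, non-leaf tensor, and as far as I understand `.grad` is only filled in for leaf tensors by default, which might be why it stays `None`:

```python
import math
import torch

# Tutorial form: divide first, then mark the result as requiring grad.
w_leaf = torch.randn(784, 10) / math.sqrt(784)
w_leaf.requires_grad_()

# My rewrite: requires_grad=True on randn, then divide.
# The division is itself recorded by autograd, so the result has a grad_fn.
w_div = torch.randn(784, 10, requires_grad=True) / math.sqrt(784)

print(w_leaf.requires_grad, w_leaf.is_leaf, w_leaf.grad_fn)
print(w_div.requires_grad, w_div.is_leaf, w_div.grad_fn)

# Stand-in for a loss: any scalar that depends on the weights.
w_leaf.sum().backward()
w_div.sum().backward()

leaf_has_grad = w_leaf.grad is not None
# Reading .grad on a non-leaf tensor may emit a UserWarning.
div_has_grad = w_div.grad is not None
print(leaf_has_grad, div_has_grad)

# One possible one-liner that keeps the tutorial's behaviour:
# requires_grad_() works in place and returns the same tensor.
w_oneline = (torch.randn(784, 10) / math.sqrt(784)).requires_grad_()
print(w_oneline.is_leaf)
```

If that reading is right, both versions pass the requires_grad assert, but only the leaf version ends up with a populated `.grad` after `backward()`.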
Thanks in advance for any feedback.