Autograd does not appear to be setting gradient: TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'

Hey there,

Whilst implementing a simple MNIST digit classifier, I’ve got stuck on a bug where weights.grad is None after I call loss.backward(), so the update step fails. Any ideas why the gradient isn’t being set? What am I missing?

Here’s the error I get:

TypeError                                 Traceback (most recent call last)
<ipython-input-15-22a0da261727> in <module>
     24         with torch.no_grad():
---> 25             weights -= weights.grad * LR
     26             bias -= bias * LR

TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'

If I run the code again, I get a different error, namely:

RuntimeError                              Traceback (most recent call last)
<ipython-input-25-455a55143419> in <module>
      7         predictions = xb@weights + bias
      8         loss = get_loss(predictions, yb)
----> 9         loss.backward()
     11         with torch.no_grad():

/usr/local/lib/python3.6/dist-packages/torch/ in backward(self, gradient, retain_graph, create_graph)
    116                 products. Defaults to ``False``.
    117         """
--> 118         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    120     def register_hook(self, hook):

/usr/local/lib/python3.6/dist-packages/torch/autograd/ in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     91     Variable._execution_engine.run_backward(
     92         tensors, grad_tensors, retain_graph, create_graph,
---> 93         allow_unreachable=True)  # allow_unreachable flag

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
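For reference, this second error is easy to reproduce in isolation: it happens whenever backward() runs twice through the same graph without retain_graph=True (a stripped-down sketch, unrelated to my actual data):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x.sigmoid()          # sigmoid saves its output for the backward pass
y.sum().backward()       # first backward frees the saved buffers
try:
    y.sum().backward()   # second backward traverses the same (freed) graph
except RuntimeError as e:
    print(e)             # "Trying to backward through the graph a second time..."
```

So on the second run, part of my graph is evidently being reused across backward() calls rather than rebuilt each iteration.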

And here’s what I think are the relevant parts of my code:

# Skipped DataLoader setup for brevity

def get_accuracy(predictions, actual):
    return ((predictions >= 0.5).float() == actual).float().mean()

def get_loss(predictions, actual):
    normalised = predictions.sigmoid()
    return torch.where(actual == IS_7, 1 - normalised, normalised).mean()

def init_params(size, variance=1.0):
    return torch.randn(size, dtype=torch.float, requires_grad=True) * variance

weights = init_params((IMG_SIZE, 1))
bias = init_params(1)

for epoch in range(1):
    #  Iterate over dataset batches
    # xb is a tensor with the independent variables for the batch (tensor of pixel values)
    # yb         ""           dependent             ""            (which digit it is)
    for xb, yb in dl:
        predictions = xb@weights + bias
        loss = get_loss(predictions, yb)
        loss.backward()

        with torch.no_grad():
            weights -= weights.grad * LR # <-- Error here: unsupported operand type(s) for *: 'NoneType' and 'float'
            bias -= bias.grad * LR
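To dig into where the gradient was going, I checked whether my parameters are actually leaf tensors, since .grad only accumulates on leaves (a stripped-down sketch with the same init_params as above):

```python
import torch

def init_params(size, variance=1.0):
    # Same as my code above: requires_grad=True is set on the randn result,
    # but the `* variance` multiplication returns a NEW tensor
    return torch.randn(size, dtype=torch.float, requires_grad=True) * variance

weights = init_params((784, 1))
print(weights.requires_grad)  # True
print(weights.is_leaf)        # False — it's the *output* of the multiplication
```

So weights tracks gradients but is not a leaf, which is why weights.grad stays None.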

Some useful notes:

  • I also tried using .data instead of with torch.no_grad(), but that didn’t help. The with block seems to be the method PyTorch prefers anyway.
  • Swapping the @ matrix multiplication in predictions for an explicit torch.matmul call makes no difference.
  • I previously made a mistake with my tensor setup but I think that’s all fixed now. weights.shape, bias.shape outputs (torch.Size([784, 1]), torch.Size([1]))

Found the fix. The problem is in init_params: multiplying by variance after creating the tensor returns a new, non-leaf tensor, so gradients accumulate on the hidden torch.randn result rather than on the returned parameter. The fix is to call .requires_grad_() on the final tensor, after the multiplication:

def init_params(size, variance=1.0):
    return (torch.randn(size, dtype=torch.float)*variance).requires_grad_()
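Quick sanity check that the fixed version behaves (a sketch with dummy shapes standing in for the MNIST data):

```python
import torch

def init_params(size, variance=1.0):
    # Fixed: requires_grad_() is called on the final tensor, so it is a leaf
    return (torch.randn(size, dtype=torch.float)*variance).requires_grad_()

weights = init_params((4, 1))
bias = init_params(1)

xb = torch.randn(8, 4)           # dummy batch of "pixel" values
loss = (xb @ weights + bias).sigmoid().mean()
loss.backward()

print(weights.is_leaf)           # True
print(weights.grad is not None)  # True — gradients now land on weights
```

With this change the training loop updates both weights.grad and bias.grad as expected.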