Autograd does not appear to be setting gradient: TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'

Hey there,

Whilst implementing a simple MNIST digit classifier, I’ve got stuck on a bug where weights.grad is None after I call loss.backward(), so the update step fails. Any ideas why the gradient isn’t being set? What am I missing?

Here’s the error I get:

TypeError                                 Traceback (most recent call last)
<ipython-input-15-22a0da261727> in <module>
     24         with torch.no_grad():
---> 25             weights -= weights.grad * LR
     26             bias -= bias * LR

TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'

If I run the code again, I get a different error, namely:

RuntimeError                              Traceback (most recent call last)
<ipython-input-25-455a55143419> in <module>
      7         predictions = xb@weights + bias
      8         loss = get_loss(predictions, yb)
----> 9         loss.backward()
     11         with torch.no_grad():

/usr/local/lib/python3.6/dist-packages/torch/ in backward(self, gradient, retain_graph, create_graph)
    116                 products. Defaults to ``False``.
    117         """
--> 118         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    120     def register_hook(self, hook):

/usr/local/lib/python3.6/dist-packages/torch/autograd/ in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     91     Variable._execution_engine.run_backward(
     92         tensors, grad_tensors, retain_graph, create_graph,
---> 93         allow_unreachable=True)  # allow_unreachable flag

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
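For reference, this second error is easy to reproduce in isolation: it happens whenever backward() runs twice through the same graph without retain_graph=True (a stripped-down sketch, unrelated to my actual data):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x.sigmoid()          # sigmoid saves its output for the backward pass
y.sum().backward()       # first backward frees the saved buffers
try:
    y.sum().backward()   # second backward traverses the same (freed) graph
except RuntimeError as e:
    print(e)             # "Trying to backward through the graph a second time..."
```

So on the second run, part of my graph is evidently being reused across backward() calls rather than rebuilt each iteration.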

And here’s what I think are the relevant parts of my code:

# Skipped DataLoader setup for brevity

def get_accuracy(predictions, actual):
    return ((predictions >= 0.5).float() == actual).float().mean()

def get_loss(predictions, actual):
    normalised = predictions.sigmoid()
    return torch.where(actual == IS_7, 1 - normalised, normalised).mean()

def init_params(size, variance=1.0):
    return torch.randn(size, dtype=torch.float, requires_grad=True) * variance

weights = init_params((IMG_SIZE, 1))
bias = init_params(1)

for epoch in range(1):
    #  Iterate over dataset batches
    # xb is a tensor with the independent variables for the batch (tensor of pixel values)
    # yb         ""           dependent             ""            (which digit it is)
    for xb, yb in dl:
        predictions = xb@weights + bias
        loss = get_loss(predictions, yb)
        loss.backward()

        with torch.no_grad():
            weights -= weights.grad * LR # <-- Error here: unsupported operand type(s) for *: 'NoneType' and 'float'
            bias -= bias.grad * LR
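To dig into where the gradient was going, I checked whether my parameters are actually leaf tensors, since .grad only accumulates on leaves (a stripped-down sketch with the same init_params as above):

```python
import torch

def init_params(size, variance=1.0):
    # Same as my code above: requires_grad=True is set on the randn result,
    # but the `* variance` multiplication returns a NEW tensor
    return torch.randn(size, dtype=torch.float, requires_grad=True) * variance

weights = init_params((784, 1))
print(weights.requires_grad)  # True
print(weights.is_leaf)        # False — it's the *output* of the multiplication
```

So weights tracks gradients but is not a leaf, which is why weights.grad stays None.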

Some useful notes:

  • I also tried using .data instead of with torch.no_grad(), but that didn’t help. The with block seems to be the method PyTorch prefers anyway.
  • Swapping the @ matrix multiplication in predictions for an explicit torch.matmul call makes no difference.
  • I previously made a mistake with my tensor setup but I think that’s all fixed now. weights.shape, bias.shape outputs (torch.Size([784, 1]), torch.Size([1]))

Found the fix. The problem is in init_params: multiplying by variance after creating the tensor returns a new, non-leaf tensor, so gradients accumulate on the hidden torch.randn result rather than on the returned parameter. The fix is to call .requires_grad_() on the final tensor, after the multiplication:

def init_params(size, variance=1.0):
    return (torch.randn(size, dtype=torch.float)*variance).requires_grad_()
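Quick sanity check that the fixed version behaves (a sketch with dummy shapes standing in for the MNIST data):

```python
import torch

def init_params(size, variance=1.0):
    # Fixed: requires_grad_() is called on the final tensor, so it is a leaf
    return (torch.randn(size, dtype=torch.float)*variance).requires_grad_()

weights = init_params((4, 1))
bias = init_params(1)

xb = torch.randn(8, 4)           # dummy batch of "pixel" values
loss = (xb @ weights + bias).sigmoid().mean()
loss.backward()

print(weights.is_leaf)           # True
print(weights.grad is not None)  # True — gradients now land on weights
```

With this change the training loop updates both weights.grad and bias.grad as expected.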