Autograd fails without giving any warning while doing matrix operations

I am trying to learn a parameter that is one element of a bigger matrix. The loss function does not use the learnable parameter directly, but a matrix that contains this parameter.

A simplified code snippet that reproduces the error is given below:

import torch as t
import torch
from matplotlib import pyplot as plt

param = t.tensor([1], dtype=t.float, requires_grad=True)

a = torch.tensor([[param, 0],[0, 2]], dtype=t.float, requires_grad= True)
b = torch.tensor([[5], [6]], dtype=t.float, requires_grad = True)
c = torch.tensor([[50], [12]], dtype=t.float, requires_grad = True)

def calc_loss(A, B, C):
    X =  torch.sqrt(torch.mean((torch.matmul(A,B)-C)**2))
    return t.abs(X)

optimizer = t.optim.Adam([param], lr=1)
n = 10

values = t.zeros(n, dtype=t.float)
for i in range(n):
    optimizer.zero_grad()
    loss = calc_loss(a, b, c)
    print(f"gradient of the complex number are {param.grad}")
    print(f"calculated loss value is {loss}")
    loss.backward()
    optimizer.step()
    values[i] = param.detach()

# Plot the results
plt.plot(values, label='Learnable parameter')
plt.legend()
plt.xlabel('Iteration')
plt.ylabel('Parameter')
plt.show()

gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672


The expected value for param is 10, but it seems the gradient is always None. Somewhere the computational graph is broken, but I am not able to track down where.

Creating a tensor from a parameter with torch.tensor() only copies its value; it does not propagate the gradient.
Consider using indexed assignment instead, for example:

a = torch.zeros(2, 2)
a[0, 0] = param
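
A minimal sketch of the difference. It uses a 0-dim param for simplicity (an assumption; the thread uses shape [1]). torch.tensor() reads the number out of param, so the result has no grad_fn, while indexed assignment into a fresh tensor records a CopySlices node that links back to param:

```python
import torch

param = torch.tensor(1.0, requires_grad=True)

# torch.tensor() reads param's value; the new tensor is not connected to param.
a_copy = torch.tensor([[param, 0.0], [0.0, 2.0]])
print(a_copy.requires_grad, a_copy.grad_fn)   # False None

# Indexed assignment into a plain tensor is recorded by autograd.
a = torch.zeros(2, 2)
a[0, 0] = param
a[1, 1] = 2.0
print(a.grad_fn)        # a CopySlices node

a.sum().backward()
print(param.grad)       # tensor(1.)
```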

If I use a = torch.zeros(2, 2), it gives other errors.

But with
a = torch.tensor([[0, 0],[0, 2]], dtype=t.float, requires_grad= True)

it results in the following error. I tried param.clone() as well, and that also results in the same error:

RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.
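
A minimal sketch of where this error comes from, assuming the thread's shapes: a tensor created with requires_grad=True is a leaf, and autograd rejects in-place writes into a view of such a leaf. Leaving requires_grad off the container appears to avoid the error, and the container still ends up requiring grad because of what is written into it:

```python
import torch

param = torch.ones(1, requires_grad=True)

# A container created with requires_grad=True is a leaf; leaf[0, 0] = ... writes
# into a view of it in place, which autograd rejects.
leaf = torch.tensor([[0.0, 0.0], [0.0, 2.0]], requires_grad=True)
try:
    leaf[0, 0] = param
    error = None
except RuntimeError as exc:
    error = str(exc)
print(error)

# Without requires_grad on the container, the same assignment is allowed, and
# a becomes a non-leaf that requires grad through param.
a = torch.tensor([[0.0, 0.0], [0.0, 2.0]])
a[0, 0] = param
print(a.requires_grad, a.is_leaf)   # True False
```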

I also tried it with

a = torch.zeros(2, 2)
a[0,0] = param
a[1,1] = 2 

and then I get the following error:

RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

I tried retain_graph=True, but then the solution doesn't converge to the expected value of 10.

@ptrblck please help

Hi Rishi!

You haven’t posted the complete code that leads to your error. Your
code fragment (which is Lezcano’s solution) should work for your use
case. Here is a complete example:

>>> import torch
>>> torch.__version__
'1.9.0'
>>> param = torch.ones (1, requires_grad = True)
>>> a = torch.zeros (2, 2)
>>> a[0, 0] = param
>>> a[1, 1] = 2
>>> a
tensor([[1., 0.],
        [0., 2.]], grad_fn=<CopySlices>)
>>> param.grad
>>> a.sum().backward()
>>> param.grad
tensor([1.])

Best.

K. Frank

The rest of the code remains the same, which is why I didn’t post it. I tried your version and I am getting the same error.

import torch as t
import torch
from matplotlib import pyplot as plt

param = torch.ones (1, requires_grad = True)
a = torch.zeros (2, 2)
a[0, 0] = param
a[1, 1] = 2

b = torch.tensor([[5], [6]], dtype=t.float, requires_grad = True)
c = torch.tensor([[50], [12]], dtype=t.float, requires_grad = True)

def calc_loss(A, B, C):
    X =  torch.sqrt(torch.mean((torch.matmul(A,B)-C)**2))
    return t.abs(X)

optimizer = t.optim.Adam([param], lr=1)
n = 10

values = t.zeros(n, dtype=t.float)
for i in range(n):
    optimizer.zero_grad()
    loss = calc_loss(a, b, c)
    print(f"gradient of the complex number are {param.grad}")
    print(f"calculated loss value is {loss}")
    loss.backward()
    optimizer.step()
    values[i] = param.detach()

# Plot the results
plt.plot(values, label='Learnable parameter')
plt.legend()
plt.xlabel('Iteration')
plt.ylabel('Parameter')
plt.show()

The error message is:

RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

Hi Rishi!

The problem is that you have to rebuild the “computation graph”
before you call .backward() for a second time. Both indexing
into a and assigning param to that element count as part of the
computation graph, so they have to be redone before each backward pass.

Consider:

>>> import torch
>>> print (torch.__version__)
1.9.0
>>>
>>> param = torch.ones (1, requires_grad = True)
>>>
>>> a = torch.zeros (2, 2)
>>> a[0,0] = param
>>> a[1,1] = 2
>>>
>>> a.sum().backward()
>>>
>>> a[0,0] = param
>>> a.sum().backward()   # fails
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<path_to_pytorch_install>\torch\_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "<path_to_pytorch_install>\torch\autograd\__init__.py", line 149, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
>>>
>>> b = torch.zeros (2, 2)
>>> b[1,1] = 2
>>>
>>> a = b.clone()
>>> a[0, 0] = param
>>> a.sum().backward()
>>>
>>> a = b.clone()
>>> a[0, 0] = param
>>> a.sum().backward()   # works
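
The same point likely explains why retain_graph=True did not converge (an interpretation, not stated in the thread): a[0, 0] = param copies param's value at that moment, so an a built once before the loop keeps the old value after optimizer.step() updates param, and the loss never changes. A small sketch, with param += 4.0 under torch.no_grad() standing in for an optimizer step:

```python
import torch

param = torch.ones(1, requires_grad=True)
a = torch.zeros(2, 2)
a[0, 0] = param            # copies param's current value (1.0) into a

with torch.no_grad():
    param += 4.0           # stand-in for optimizer.step(): updates param in place

print(param.item())        # 5.0
print(a[0, 0].item())      # 1.0: a still holds the value from when it was built

# Rebuilding a picks up the new value.
a_new = torch.zeros(2, 2)
a_new[0, 0] = param
print(a_new[0, 0].item())  # 5.0
```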

Applying this in a tweaked version of your code:

>>> import torch as t
>>> import torch
>>>
>>> param = torch.ones (1, requires_grad = True)
>>> a = torch.zeros (2, 2)
>>> # a[0, 0] = param   # we will do this with a_clone
... a[1, 1] = 2
>>>
>>> b = torch.tensor([[5], [6]], dtype=t.float, requires_grad = True)
>>> c = torch.tensor([[50], [12]], dtype=t.float, requires_grad = True)
>>>
>>> def calc_loss(A, B, C):
...     X =  torch.sqrt(torch.mean((torch.matmul(A,B)-C)**2))
...     return t.abs(X)
...
>>> optimizer = t.optim.Adam([param], lr=1)
>>> n = 10
>>>
>>> values = t.zeros(n, dtype=t.float)
>>> for i in range(n):
...     optimizer.zero_grad()
...     a_clone = a.clone()     # re-clone a in forward pass
...     a_clone[0, 0] = param   # re-"copy-slices" in forward pass
...     # loss = calc_loss(a, b, c)
...     loss = calc_loss (a_clone, b, c)  # use cloned copy in forward pass
...     print(f"calculated loss value is {loss}")
...     loss.backward()
...     print(f"gradient of the complex number are {param.grad}")   # move to after .backward()
...     optimizer.step()
...     values[i] = param.detach()
...
calculated loss value is 31.819805145263672
gradient of the complex number are tensor([-3.5355])
calculated loss value is 28.284271240234375
gradient of the complex number are tensor([-3.5355])
calculated loss value is 24.748737335205078
gradient of the complex number are tensor([-3.5355])
calculated loss value is 21.21320343017578
gradient of the complex number are tensor([-3.5355])
calculated loss value is 17.677669525146484
gradient of the complex number are tensor([-3.5355])
calculated loss value is 14.142135620117188
gradient of the complex number are tensor([-3.5355])
calculated loss value is 10.60660171508789
gradient of the complex number are tensor([-3.5355])
calculated loss value is 7.071067810058594
gradient of the complex number are tensor([-3.5355])
calculated loss value is 3.535533905029297
gradient of the complex number are tensor([-3.5355])
calculated loss value is 0.0
gradient of the complex number are tensor([nan])

Best.

K. Frank
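
Another way to rebuild the matrix in every forward pass, sketched here as an alternative rather than something proposed in the thread, is to assemble it out of place from constant pieces, so no clone() or in-place write is needed. base and mask are illustrative names, and b and c are created without requires_grad since only param is optimized:

```python
import torch

param = torch.ones(1, requires_grad=True)
b = torch.tensor([[5.0], [6.0]])
c = torch.tensor([[50.0], [12.0]])

# Constant pieces of the matrix; neither needs requires_grad.
base = torch.tensor([[0.0, 0.0], [0.0, 2.0]])   # fixed entries of a
mask = torch.tensor([[1.0, 0.0], [0.0, 0.0]])   # 1 where param goes

def calc_loss(A, B, C):
    X = torch.sqrt(torch.mean((torch.matmul(A, B) - C) ** 2))
    return torch.abs(X)

optimizer = torch.optim.Adam([param], lr=1)
for i in range(9):
    optimizer.zero_grad()
    a = base + param * mask     # out-of-place: a fresh graph every iteration
    loss = calc_loss(a, b, c)
    loss.backward()
    optimizer.step()

print(param.item())             # close to 10.0
```

The loop stops after nine steps because, as the output above shows, the backward pass at zero loss produces a nan gradient.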

Thanks, Frank,

But it seems the code still has an issue: the gradient is constant over the epochs and becomes NaN afterwards.
The output you posted also has the same gradient for all the epochs:
gradient of the number are tensor([-3.5355])

Hi @KFrank @Lezcano,
Do you think it's worth trying to write my own gradient for this matrix assembly using torch.autograd.Function?

Hi Rishi!

Calculate analytically the derivative of loss with respect to param,
specifically for the values of a, b, and c that you are using. (Hint:
You will discover that it is constant.)
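
Working the hint out for the values in this thread (writing p for param, so A = [[p, 0], [0, 2]]):

```latex
\[
AB - C = \begin{pmatrix} p & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 5 \\ 6 \end{pmatrix} - \begin{pmatrix} 50 \\ 12 \end{pmatrix} = \begin{pmatrix} 5p - 50 \\ 0 \end{pmatrix}
\]
\[
L(p) = \left|\sqrt{\tfrac{1}{2}\left((5p - 50)^2 + 0^2\right)}\right| = \frac{|5p - 50|}{\sqrt{2}},
\qquad
\frac{dL}{dp} = \frac{5\,\operatorname{sign}(5p - 50)}{\sqrt{2}} = -\frac{5}{\sqrt{2}} \approx -3.5355 \quad (p < 10)
\]
```

At p = 1 this gives L = 45/√2 ≈ 31.82, the first loss value printed above. With a constant gradient, Adam's normalized step is about lr = 1, so param rises by about 1 per iteration and the loss drops by about 3.5355 each time until p = 10, where |5p - 50| has a kink.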

In essence, you are having pytorch calculate sqrt (x**2).
sqrt (x) is singular at x = 0: its derivative diverges there, so at
x = 0 the chain rule multiplies an infinite factor by zero, giving nan
(and pytorch, rightly, doesn’t replace sqrt (x**2) with abs (x)).

Consider:

>>> x = torch.tensor ([0.0], requires_grad = True)
>>> torch.sqrt (x).backward()
>>> x.grad
tensor([inf])
>>> x = torch.tensor ([0.0], requires_grad = True)
>>> torch.sqrt (x**2).backward()
>>> x.grad
tensor([nan])
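
Checking this against the loss in this thread: a sketch that reuses the indexed-assignment construction, where build_a, rmse_loss, mse_loss, and grad_at are illustrative helper names. The gradient is the same constant for any param below 10 and becomes nan exactly at the minimum. A plain mean squared error without the sqrt, one possible workaround that is not from the thread, has the same minimizer and stays finite there:

```python
import math
import torch

b = torch.tensor([[5.0], [6.0]])
c = torch.tensor([[50.0], [12.0]])

def build_a(param):
    # Rebuild a = [[param, 0], [0, 2]] with indexed assignment on every call.
    a = torch.zeros(2, 2)
    a[0, 0] = param
    a[1, 1] = 2.0
    return a

def rmse_loss(a):
    # The loss used in the thread: abs(sqrt(mean(squared error))).
    return torch.abs(torch.sqrt(torch.mean((a @ b - c) ** 2)))

def mse_loss(a):
    # Same minimizer (param = 10), but no sqrt, so it is smooth at zero.
    return torch.mean((a @ b - c) ** 2)

def grad_at(loss_fn, value):
    param = torch.tensor([value], requires_grad=True)
    loss_fn(build_a(param)).backward()
    return param.grad.item()

print([grad_at(rmse_loss, v) for v in (1.0, 4.0, 9.0)])  # each about -5/sqrt(2) = -3.5355
print(grad_at(rmse_loss, 10.0))   # nan: the sqrt backward divides by sqrt(0) = 0
print(grad_at(mse_loss, 10.0))    # 0.0
```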

Best.

K. Frank


Thanks, Frank.

I could make it work with this solution.