# Autograd fails without giving any warning while doing matrix operations

I am trying to learn a parameter that is one element of a larger matrix. The loss function does not use the learnable parameter directly, but rather a matrix containing it.

A simplified code snippet that reproduces the error is given below:

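``````
import torch as t
import torch
from matplotlib import pyplot as plt

# `param` and `optimizer` are not defined in the posted snippet. The two lines
# below are assumptions: param = 1 reproduces the printed loss, and the choice
# of optimizer and learning rate is arbitrary.
param = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([param], lr=0.1)

a = torch.tensor([[param, 0],[0, 2]], dtype=t.float, requires_grad= True)
b = torch.tensor([[5], [6]], dtype=t.float, requires_grad = True)
c = torch.tensor([[50], [12]], dtype=t.float, requires_grad = True)

def calc_loss(A, B, C):
    # Root-mean-square error between A @ B and C
    X = torch.sqrt(torch.mean((torch.matmul(A,B)-C)**2))
    return t.abs(X)

n = 10

values = t.zeros(n, dtype=t.float)
for i in range(n):
    print(f"gradient of the complex number are {param.grad}")
    loss = calc_loss(a, b, c)
    print(f"calculated loss value is {loss}")
    loss.backward()
    optimizer.step()
    values[i] = param.detach()

# Plot the results
plt.plot(values, label='Learnable parameter')
plt.legend()
plt.xlabel('Iteration')
plt.ylabel('Parameter')
plt.show()

``````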

gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672
gradient of the complex number are None
calculated loss value is 31.819805145263672

The expected value for `param` is 10, but the gradient is always `None`. The computational graph is broken somewhere, but I am not able to track down where.

Creating a tensor from a parameter with `torch.tensor(...)` only uses the parameter's value; it does not propagate the gradient.
Consider assigning the parameter into the matrix by indexing instead, for example:

``````
a = torch.zeros(2, 2)
a[0, 0] = param
``````
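
To make the difference concrete, here is a minimal sketch comparing the two ways of assembling the matrix. The names `a_copied` and `a_linked` are made up for this illustration, and `param` is assumed to be a scalar tensor with value 1.

```python
import torch

param = torch.tensor(1.0, requires_grad=True)

# torch.tensor(...) reads the current value of `param` and builds a new
# tensor, so the result carries no history pointing back to `param`.
a_copied = torch.tensor([[param, 0.0], [0.0, 2.0]], dtype=torch.float)
print(a_copied.grad_fn)               # None: no graph was recorded
print(a_copied.requires_grad)         # False

# Index assignment into a plain tensor is recorded by autograd.
a_linked = torch.zeros(2, 2)
a_linked[0, 0] = param
a_linked[1, 1] = 2
print(a_linked.grad_fn is not None)   # True

a_linked.sum().backward()
print(param.grad)                     # the gradient now reaches `param`
```

Passing `requires_grad=True` to `torch.tensor(...)` does not help here: it only turns the copy into a new, independent leaf.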

If I use `a = torch.zeros(2, 2)`, it gives other errors.

But with
`a = torch.tensor([[0, 0],[0, 2]], dtype=t.float, requires_grad= True)`

it results in the following error. I tried `param.clone()` as well, which results in the same error:

``RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.``
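
This error can be reproduced in isolation. The sketch below assumes a scalar `param`; the names `a_leaf` and `error_message` are made up for this illustration. The likely cause is that the container was itself created with `requires_grad=True`, which makes it a leaf that autograd will not allow to be modified in place.

```python
import torch

param = torch.tensor(1.0, requires_grad=True)

# `a_leaf` is a leaf tensor that requires grad, so writing into a slice of it
# in place is rejected by autograd.
a_leaf = torch.tensor([[0.0, 0.0], [0.0, 2.0]], requires_grad=True)
try:
    a_leaf[0, 0] = param
    error_message = None
except RuntimeError as err:
    error_message = str(err)
print(error_message)

# A container created without requires_grad accepts the assignment and
# joins the graph through the assigned value.
a = torch.tensor([[0.0, 0.0], [0.0, 2.0]])
a[0, 0] = param
print(a.requires_grad)   # True
```

Only `param` needs `requires_grad=True`; the matrix it is written into picks up gradient tracking from the assignment.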

I tried it with

``````
a = torch.zeros(2, 2)
a[0,0] = param
a[1,1] = 2
``````

and then I get the following error:

`RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.`

I tried with `retain_graph=True`, but then the solution doesn't converge to the expected value of 10.

Hi Rishi!

You haven't posted the complete code that leads to your error. Your
code fragment (which is Lezcano's solution) should work for your use
case. Here is a complete example:

``````
>>> import torch
>>> torch.__version__
'1.9.0'
>>> param = torch.ones (1, requires_grad = True)
>>> a = torch.zeros (2, 2)
>>> a[0, 0] = param
>>> a[1, 1] = 2
>>> a
tensor([[1., 0.],
        [0., 2.]], grad_fn=<CopySlices>)
>>> a.sum().backward()
>>> param.grad
tensor([1.])
``````

Best.

K. Frank

The rest of the code remains the same, which is why I didn't post it. I tried your version and I am getting the same error.

``````
import torch as t
import torch
from matplotlib import pyplot as plt

param = torch.ones (1, requires_grad = True)
a = torch.zeros (2, 2)
a[0, 0] = param
a[1, 1] = 2

# `optimizer` is not defined in the posted snippet; this line is an
# assumption, and the choice of optimizer and learning rate is arbitrary.
optimizer = torch.optim.SGD([param], lr=0.1)

b = torch.tensor([[5], [6]], dtype=t.float, requires_grad = True)
c = torch.tensor([[50], [12]], dtype=t.float, requires_grad = True)

def calc_loss(A, B, C):
    # Root-mean-square error between A @ B and C
    X = torch.sqrt(torch.mean((torch.matmul(A,B)-C)**2))
    return t.abs(X)

n = 10

values = t.zeros(n, dtype=t.float)
for i in range(n):
    loss = calc_loss(a, b, c)
    print(f"calculated loss value is {loss}")
    loss.backward()   # fails on the second iteration
    optimizer.step()
    values[i] = param.detach()

# Plot the results
plt.plot(values, label='Learnable parameter')
plt.legend()
plt.xlabel('Iteration')
plt.ylabel('Parameter')
plt.show()
``````

Error message is

``RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.``

Hi Rishi!

The problem is that you have to rebuild the "computation graph"
before you call `.backward()` for a second time. Both indexing
into `a` and setting that "value" to `param` count as part of the
computation graph.

Consider:

``````
>>> import torch
>>> print (torch.__version__)
1.9.0
>>>
>>> param = torch.ones (1, requires_grad = True)
>>>
>>> a = torch.zeros (2, 2)
>>> a[0,0] = param
>>> a[1,1] = 2
>>>
>>> a.sum().backward()
>>>
>>> a[0,0] = param
>>> a.sum().backward()   # fails
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<path_to_pytorch_install>\torch\_tensor.py", line 255, in backward
File "<path_to_pytorch_install>\torch\autograd\__init__.py", line 149, in backward
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
>>>
>>> b = torch.zeros (2, 2)
>>> b[1,1] = 2
>>>
>>> a = b.clone()
>>> a[0, 0] = param
>>> a.sum().backward()
>>>
>>> a = b.clone()
>>> a[0, 0] = param
>>> a.sum().backward()   # works
``````

Applying this in a tweaked version of your code:

``````
>>> import torch as t
>>> import torch
>>>
>>> param = torch.ones (1, requires_grad = True)
>>> a = torch.zeros (2, 2)
>>> # a[0, 0] = param   # we will do this with a_clone
... a[1, 1] = 2
>>>
>>> b = torch.tensor([[5], [6]], dtype=t.float, requires_grad = True)
>>> c = torch.tensor([[50], [12]], dtype=t.float, requires_grad = True)
>>>
>>> def calc_loss(A, B, C):
...     X =  torch.sqrt(torch.mean((torch.matmul(A,B)-C)**2))
...     return t.abs(X)
...
>>> n = 10
>>>
>>> values = t.zeros(n, dtype=t.float)
>>> for i in range(n):
...     a_clone = a.clone()     # re-clone a in forward pass
...     a_clone[0, 0] = param   # re-"copy-slices" in forward pass
...     # loss = calc_loss(a, b, c)
...     loss = calc_loss (a_clone, b, c)  # use cloned copy in forward pass
...     print(f"calculated loss value is {loss}")
...     loss.backward()
...     print(f"gradient of the complex number are {param.grad}")   # move to after .backward()
...     optimizer.step()
...     values[i] = param.detach()
...
calculated loss value is 31.819805145263672
gradient of the complex number are tensor([-3.5355])
calculated loss value is 28.284271240234375
gradient of the complex number are tensor([-3.5355])
calculated loss value is 24.748737335205078
gradient of the complex number are tensor([-3.5355])
calculated loss value is 21.21320343017578
gradient of the complex number are tensor([-3.5355])
calculated loss value is 17.677669525146484
gradient of the complex number are tensor([-3.5355])
calculated loss value is 14.142135620117188
gradient of the complex number are tensor([-3.5355])
calculated loss value is 10.60660171508789
gradient of the complex number are tensor([-3.5355])
calculated loss value is 7.071067810058594
gradient of the complex number are tensor([-3.5355])
calculated loss value is 3.535533905029297
gradient of the complex number are tensor([-3.5355])
calculated loss value is 0.0
gradient of the complex number are tensor([nan])
``````
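
The same idea can be written as a standalone script, sketched below. The thread does not show how `optimizer` is created or where gradients are reset, so the SGD optimizer, its learning rate, and the `zero_grad()` call are assumptions, and `build_matrix` is a hypothetical helper name. Here `b` and `c` are plain constants, since only `param` is being learned.

```python
import torch

param = torch.ones(1, requires_grad=True)
b = torch.tensor([[5.0], [6.0]])
c = torch.tensor([[50.0], [12.0]])

# Assumption: the thread does not show the optimizer, so plain SGD with an
# arbitrary learning rate stands in for it here.
optimizer = torch.optim.SGD([param], lr=0.1)

def build_matrix(p):
    """Assemble a fresh 2x2 matrix so that every call records a new graph."""
    a = torch.zeros(2, 2)
    a[0, 0] = p
    a[1, 1] = 2
    return a

def calc_loss(A, B, C):
    # Root-mean-square error between A @ B and C
    return torch.sqrt(torch.mean((torch.matmul(A, B) - C) ** 2))

grads = []
for i in range(10):
    optimizer.zero_grad()                        # clear the previous gradient
    loss = calc_loss(build_matrix(param), b, c)  # forward pass rebuilds the graph
    loss.backward()
    grads.append(param.grad.item())
    optimizer.step()

print(param.item(), grads)
```

Because the matrix is reassembled inside the loop, each `.backward()` call walks a graph that has not been freed yet, so `retain_graph=True` is not needed.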

Best.

K. Frank

Thanks Frank,

But it seems the code still has an issue: the gradient is constant over the epochs and then becomes NaN.
The output you posted also has the same gradient for all the epochs:
`gradient of the complex number are tensor([-3.5355])`

Hi @KFrank @Lezcano,
Do you think it's worth trying to write my own gradient for this matrix assembly using `from torch.autograd import Function`?

Hi Rishi!

Calculate analytically the derivative of `loss` with respect to `param`,
specifically for the values of `a`, `b`, and `c` that you are using. (Hint:
You will discover that it is constant.)

In essence, you are having pytorch calculate `sqrt (x**2)`.
`sqrt (x)` is singular at `x = 0`, and its derivative diverges (and
pytorch, rightly, doesn't replace `sqrt (x**2)` with `x`).
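
Worked through by hand for the specific values in this thread (writing $p$ for `param`, with `a = [[p, 0], [0, 2]]`, `b = [[5], [6]]` and `c = [[50], [12]]`), the hint comes out roughly as follows:

```latex
\begin{aligned}
A B - C &= \begin{pmatrix} 5p \\ 12 \end{pmatrix} - \begin{pmatrix} 50 \\ 12 \end{pmatrix}
         = \begin{pmatrix} 5p - 50 \\ 0 \end{pmatrix}, \\
\text{loss}(p) &= \sqrt{\tfrac{1}{2}\left[(5p - 50)^2 + 0^2\right]}
               = \frac{5}{\sqrt{2}}\,\lvert p - 10 \rvert, \\
\frac{d\,\text{loss}}{dp} &= \frac{5}{\sqrt{2}}\,\operatorname{sign}(p - 10)
               \approx -3.5355 \quad \text{for } p < 10 .
\end{aligned}
```

At $p = 1$ this gives $\text{loss} = 45/\sqrt{2} \approx 31.82$, which matches the first printed loss value. At $p = 10$ the argument of the square root is exactly zero, so the chain rule multiplies the infinite derivative of the square root by the zero derivative of the squared term, and the result is `nan`.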

Consider:

``````
>>> x = torch.tensor ([0.0], requires_grad = True)
>>> torch.sqrt (x).backward()
>>> x.grad
tensor([inf])
>>> x = torch.tensor ([0.0], requires_grad = True)
>>> torch.sqrt (x**2).backward()