UnsafeViewBackward breaks computational graph

duartejfs · November 29, 2022, 6:35pm

Overarching problem: The limiting case of kronecker product with unit tensor does not allow for second backward pass.

Consider the code snippets:

import torch
import torch.nn as nn
import numpy as np

#first part
np.random.seed(42)
a=torch.from_numpy(np.random.random(2*2).reshape((2,2)))
a.requires_grad = True

b=torch.ones(1,1)

c=torch.kron(a,b)

d=torch.from_numpy(np.random.random(2*2).reshape((2,2)))

loss=torch.sum(d*c)

print("a",a)
print("b",b)
print("c",c)
print("c",d)
print("loss",d)
print(loss)

loss.backward(retain_graph=True) #If retain_graph=False it throws an error

#########################################################

d=torch.from_numpy(np.random.random(2*2).reshape((2,2)))

loss=torch.sum(d*c)
print(d)
print(loss)

loss.backward()

grad1=a.grad.clone()

#######################################################
#######################################################
print("---------------------------------------------------------")
#2nd part

np.random.seed(42)
a=torch.from_numpy(np.random.random(2*2).reshape((2,2)))
a.requires_grad = True

b=torch.ones(1,1)

c=a

d=torch.from_numpy(np.random.random(2*2).reshape((2,2)))

loss=torch.sum(d*c)


print("a",a)
print("b",b)
print("c",c)
print("c",d)
print("loss",d)
print(loss)

loss.backward()

#########################################################

d=torch.from_numpy(np.random.random(2*2).reshape((2,2)))
print(d)

loss=torch.sum(d*c)
print(loss)
loss.backward()

grad2=a.grad.clone()

print("-------------------------------------------------")
print(torch.isclose(grad1, grad2))

This prints:

a tensor([[0.3745, 0.9507],
        [0.7320, 0.5987]], dtype=torch.float64, requires_grad=True)
b tensor([[1.]])
c tensor([[0.3745, 0.9507],
        [0.7320, 0.5987]], dtype=torch.float64, grad_fn=<UnsafeViewBackward0>)
c tensor([[0.1560, 0.1560],
        [0.0581, 0.8662]], dtype=torch.float64)
loss tensor([[0.1560, 0.1560],
        [0.0581, 0.8662]], dtype=torch.float64)
tensor(0.7678, dtype=torch.float64, grad_fn=<SumBackward0>)
tensor([[0.6011, 0.7081],
        [0.0206, 0.9699]], dtype=torch.float64)
tensor(1.4940, dtype=torch.float64, grad_fn=<SumBackward0>)
---------------------------------------------------------
a tensor([[0.3745, 0.9507],
        [0.7320, 0.5987]], dtype=torch.float64, requires_grad=True)
b tensor([[1.]])
c tensor([[0.3745, 0.9507],
        [0.7320, 0.5987]], dtype=torch.float64, requires_grad=True)
c tensor([[0.1560, 0.1560],
        [0.0581, 0.8662]], dtype=torch.float64)
loss tensor([[0.1560, 0.1560],
        [0.0581, 0.8662]], dtype=torch.float64)
tensor(0.7678, dtype=torch.float64, grad_fn=<SumBackward0>)
tensor([[0.6011, 0.7081],
        [0.0206, 0.9699]], dtype=torch.float64)
tensor(1.4940, dtype=torch.float64, grad_fn=<SumBackward0>)
-------------------------------------------------
tensor([[True, True],
        [True, True]])

Here’s what I intend to do: I have a matrix a, I do a kronecker product with the unit tensor which will yield the same matrix a. Then I multiply it be a second matrix d and calculate the sum of the resultant matrix. Then I calculate the loss and bakprop. It works. I get the same matrix a and now multiply it by a newly generated d. Try to calculate loss and backprop and I get an error:

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

But if I now remove the unit kronecker product the same code (which is in the second part of the snippet above), I can now backpropagate two times without a problem. For the same mathematical operation, I have to place retain_graph=True when using the kronecker product, all due to the UnsafeViewBackward0 grad_fn. Despite them having the same gradient in this example, I’ve found this limiting when the kronecker is not on the unit tensor but bigger tensors and even in this limiting case, two networks don’t evolve in the same way. Why does this happen? How can I prevent this?

soulitzer · November 30, 2022, 8:40pm

kron probably saves tensors that are needed to compute the backward pass. If you backpropogate without specifying retain_graph that saved tensor is cleared, preventing you from backpropogating a second time.

Why is this a problem? Most operations in pytorch will require tensors to be saved for the backward pass, so not sure if you can avoid this.

duartejfs · December 1, 2022, 10:54am

Yes, that is the point I was missing. kron saves the tensors, and then when trying to compute the backward pass the graph had been deleted going from a>kron(a,b)->c. When I then try to compute the second backward pass on the second d variable I do not redo kron(a,b) which does not update the graph and it throws an error. That is what I was missing: I was not updating c to complete the computation. In this case using retain_graph solves it but consider this case:

I have a variable a which is a parameter of a neural net. This parameter is initialized within a layer layer1 in the __init__ function. In it I also define b such that b[c1:c2, c3:c4]=a. (c1,c2,c3,c4 are some integers) When I do the backwards update with the optimizer, a gets updated, but b does not! How can I force b to be updated by the optimizer? A workaround this is to redefine b in the layer1.forward() method, but I wanted to avoid this