Overarching problem: the limiting case of the Kronecker product with the unit tensor does not allow a second backward pass.
Consider the following code snippet (in two parts):
import torch
import numpy as np
#first part
np.random.seed(42)
a=torch.from_numpy(np.random.random(2*2).reshape((2,2)))
a.requires_grad = True
b=torch.ones(1,1)
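# kron with the 1x1 unit tensor: c equals a numerically but is a non-leaf tensor with a grad_fn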
c=torch.kron(a,b)
d=torch.from_numpy(np.random.random(2*2).reshape((2,2)))
loss=torch.sum(d*c)
print("a",a)
print("b",b)
print("c",c)
print("c",d)
print("loss",d)
print(loss)
loss.backward(retain_graph=True) #if retain_graph=False, the second backward below throws an error
#########################################################
d=torch.from_numpy(np.random.random(2*2).reshape((2,2)))
loss=torch.sum(d*c)
print(d)
print(loss)
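# second backward through the same c needs the kron part of the graph again, hence retain_graph=True above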
loss.backward()
grad1=a.grad.clone()
#######################################################
#######################################################
print("---------------------------------------------------------")
#2nd part
np.random.seed(42)
a=torch.from_numpy(np.random.random(2*2).reshape((2,2)))
a.requires_grad = True
b=torch.ones(1,1)
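# no kron this time: c is simply the leaf tensor a itself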
c=a
d=torch.from_numpy(np.random.random(2*2).reshape((2,2)))
loss=torch.sum(d*c)
print("a",a)
print("b",b)
print("c",c)
print("c",d)
print("loss",d)
print(loss)
loss.backward()
#########################################################
d=torch.from_numpy(np.random.random(2*2).reshape((2,2)))
print(d)
loss=torch.sum(d*c)
print(loss)
loss.backward()
grad2=a.grad.clone()
print("-------------------------------------------------")
print(torch.isclose(grad1, grad2))
This prints:
a tensor([[0.3745, 0.9507],
[0.7320, 0.5987]], dtype=torch.float64, requires_grad=True)
b tensor([[1.]])
c tensor([[0.3745, 0.9507],
[0.7320, 0.5987]], dtype=torch.float64, grad_fn=<UnsafeViewBackward0>)
d tensor([[0.1560, 0.1560],
[0.0581, 0.8662]], dtype=torch.float64)
loss tensor(0.7678, dtype=torch.float64, grad_fn=<SumBackward0>)
tensor([[0.6011, 0.7081],
[0.0206, 0.9699]], dtype=torch.float64)
tensor(1.4940, dtype=torch.float64, grad_fn=<SumBackward0>)
---------------------------------------------------------
a tensor([[0.3745, 0.9507],
[0.7320, 0.5987]], dtype=torch.float64, requires_grad=True)
b tensor([[1.]])
c tensor([[0.3745, 0.9507],
[0.7320, 0.5987]], dtype=torch.float64, requires_grad=True)
d tensor([[0.1560, 0.1560],
[0.0581, 0.8662]], dtype=torch.float64)
loss tensor(0.7678, dtype=torch.float64, grad_fn=<SumBackward0>)
tensor([[0.6011, 0.7081],
[0.0206, 0.9699]], dtype=torch.float64)
tensor(1.4940, dtype=torch.float64, grad_fn=<SumBackward0>)
-------------------------------------------------
tensor([[True, True],
[True, True]])
Here is what I intend to do: I have a matrix a and take its Kronecker product with the 1×1 unit tensor, which yields the same matrix a. I then multiply it elementwise by a second matrix d and take the sum of the resulting matrix as the loss, then backprop. That works. I then take the same product c, multiply it by a newly generated d, compute the new loss, and backprop again, at which point I get an error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
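(For reference, the two parts are mathematically identical: with b = [[1.]], c = a ⊗ b = a, so loss = sum_ij d_ij * a_ij and d(loss)/da = d. Since .grad accumulates, after the two backward passes a.grad = d1 + d2 in both parts, which is why the final isclose check prints all True.)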
But if I remove the Kronecker product with the unit tensor (the second part of the snippet above), the same code backpropagates twice without a problem. So for the same mathematical operation I have to set retain_graph=True when using the Kronecker product, apparently because of the UnsafeViewBackward0 grad_fn. The gradients happen to match in this example, but I have found this restriction limiting when the Kronecker product involves tensors larger than the unit tensor, and even in this limiting case two networks do not evolve in the same way. Why does this happen? How can I prevent it?
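One workaround that seems to work is to recompute the Kronecker product before building each new loss, so that every backward pass owns a fresh graph. A minimal sketch of this idea (same setup as the first part; the loop count is arbitrary):

import torch
import numpy as np

np.random.seed(42)
a = torch.from_numpy(np.random.random(2*2).reshape((2, 2)))
a.requires_grad = True
b = torch.ones(1, 1)

for _ in range(2):
    d = torch.from_numpy(np.random.random(2*2).reshape((2, 2)))
    c = torch.kron(a, b)  # recomputed every iteration, so each loss has its own graph
    loss = torch.sum(d * c)
    loss.backward()       # no retain_graph=True needed

But this recomputes the product at every step, which I would like to avoid when b is large.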