el3ment
(Robert Pottorff)
March 24, 2018, 5:46am
1
I’m trying to train a model that uses a symmetric matrix – I implemented it via:
self.T = nn.Parameter(torch.Tensor(STATE_DIM, STATE_DIM))
self.Q = torch.mm(self.T, self.T.t())
but when I try to train, I get an error saying I need retain_graph=True, and once I set it, my loss doesn’t decrease during optimizer steps.
Is there a way to enforce symmetry that doesn’t require retain_graph? If not, how do I get my loss to go down with retain_graph=True?
jpeg729
(jpeg729)
March 24, 2018, 7:33am
2
The error about needing retain_graph=True is likely unrelated to your symmetric matrix.
I tried the following without getting any error.
import torch
import torch.nn as nn

T = nn.Parameter(torch.Tensor(5, 5))
T.data.normal_()
Q = torch.mm(T, T.t())
out = torch.nn.functional.linear(torch.rand(2, 5), Q)
out.mean().backward()
print(T.grad)
If your model is a recurrent model, then you should probably detach the hidden state between batches. See Adding new hidden layer to LSTM
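For what it’s worth, here is a minimal sketch of detaching the hidden state between batches (the GRU, the linear readout and all the sizes are made up for illustration, not taken from your model):
import torch
import torch.nn as nn

rnn = nn.GRU(input_size=5, hidden_size=8, batch_first=True)
readout = nn.Linear(8, 1)
optimizer = torch.optim.Adam(list(rnn.parameters()) + list(readout.parameters()), lr=1e-2)

hidden = torch.zeros(1, 32, 8)  # (num_layers, batch, hidden_size)
for step in range(3):
    x = torch.randn(32, 10, 5)  # (batch, seq, features)
    target = torch.randn(32, 1)
    out, hidden = rnn(x, hidden)
    loss = nn.functional.mse_loss(readout(out[:, -1]), target)
    optimizer.zero_grad()
    loss.backward()  # no retain_graph needed
    optimizer.step()
    # detach cuts the graph so the next backward() doesn't reach into the previous batch
    hidden = hidden.detach()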
If your model is not a recurrent model, then something else strange is happening. Can you provide a small example that produces the error?
el3ment
(Robert Pottorff)
March 24, 2018, 1:49pm
3
It looks like the for loop is causing the problem – the first time through is fine, but the second iteration raises the retain_graph error.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import tqdm
from torch.autograd import Variable

EPOCHS = 2

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.T = nn.Parameter(torch.Tensor(5, 5))
        self.Q = torch.mm(self.T, self.T.t())  # Q's graph is built once here

    def forward(self, x0):
        rhat = x0.unsqueeze(1).matmul(self.Q).matmul(x0.unsqueeze(2)).squeeze(2)
        return rhat

loop = tqdm.tqdm(range(EPOCHS))
model = Net()
optimizer = optim.Adam(model.parameters(), lr=1e-2)

for i in range(2):
    r = Variable(torch.from_numpy(np.random.randn(32, 1).astype(np.float32)))
    x0 = Variable(torch.from_numpy(np.random.randn(32, 5).astype(np.float32)))
    optimizer.zero_grad()
    rhat = model(x0)
    loss = F.mse_loss(rhat, r)
    loss.backward()  # fails on the second iteration, asking for retain_graph=True
    optimizer.step()
jpeg729
(jpeg729)
March 24, 2018, 8:58pm
4
You should recompute Q in each forward pass. Otherwise the computation graph for Q is freed the first time you run loss.backward(), and the second iteration can’t backpropagate all the way back to T.
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.T = nn.Parameter(torch.Tensor(5, 5))

    def forward(self, x0):
        # recompute Q from T every time so each backward() has a fresh graph
        Q = torch.mm(self.T, self.T.t())
        rhat = x0.unsqueeze(1).matmul(Q).matmul(x0.unsqueeze(2)).squeeze(2)
        return rhat
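With that change the two-iteration loop from your example should run without retain_graph=True. Roughly (reusing the Net above; the normal_() initialization of T is my own addition so the weights start from defined values rather than uninitialized memory):
import numpy as np
import torch
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable

model = Net()  # the Net above, with Q recomputed in forward()
model.T.data.normal_()  # assumption: give T a defined starting value
optimizer = optim.Adam(model.parameters(), lr=1e-2)

for i in range(2):
    r = Variable(torch.from_numpy(np.random.randn(32, 1).astype(np.float32)))
    x0 = Variable(torch.from_numpy(np.random.randn(32, 5).astype(np.float32)))
    optimizer.zero_grad()
    loss = F.mse_loss(model(x0), r)
    loss.backward()  # works on every iteration, no retain_graph needed
    optimizer.step()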
el3ment
(Robert Pottorff)
March 25, 2018, 12:11am
5
Doh! Absolutely. Thank you!