I’m not sure if it’s a different issue, but using a matrix constructed like this to optimize the base variables is causing a second backwards pass to fail. I’ve removed the unrelated code. The optimization works without the rotation matrix but gives
RuntimeError: Trying to backward through the graph second time, but the buffers have already been freed. Please specify retain_variables=True when calling backward for the first time.
when run with the rotation matrix. Any insight as to why this might be? Here’s the cruft of the optimization that’s causing the issue.
def rotation_tensor(theta, phi, psi, n_comps):
rot_x = Variable(torch.zeros(n_comps, 3, 3).cuda(), requires_grad=False)
rot_y = Variable(torch.zeros(n_comps, 3, 3).cuda(), requires_grad=False)
rot_z = Variable(torch.zeros(n_comps, 3, 3).cuda(), requires_grad=False)
rot_x[:, 0, 0] = 1
rot_x[:, 0, 1] = 0
rot_x[:, 0, 2] = 0
rot_x[:, 1, 0] = 0
rot_x[:, 1, 1] = theta.cos()
rot_x[:, 1, 2] = theta.sin()
rot_x[:, 2, 0] = 0
rot_x[:, 2, 1] = -theta.sin()
rot_x[:, 2, 2] = theta.cos()
rot_y[:, 0, 0] = phi.cos()
rot_y[:, 0, 1] = 0
rot_y[:, 0, 2] = -phi.sin()
rot_y[:, 1, 0] = 0
rot_y[:, 1, 1] = 1
rot_y[:, 1, 2] = 0
rot_y[:, 2, 0] = phi.sin()
rot_y[:, 2, 1] = 0
rot_y[:, 2, 2] = phi.cos()
rot_z[:, 0, 0] = psi.cos()
rot_z[:, 0, 1] = -psi.sin()
rot_z[:, 0, 2] = 0
rot_z[:, 1, 0] = psi.sin()
rot_z[:, 1, 1] = psi.cos()
rot_z[:, 1, 2] = 0
rot_z[:, 2, 0] = 0
rot_z[:, 2, 1] = 0
rot_z[:, 2, 2] = 1
return torch.bmm(rot_z, torch.bmm(rot_y, rot_x))
optimizer = optim.LBFGS([diff_theta, diff_phi, diff_psi], lr=lr)
diff_theta = Variable(torch.zeros(n_diff_comps, 1, 1).cuda(), requires_grad=True)
diff_phi = Variable(torch.zeros(n_diff_comps, 1, 1).cuda(), requires_grad=True)
diff_psi = Variable(torch.zeros(n_diff_comps, 1, 1).cuda(), requires_grad=True)
diff_rot = rotation_tensor(diff_theta, diff_phi, diff_psi, n_diff_comps).cuda()
for i in range(3000):
def closure():
optimizer.zero_grad()
new_diff = Variable(torch.zeros(*alb_diff_batch_init.size()).cuda())
for comp in range(n_diff_comps):
new_diff += (diff_rot[comp] @ diff_base_vars[comp].view(3, -1)).view(*diff_base_vars[comp].size())
loss = new_diff.sum()
loss.backward()
return loss
optimizer.step(closure)
Edit: It looks like you need to re-create the intermediate matrix again every iteration. Moving
diff_rot = rotation_tensor(diff_theta, diff_phi, diff_psi, n_diff_comps).cuda()
to the inner loop fixed the issue.