I have two tensors with shapes [n, d, d] and [n, 1], respectively, and I would like to add the latter to the diagonals of the matrices in the former. What’s the most straightforward way of doing it? It shouldn’t be in-place.
Later edit: I’m curious whether there’s a better way than stacking torch.eye matrices.
I think inplace is the best way, but I’ll throw in a .clone(), so you get to keep the input:
import torch
a = torch.randn(5,4,4, requires_grad=True)
b = torch.randn(5,1, requires_grad=True)
c = a.clone()
c.diagonal(dim1=-2, dim2=-1)[:] += b
# backward works as expected:
c.sum().backward()
print(a.grad, b.grad) # ones_like(a) and full_like(b, 4)
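If you do want a version with no inplace op at all, one option is to build the diagonal matrices and add them. The following is only a sketch of that idea, not the approach used above: it assumes torch.diag_embed is acceptable for your case, and the names d, out, ref and same_values are made up for illustration. Note that it allocates a full [n, d, d] tensor for the diagonal part, much like stacking identity matrices would.

```python
import torch

torch.manual_seed(0)
a = torch.randn(5, 4, 4, requires_grad=True)
b = torch.randn(5, 1, requires_grad=True)

# Broadcast b from [n, 1] to [n, d], then put each row on the
# diagonal of a [d, d] matrix and add the result to a.
d = a.shape[-1]
out = a + torch.diag_embed(b.expand(-1, d))

# Reference values computed with the clone-then-inplace approach,
# on detached copies so this part stays out of the autograd graph.
ref = a.detach().clone()
ref.diagonal(dim1=-2, dim2=-1).add_(b.detach())
same_values = torch.allclose(out, ref)

# Gradients flow to both inputs.
out.sum().backward()
```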
Thanks, @tom, looks good. BTW, the reason I wanted it to not be inplace was because I need it to be differentiable. Does backward work even if it’s inplace?
The rule of thumb is that inplace works unless it does not.
So the two things that usually break are:
- you modify a leaf tensor that requires grad in place, which would move it into the graph (this is what happens if you remove the cloning in the example above, so cloning helps),
- a isn’t a leaf and whatever computed a needs the value of a to compute its backward (cloning helps here, too).
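To make the two cases concrete, here is a small sketch. The use of exp as the op that computed the tensor is my own choice for illustration (exp keeps its output around for the backward), and the names y, leaf_error and saved_error are hypothetical.

```python
import torch

a = torch.randn(5, 4, 4, requires_grad=True)
b = torch.randn(5, 1, requires_grad=True)

# Case 1: a is a leaf that requires grad. Modifying a view of it
# in place is rejected right away.
try:
    a.diagonal(dim1=-2, dim2=-1).add_(b)
    leaf_error = None
except RuntimeError as err:
    leaf_error = err

# Case 2: y is not a leaf, and exp saved y for its backward.
# The inplace op itself is accepted; the error shows up in backward.
y = a.exp()
y.diagonal(dim1=-2, dim2=-1).add_(b)
try:
    y.sum().backward()
    saved_error = None
except RuntimeError as err:
    saved_error = err

# Reset any gradients left over from the failed backward.
a.grad = None
b.grad = None

# Cloning first leaves the tensor saved by exp untouched, so this works.
c = a.exp().clone()
c.diagonal(dim1=-2, dim2=-1).add_(b)
c.sum().backward()
```

In both failing cases the fix is the same: clone before the inplace op, so that neither the leaf nor the saved tensor is overwritten.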
So the conventional wisdom is to not use inplace ops, but looking deeper, they can usually be made to work. I always joke about writing a non-deep-learning PyTorch book with @ptrblck, in which we would have a section on inplace ops.