Operation on diagonals of matrix batch

calincru · July 16, 2019, 6:49pm

I have two tensors with shapes [n, d, d] and [n, 1], respectively, and I would like to add the latter to the diagonals of the matrices in the former. What’s the most straightforward way of doing it? It shouldn’t be in-place.

LE: I’m curious if there’s a better way than stacking torch.eye matrices.

tom · July 16, 2019, 7:29pm

I think inplace is the best way, but I’ll throw in a .clone(), so you get to keep the input:

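import torch

a = torch.randn(5, 4, 4, requires_grad=True)
b = torch.randn(5, 1, requires_grad=True)
c = a.clone()  # work on a copy so that a itself stays untouched
# diagonal() returns a view of shape [5, 4]; b broadcasts from [5, 1]
c.diagonal(dim1=-2, dim2=-1)[:] += b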

# backward works as expected:
c.sum().backward()
print(a.grad, b.grad) # ones_like(a) and full_like(b, 4)
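
For comparison, here is a minimal sketch of a fully out-of-place variant, in case one wants to avoid inplace ops altogether. It is not part of the original suggestion: it assumes torch.diag_embed is available in your PyTorch version, and the names n and d are only illustrative. The result and the gradients should match the clone-based version above.

```python
import torch

torch.manual_seed(0)
n, d = 5, 4  # illustrative sizes, matching the example above
a = torch.randn(n, d, d, requires_grad=True)
b = torch.randn(n, 1, requires_grad=True)

# Expand b from [n, 1] to [n, d], then place each row on the diagonal
# of its own d x d matrix; no in-place op and no stacked identities.
c = a + torch.diag_embed(b.expand(-1, d))

c.sum().backward()
print(a.grad, b.grad)
```

The trade-off is that this materializes a temporary [n, d, d] tensor that is mostly zeros, whereas the clone-based version only touches the diagonal entries.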

Best regards

Thomas

calincru · July 16, 2019, 7:34pm

Thanks, @tom, looks good. BTW, the reason I wanted it to not be inplace was because I need it to be differentiable. Does backward work even if it’s inplace?

tom · July 16, 2019, 7:58pm

The rule of thumb is that inplace works unless it does not.

So the two things that usually break are

- you move a leaf tensor into the graph (this is what happens if you remove the cloning in the example above, and cloning helps),
- a isn’t a leaf, and whatever computed a needs the value of a to compute its backward (cloning helps here, too).
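
A small sketch of both failure modes, under the assumption that exp is a representative op that keeps its output around for the backward pass. The variable names (x1, x2, y, leaf_error, nonleaf_error) are made up for illustration, and the exact error messages vary between PyTorch versions, so the code only records that a RuntimeError occurred.

```python
import torch

torch.manual_seed(0)
b = torch.randn(5, 1)

# Case 1: in-place op on (a view of) a leaf tensor that requires grad.
a = torch.randn(5, 4, 4, requires_grad=True)
leaf_error = None
try:
    a.diagonal(dim1=-2, dim2=-1).add_(b)
except RuntimeError as e:
    leaf_error = e  # autograd refuses to modify the leaf in place

# Case 2: y is not a leaf, and exp needs its own output for backward.
x1 = torch.randn(5, 4, 4, requires_grad=True)
y = x1.exp()
y.diagonal(dim1=-2, dim2=-1).add_(b)  # overwrites values exp saved
nonleaf_error = None
try:
    y.sum().backward()
except RuntimeError as e:
    nonleaf_error = e  # the saved tensor was modified in place

# With a clone, the in-place op touches a copy and backward works.
x2 = torch.randn(5, 4, 4, requires_grad=True)
z = x2.exp().clone()
z.diagonal(dim1=-2, dim2=-1).add_(b)
z.sum().backward()
print(leaf_error is not None, nonleaf_error is not None)
```

Note that in the second case the error only shows up when backward is called, not at the inplace op itself.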

So the conventional wisdom is to not use inplace ops, but looking deeper, they can usually be made to work. I always joke about writing a non-deep-learning PyTorch book with @ptrblck, in which we would have a section on inplace ops.

Best regards

Thomas