I'm fairly new to PyTorch, and I have some questions, which I haven't been able to find answers to, about how autograd will interact with parts of my code:
```python
def forward(self, theta: Tensor) -> Tensor:
    # theta is a (B, F, L) tensor, where B is a batch dimension
    th0 = theta[:, :, 0].view(self.b, 5, 1, 1, 1)
    th1 = theta[:, :, 1].view(self.b, 5, 1, 1, 1)
    th2 = theta[:, :, 2].view(self.b, 5, 1, 1, 1)
    th3 = theta[:, :, 3].view(self.b, 5, 1, 1, 1)

    # frames is a 5-tensor of shape (B, F, L, 4, 4);
    # in the start state, all of the (4, 4) blocks are identity
    frames = self.start_state.clone()

    # swivel is a 5-tensor of shape (B, F, L, 4, 4)
    S = self.swivel.clone()
    S[..., :3, :3] = self.I + torch.sin(th0) * S[..., :3, :3] + \
        (1 - torch.cos(th0)) * (S[..., :3, :3] @ S[..., :3, :3])
    S.requires_grad = True
    S.retain_grad()
    frames = S @ frames
```
After some similar further transformations, the `frames` tensor (or a particular slice of it) will be the output. I had a few questions about this:
- `S` here is a temporary derived from a member variable of my class. From some reading I got the impression that `S.retain_grad()` would allow the gradients of `S` to be retained for the backward pass, but is this correct? If I returned `frames` at this step and called a backward pass, it "works", but I'm not sure whether it's actually computing gradients through `S` as I would like it to. (I've put a minimal version of the check I've been doing after this list.)
- If I put the two lines enabling grad for `S` before the in-place operation on the 3x3 slice of its last two dimensions, PyTorch gives an error saying an in-place operation was called on a leaf of the compute graph (a minimal reproduction is included after this list). That makes sense, and the current placement raises no errors, but I'm left wondering whether moving those lines after that op actually fixes the issue, or whether it will still interact weirdly with the autograd engine.
- I'm not very sure about how editing/multiplying slices of tensors affects gradient computation, if at all. For example, the next transformations to be applied to `frames` are:
```python
# curl is a (B, F, L-1, 4, 4) tensor
C1 = self.curl.clone()
....
C2 = self.curl[:, :, :2, ...].clone()
....
frames[:, :, 1:, :, :] = C1 @ frames[:, :, 1:, :, :]
frames[:, :, 2:, :, :] = C2 @ frames[:, :, 2:, :, :]
# I've cut out a few lines computing blocks of C1 and C2 and enabling grad here;
# they're almost identical to what was done for S
```
My intent is to apply the transformations of C1 and C2 only to those sections of `frames`, in a differentiable manner. Will this work, or is there a nuance I'm missing? (The last snippet below shows the toy behaviour I'm hoping for.)
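To make the first question concrete, here's roughly the check I've been running. The shapes and the sum-as-loss are just placeholders I made up for a standalone script, not my real module:

```python
import torch

# toy stand-ins for self.swivel / self.start_state (shapes are made up)
swivel = torch.randn(2, 5, 3, 4, 4)                       # buffer, does not require grad
start_state = torch.eye(4).expand(2, 5, 3, 4, 4).clone()  # all (4, 4) blocks identity
th0 = torch.rand(2, 5, 1, 1, 1)
I = torch.eye(3)

S = swivel.clone()
S[..., :3, :3] = I + torch.sin(th0) * S[..., :3, :3] + \
    (1 - torch.cos(th0)) * (S[..., :3, :3] @ S[..., :3, :3])
S.requires_grad = True   # S has no grad_fn at this point, so this is allowed
S.retain_grad()

frames = S @ start_state
frames.sum().backward()  # dummy loss, just to exercise the backward pass
print(S.grad.shape)      # this gets populated, but is it the gradient I think it is?
```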
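For the second question, this is a minimal reproduction of the error I was describing (again just a toy tensor, not my actual class):

```python
import torch

S = torch.randn(4, 4)
S.requires_grad = True   # enabling grad *before* the in-place slice op, as I originally had it
S.retain_grad()

# The next line raises a RuntimeError about a leaf Variable that requires
# grad being used in an in-place operation.
S[:3, :3] = torch.eye(3)
```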
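And for the third question, this toy case shows the behaviour I'm hoping slice assignment gives me. It works here with a scalar multiply, but I don't know whether it carries over to the batched matrix products in my code above:

```python
import torch

a = torch.randn(4, 3, requires_grad=True)
b = a.clone()        # clone first so the in-place write isn't on a leaf
b[1:] = 2 * b[1:]    # transform only rows 1..3, in place
b.sum().backward()

print(a.grad)
# rows 1..3 come out as 2 and row 0 as 1, which is what I'd expect if
# autograd is tracking the slice assignment correctly
```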
Any help would be greatly appreciated! I've not worked with PyTorch internals much at all (although I'm slowly working through the tutorials and docs), so any level of detail would help.
(P.S. If you have stylistic comments, I'd be glad to hear them as well. I've tried to do things cleanly where possible, but I'm not sure how well I've succeeded.)