Effect of tensor modification in forward pass on gradient calculation

Consider the following forward pass:

def forward(self, x):
    x = self.pool(F.relu(self.conv1(x)))
    x = self.pool(F.relu(self.conv2(x)))
    x = x.view(-1, 16 * 5 * 5)
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x

In model forward passes, if we use tensor transformations such as view (as shown in the example), transpose, permute, or reshape - does this affect the gradient values?

If not, how is the mapping between the original and the transformed tensor maintained?


Yes, all the mentioned operations are tracked by Autograd, so during the backward pass the gradient will be passed back to the corresponding values of the original tensor.
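For illustration, here is a minimal sketch (using arbitrary shapes, unrelated to the model above) showing the gradient being routed back through permute and reshape to the original tensor:

import torch

x = torch.randn(2, 3, requires_grad=True)

# permute/reshape/view only reindex the data; Autograd records each op
y = x.permute(1, 0).reshape(-1)  # shape (6,)
y.sum().backward()

# Each element of x receives the gradient of the element it was mapped to,
# so x.grad has the same shape as x (here: all ones from the sum).
print(x.grad.shape)  # torch.Size([2, 3])
print(x.grad)

Because these ops are recorded in the computation graph, the backward pass simply applies the inverse index mapping to the incoming gradient.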


@ptrblck As a matter of interest, are there any operations that AREN’T tracked by Autograd that we should keep an eye out for? I work under the assumption that ANY function/processing bundled in PyTorch is tracked - is that a reasonable assumption?

Yes, I think this is generally a valid assumption, and it should hold as long as you don’t use e.g. the .data attribute of a tensor. If an operation is not differentiable or its backward is not implemented, Autograd will raise an error.
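As a small sketch of the .data caveat (example values chosen purely for illustration): writes through .data are invisible to Autograd, so a backward pass that relies on the modified values can be silently wrong:

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
out = a.sigmoid()

# Writing through .data bypasses Autograd's tracking and its version checks
out.data.zero_()

# sigmoid's backward uses the (now zeroed) output, so the gradient is silently wrong
out.sum().backward()
print(a.grad)  # tensor([0., 0., 0.]) instead of sigmoid'(a)

Using out.detach() instead and modifying the detached tensor in place would make Autograd raise an error during backward rather than return a wrong gradient.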

Also, interesting autocorrect when trying to type “Autograd”, but I think in the end my smartphone might know who has the absolute power in PyTorch land. :smiley:
