Let’s say I have a model with two fully connected layers. In the forward method, I arbitrarily change the output tensor of the first layer by squaring it.

I want to understand how autograd will work in this case. Will this transformation affect the learned weights? Ideally it should. And what if I don’t want this transformation to have any effect on the gradient?

I am trying to reason along the following lines:

The out tensor is not a parameter, so it is not going to get updated. But autograd will still compute the gradient of out*out w.r.t. out in order to propagate the gradients, right?

The transformation out = out*out does not have any parameters.
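A quick way to check this reasoning is to inspect the tensors directly. The sketch below is only an illustration: the batch of 4 random inputs and the name `squared` are assumptions, not part of the original model. As far as I know, the output of an `nn.Linear` whose parameters require grad reports `requires_grad=True` and is a non-leaf tensor; it is never updated by an optimizer simply because it is not a parameter.

```python
import torch
import torch.nn as nn

fc1 = nn.Linear(10, 20)
inp = torch.randn(4, 10)   # plain input data, does not require grad

out = fc1(inp)             # intermediate activation
squared = out * out        # parameter-free transformation

# out depends on fc1's parameters, so autograd tracks it,
# but it is a non-leaf tensor rather than a parameter.
print(out.requires_grad, out.is_leaf)

# The squaring step is recorded in the graph through its grad_fn.
print(squared.grad_fn)

# Only the layer's weight and bias are parameters.
num_params = len(list(fc1.parameters()))
print(num_params)
```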

What if I want to apply a transformation but don’t want it to have any effect on the parameters of my previous layers? How can I achieve that?

import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        # the class name passed to super() must match the class being defined
        super(NeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(10, 20)
        self.fc2 = nn.Linear(20, 10)

    def forward(self, inp):
        out = self.fc1(inp)
        # transform the out vector
        out = out * out
        out = self.fc2(out)
        return out

I am unable to wrap my head around how backpropagation will work after doing this. The gradient w.r.t. the parameters in FC layer 1 depends on the value of out, but that value has been changed.

Basically, W (a parameter in FC layer 1) has gradient = dLoss/dOut * dOut/dW. Which value of out will be used for this: the original out, or out = out*out?
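This can be checked numerically. The sketch below is an illustration under assumptions that are not in the original post: a random batch of 4 inputs, `loss = y.sum()` as a stand-in loss, and the names `z` (output of fc1 before squaring) and `s` (after squaring). It compares autograd's gradient for `fc1.weight` with the chain rule written out by hand, dLoss/dW = (dLoss/ds * 2z)^T x, where the squaring step contributes the factor 2z and dz/dW uses the layer's input x.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

fc1 = nn.Linear(10, 20)
fc2 = nn.Linear(20, 10)
x = torch.randn(4, 10)

z = fc1(x)        # value before the transformation
s = z * z         # value after the transformation
y = fc2(s)
loss = y.sum()    # stand-in loss, chosen only for illustration
loss.backward()

with torch.no_grad():
    # loss = sum over all entries of (s @ W2^T + b2), so dLoss/ds[n, j] = sum_i W2[i, j]
    dloss_ds = fc2.weight.sum(dim=0).expand_as(s)
    # the squaring step contributes d(z*z)/dz = 2z, evaluated at the pre-square value
    dloss_dz = dloss_ds * 2 * z
    # z = x @ W1^T + b1, so dLoss/dW1 = dloss_dz^T @ x
    manual_grad_w1 = dloss_dz.t() @ x

matches = torch.allclose(fc1.weight.grad, manual_grad_w1, atol=1e-4)
print(matches)
```

In other words, each operation saves what it needs during the forward pass, so rebinding the Python name `out` does not make autograd lose the earlier value.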

If I understand correctly, if you wrap the transformation in a with torch.no_grad() block, the original out value is the one that will be used to compute the gradients, since the operation out * out will not be added to the computational graph. Again, I am not 100% sure about this. Maybe @ptrblck can offer his always valuable insight. Best of luck.

As I understand it, you would like to change the intermediate output in some way (I will call the result changed_out) and you do not want that operation to be accounted for in the gradient. Also, changed_out is going to be the input to the other layers down the line.

I think this will lead to disconnected networks (network1 --> intermediate operation --> network2). Unless network1 receives a gradient with respect to its output, it is not going to be trained. I am not sure what the use case for such an operation is. If you could explain the use case a bit more, you might get more valuable answers.
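The disconnection can be seen in a small experiment. The sketch below is an illustration under assumptions (random inputs, a `sum()` stand-in loss, and the names `cut` and `passthrough` are all made up here). Variant A wraps the squaring in `torch.no_grad()`, after which fc1 receives no gradient at all. Variant B is a different technique, often called a straight-through trick: the forward value is the squared output, but the backward pass treats the transformation as the identity, so fc1 still gets a gradient. Whether that is appropriate depends on the use case.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc1 = nn.Linear(10, 20)
fc2 = nn.Linear(20, 10)
x = torch.randn(4, 10)

# Variant A: transformation under no_grad, which cuts the graph
out = fc1(x)
with torch.no_grad():
    cut = out * out               # not tracked, so it has no link back to fc1
fc2(cut).sum().backward()
fc1_grad_after_cut = fc1.weight.grad   # stays None: fc1 is disconnected
fc2_got_grad = fc2.weight.grad is not None
print(fc1_grad_after_cut, fc2_got_grad)

# Variant B: straight-through, forward uses out*out, backward sees identity
fc2.zero_grad()
out = fc1(x)
passthrough = out + (out * out - out).detach()
fc2(passthrough).sum().backward()
fc1_got_grad = fc1.weight.grad is not None
print(fc1_got_grad)
```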

I want to measure the accuracy of the network after transforming previous layer outputs. So I am trying to create a new layer, but for that I need to create a backward function.

If you do not want the previous layers to be trained again with the transformation in place, and you are willing to freeze the weights of the previous layers, you do not need to think about a backward function.
If not, I do not see a way of getting around the backward function as of now.
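Freezing the earlier layer could look roughly like the sketch below. It is only an illustration: the SGD optimizer, the learning rate, the random batch, and the `sum()` stand-in loss are assumptions, not something from this thread. With fc1's parameters marked as not requiring grad, the squaring step needs no backward pass through it and only fc2 is updated.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc1 = nn.Linear(10, 20)
fc2 = nn.Linear(20, 10)

# Freeze the first layer: autograd will not compute gradients for it.
for p in fc1.parameters():
    p.requires_grad_(False)

# Only hand the trainable parameters to the optimizer.
optimizer = torch.optim.SGD(fc2.parameters(), lr=0.1)

fc1_before = fc1.weight.clone()
fc2_before = fc2.weight.detach().clone()

x = torch.randn(4, 10)
out = fc1(x)
out = out * out                 # transformation on the frozen layer's output
loss = fc2(out).sum()           # stand-in loss for illustration
loss.backward()
optimizer.step()

fc1_unchanged = torch.equal(fc1.weight, fc1_before)
fc2_changed = not torch.equal(fc2.weight.detach(), fc2_before)
print(fc1_unchanged, fc2_changed)
```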

I actually do not want to do this during training. I just want to try some transformations during inference. I have a function Transformation which takes a numpy.ndarray and returns a numpy.ndarray. I just want to update the data of the previous_layer_out tensor using this Transformation function. Should I just do this:

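# .numpy() raises an error on a tensor that requires grad and needs a CPU tensor,
# so detach it and move it to the CPU first
out = Transformation(previous_layer_out.detach().cpu().numpy())
previous_layer_out = torch.from_numpy(out)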
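For inference only, one possible arrangement is sketched below. It rests on assumptions that are not in the thread: `transformation` is a hypothetical stand-in (here just `np.square`) for the real Transformation function, and the optional `transform` argument on `forward` is made up for illustration. The whole pass runs under `torch.no_grad()`, so no backward function is needed.

```python
import numpy as np
import torch
import torch.nn as nn

def transformation(arr: np.ndarray) -> np.ndarray:
    # hypothetical stand-in for the real Transformation function
    return np.square(arr)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 20)
        self.fc2 = nn.Linear(20, 10)

    def forward(self, inp, transform=None):
        out = self.fc1(inp)
        if transform is not None:
            # leave the autograd graph and go to NumPy on the CPU
            arr = transform(out.detach().cpu().numpy())
            # come back as a tensor on the original device and dtype
            out = torch.from_numpy(arr).to(device=inp.device, dtype=inp.dtype)
        return self.fc2(out)

torch.manual_seed(0)
model = NeuralNetwork().eval()
x = torch.randn(4, 10)

with torch.no_grad():
    baseline = model(x)                               # no transformation
    transformed = model(x, transform=transformation)  # with transformation

print(baseline.shape, transformed.shape)
```

Comparing the accuracy of `baseline` and `transformed` predictions on a labelled evaluation set would then measure the effect of the transformation.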