Let’s say I have a model with two fully connected layers. In the `forward` method, I am arbitrarily changing the `out` tensor after the first layer by squaring it.

I want to understand how autograd will work in this case. Will this transformation affect the learned weight vectors? Ideally it should. And what if I don’t want this transformation to have an effect on the gradient?

I am trying to reason along the following lines:

• The `out` tensor has `requires_grad = False`, so it is not going to get updated. But autograd will still compute the gradient of `out * out` w.r.t. `out` to propagate the gradients. Right?

• The transformation `out = out * out` has no parameters of its own.

• What if I want to apply a transformation but don’t want it to have an effect on my previous layers’ parameters? How can I achieve that?

```python
import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(10, 20)
        self.fc2 = nn.Linear(20, 10)

    def forward(self, inp):
        out = self.fc1(inp)

        # transform the out vector by squaring it elementwise
        out = out * out

        out = self.fc2(out)
        return out
```
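To check the questions above empirically, here is a quick sketch using this model: after a backward pass, `fc1` does receive gradients, and those gradients flow through the squaring, so the transformation does affect what `fc1` learns.

```python
import torch

model = NeuralNetwork()
inp = torch.randn(4, 10)
loss = model(inp).sum()
loss.backward()

# the gradient reaches fc1 through the squaring op
print(model.fc1.weight.grad is not None)  # True
```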

I’m not sure this will work, but could you please try:

```python
with torch.no_grad():
    out = out * out
    out = self.fc2(out)
```

Let me know if this solves your problem. In principle, operations performed under `torch.no_grad()` do not track gradients.
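For what it’s worth, a quick way to check that claim:

```python
import torch

x = torch.ones(3, requires_grad=True)
with torch.no_grad():
    y = x * x
print(y.requires_grad)  # False -- the squaring was not recorded in the graph
```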

It sounds like what you want to do is `.detach()` one of the arguments to the squaring operation, so that the gradient only propagates to `out` once. Try:

```python
out = out * out.detach()
```
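To see what this changes, a small standalone sketch comparing the two gradients: for `y = x * x` autograd gives `dy/dx = 2x`, while for `y = x * x.detach()` the detached factor is treated as a constant, so `dy/dx = x`.

```python
import torch

x1 = torch.tensor([3.0], requires_grad=True)
(x1 * x1).backward()
print(x1.grad)  # tensor([6.]) -- both factors contribute: 2 * x

x2 = torch.tensor([3.0], requires_grad=True)
(x2 * x2.detach()).backward()
print(x2.grad)  # tensor([3.]) -- the detached factor acts as a constant
```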

I am unable to wrap my head around how backpropagation will work after doing this. I mean, the gradient w.r.t. the parameters in FC layer 1 will depend on the value of `out`, but the `out` value has changed.

Basically, `W` (a parameter in FC layer 1) has gradient `dLoss/dOut * dOut/dW`. Now, which `out` value will be used for this: `out` or `out = out * out`?

If I understand correctly, if you put in the `with torch.no_grad()` line, the original `out` value is the one that will be used to compute the gradients, since the operation `out * out` will not be added to the computational graph. Again, I am not 100% sure about this. Maybe @ptrblck can offer his always valuable insight. Best of luck.

As I understand it, you would like to change the intermediate output in some way (I’ll call it `changed_out`), and you do not want the gradient to account for that operation. Also, `changed_out` is going to be the input for the other layers down the line.

I think it will lead to disconnected networks (`network1 --> intermediate operation --> network2`). Unless `network1` gets a gradient with respect to its output, it is not going to be trained. I am not sure about the use case where such an operation is needed. Maybe if you could explain a bit more about the use case, you would get valuable answers.
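A quick sketch (using the `NeuralNetwork` model above) that makes the disconnection visible: wrap the squaring in `torch.no_grad()` and check which parameters receive gradients.

```python
import torch

model = NeuralNetwork()
inp = torch.randn(4, 10)

out = model.fc1(inp)
with torch.no_grad():
    out = out * out  # the graph is cut here
loss = model.fc2(out).sum()
loss.backward()

print(model.fc2.weight.grad is not None)  # True  -- network2 still gets gradients
print(model.fc1.weight.grad)              # None  -- network1 receives nothing
```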

I want to measure the accuracy of the network after transforming the previous layer’s outputs. So I am trying to create a new layer, but for that I need to create a backward function.

If you do not want the previous layers to be trained with the transformation in effect, and if you are willing to freeze the previous layers’ weights, you do not need to think about a backward function.
If not, I do not see a way of getting around a backward function, as of now.
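If you do need the gradient to ignore the transformation without freezing anything, one option is a custom `torch.autograd.Function`. A minimal sketch (the name `SquareNoGrad` is illustrative, not from this thread): the forward pass squares the tensor, while the backward pass passes the incoming gradient through unchanged, as if the op were the identity.

```python
import torch

class SquareNoGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # apply the transformation in the forward pass
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        # pretend the op was the identity: earlier layers
        # receive the gradient as if no squaring had happened
        return grad_output
```

Inside `forward`, you would then write `out = SquareNoGrad.apply(out)`.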

I actually do not want to do this while training; I just want to try some transformations during inference. I have a function `Transformation` which takes a `numpy.ndarray` and returns the same. I just want to update the data of the `previous_layer_out` tensor using this `Transformation` function. Should I just do this:

```python
out = Transformation(previous_layer_out.numpy())
previous_layer_out = torch.from_numpy(out)
```

OK. If you only need it for inference, yes, that should be enough. What error are you facing?
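For reference, a sketch of that inference-time pattern (the body of `Transformation` here is a hypothetical placeholder). Note that calling `.numpy()` on a tensor that requires grad raises an error, so either run under `torch.no_grad()` or call `.detach()` first:

```python
import numpy as np
import torch

def Transformation(arr: np.ndarray) -> np.ndarray:
    return arr ** 2  # hypothetical placeholder for the real transformation

model = NeuralNetwork()
inp = torch.randn(4, 10)

with torch.no_grad():  # inference only, no graph is built
    out = model.fc1(inp)
    out = torch.from_numpy(Transformation(out.cpu().numpy()))
    out = model.fc2(out)
```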