Hi @jcallahan4,
Good to see your error is already solved.
Since you wanted to understand what autograd is doing, and how to get the network to use the input tensors in the computation graph, I’m adding some details in that regard:
torch.autograd is PyTorch's automatic differentiation engine that, as the name suggests, deals with automatically calculating gradients for any "computational graph".
Computational graphs are what get built by autograd as tensors are subjected to mathematical operations. While building these graphs, autograd also saves the tensors that will be required to calculate gradients with respect to tensors whose requires_grad attribute is set to True.
(So, when you use torch.autograd.grad or call .backward, these saved tensors are used.)
Now, torch.no_grad() basically tells autograd to look away. It can be used as a context manager so that, for any piece of code occurring within this context, autograd builds no graph (and does not further populate any graph that's already there), i.e. it does not track any operations.
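As a quick illustration (again a made-up tensor, not your code) of what "not tracked" means:

```python
import torch

x = torch.ones(3, requires_grad=True)

y = x * 2                   # tracked: a graph node is created for this op
print(y.requires_grad, y.grad_fn)   # True  <MulBackward0 ...>

with torch.no_grad():
    z = x * 2               # not tracked: no graph is built for this op
print(z.requires_grad, z.grad_fn)   # False None
```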
Now, in your code, you are differentiating y (the output) with respect to x, where y = net(x), which essentially means y = net.forward(x).
Inside forward, output_layer(z) is what gets returned (and hence is essentially what gets stored in y = net(x)). It is the result of operations on normalized_x, but normalized_x itself is created from operations on x under torch.no_grad().
This means that even though
y = self.input_act(self.input_layer(normalized_x))
and
z = self.hidden_layers(y)
are part of the computation graph, the step that creates normalized_x from x isn't, and so the graph never reaches x.
And so, when you tried to differentiate y (which is returned by forward) with respect to x, it produced the error:
One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
Here, the "differentiated Tensor" the error is talking about is x: the graph of y starts at normalized_x, so x itself never appears in it.
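To make this concrete, here is a hypothetical minimal network (the layer sizes and the normalization formula are just placeholders, not your actual code) that reproduces the same error:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.input_layer = nn.Linear(1, 8)
        self.input_act = nn.Tanh()
        self.hidden_layers = nn.Linear(8, 8)
        self.output_layer = nn.Linear(8, 1)

    def forward(self, x):
        with torch.no_grad():                  # autograd looks away here
            normalized_x = (x - x.mean()) / (x.std() + 1e-8)
        y = self.input_act(self.input_layer(normalized_x))
        z = self.hidden_layers(y)
        return self.output_layer(z)

net = Net()
x = torch.rand(4, 1, requires_grad=True)
y = net(x)

# Raises: "One of the differentiated Tensors appears to not have been used
# in the graph. Set allow_unused=True if this is the desired behavior."
torch.autograd.grad(y.sum(), x)
```

If the normalization is moved out of the torch.no_grad() block, the graph extends back to x and the same torch.autograd.grad call returns a gradient instead of raising.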
Note: Even though normalized_x is created from operations on x, whose requires_grad is set to True, that doesn't matter: under torch.no_grad(), nothing is tracked by autograd, and so all resulting tensors have requires_grad=False.
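For example (again just an illustrative snippet):

```python
import torch

x = torch.ones(3, requires_grad=True)
with torch.no_grad():
    normalized_x = (x - 0.5) / 0.5
print(x.requires_grad)             # True
print(normalized_x.requires_grad)  # False -- nothing was tracked under no_grad
```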
Hope this helps,
Srishti