The criterion is independent of the model; they “communicate” through the training process. That is, the criterion calculates the loss, which is then used for the gradient calculation. The optimizer then updates the parameters passed to it such that the model reduces the loss.
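For example, a single training step could look roughly like this (a toy model and random data, purely for illustration):
import torch
import torch.nn as nn

# hypothetical tiny setup just to show the roles of model, criterion, and optimizer
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(4, 10)
labels = torch.randint(0, 2, (4,))

optimizer.zero_grad()              # clear old gradients
outputs = model(inputs)            # forward pass builds the computation graph
loss = criterion(outputs, labels)  # the criterion only computes the loss value
loss.backward()                    # gradients land in the parameters' .grad attributes
optimizer.step()                   # the optimizer updates the parameters passed to it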
Why would the model or optimizer need a handle to the criterion?
The loss.backward() call will calculate the gradients and will assign (or accumulate) them to the .grad attributes of all parameters. The loss is connected to the model via the computation graph, and thus the backward pass has access to all used parameters. There is no need to pass the criterion handle around, as it doesn’t contain any parameters and just provides the loss calculation.
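As a small illustration of the accumulation behavior (a toy tensor, not from the linked code):
import torch

w = torch.ones(1, requires_grad=True)

loss = (w * 3).sum()
loss.backward()
print(w.grad)   # tensor([3.])

loss = (w * 3).sum()
loss.backward()
print(w.grad)   # tensor([6.]) - the new gradient was accumulated, not overwritten

w.grad.zero_()  # which is why training loops call optimizer.zero_grad()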
Yes, this is done via Autograd, which creates a computation graph and assigns valid backward functions to the .grad_fn attribute of activation tensors.
Thanks. Here is the original code laid out…
https://blog.paperspace.com/writing-cnns-from-scratch-in-pytorch/
Which line there does this “Autograd” piece and creates the bindings between “criterion” and/or loss and “model”?
cs
PyTorch will track all differentiable operations on tensors requiring gradients:
import torch

x = torch.randn(1, 1, requires_grad=True)
y = x * 2
print(y.grad_fn)
# <MulBackward0 object at 0x7f161e967490>

# loss can be anything
loss = y**2
loss.backward()
print(x.grad)
# tensor([[-8.4696]])
This doc and this tutorial might be good starters.
Sorry to press this issue. If I can humbly ask one more question…
Yes, that makes perfect sense, because your lines “y = x * 2” and “loss = y**2” associate x with loss.
In my aforementioned link to the code and tutorial there is nothing equivalent to that! It uses these 2 lines…
criterion = nn.CrossEntropyLoss()
..
loss = criterion(outputs, labels)
Where does the analog to your x get introduced!?
The model’s outputs tensor is attached to a computation graph since trainable parameters were used. Instead of the x tensor, the .weight and/or .bias of the layers were used, which also creates a computation graph:
import torch
import torch.nn as nn

x = torch.randn(1, 10)
lin = nn.Linear(10, 10)
out = lin(x)
loss = out.mean()
loss.backward()
Yes, that’s another one that is readily understood. The loss object is tied to x through the out = lin(x) line.
My example…
criterion = nn.CrossEntropyLoss()
…
loss = criterion(outputs, labels)
had the loss set to some generic instance of nn.CrossEntropyLoss without any relation to the model or my equivalent of x, to use your nomenclature.
Thanks!
Chris
x won’t receive any gradients in my latest example, and you can verify it by accessing its .grad attribute, which will return None. It’s just the input to the model, and the lin.weight and lin.bias attributes are now the leaf tensors receiving gradients.
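To verify (reusing x, lin, and loss from the previous snippet):
print(x.grad)           # None - x is just the input and doesn't require gradients
print(lin.weight.grad)  # filled with valid gradients after loss.backward()
print(lin.bias.grad)    # same for the bias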
The analogy is that a computation graph will be created by applying differentiable operations using trainable parameters. The model’s output corresponds to the y tensor, and lin.weight and lin.bias correspond to x from my first example.
The loss function is just applying other differentiable operations (in the same way a linear layer performs a matmul).
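To make this concrete, the same loss could be written out with plain tensor operations (a sketch using F.log_softmax; not the internal implementation of nn.CrossEntropyLoss, but mathematically equivalent for the default settings):
import torch
import torch.nn.functional as F

outputs = torch.randn(4, 10, requires_grad=True)  # stand-in for the model's outputs
labels = torch.randint(0, 10, (4,))

loss1 = F.cross_entropy(outputs, labels)                                 # what the criterion computes
loss2 = (-F.log_softmax(outputs, dim=1)[torch.arange(4), labels]).mean() # same math written out manually

print(loss1.grad_fn)  # e.g. <NllLossBackward0 ...>
print(loss2.grad_fn)  # e.g. <MeanBackward0 ...> - both are attached to a computation graph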
Bottom line is all of your examples make sense. The only example I can’t understand is the one with
criterion = nn.CrossEntropyLoss()
…
loss = criterion(outputs, labels)
In this case the loss has no connection to any tensor anywhere?!
cs
It does, since outputs was created by the model, as mentioned a few times already and as seen in the linked blog post:
# Forward pass
outputs = model(images)
loss = criterion(outputs, labels)
Thanks. I think I get it now. This seems like an oddity in the way PyTorch does things…
So you’re saying that in “outputs = model(images)”, the outputs object will have more than just the numerical predictions!? outputs will also have a link/handle (whatever you want to call it) to the model object? I can accept that, but that seems weird. Did I get it right now?
cs
Yes, your explanation is correct. The “link” from the outputs tensor to the model’s parameters is done by Autograd and reflected in the computation graph and the outputs.grad_fn object. A very naive point of view would be to think of Autograd as “recording” the operations in the forward pass by creating the computation graph. In the backward pass the grad_fn will be used to backpropagate through the entire graph. The internals might be more complicated (e.g. PyTorch is smart enough to figure out when to stop the backpropagation if no gradients are needed in previous operations).
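A quick way to see this link (assuming the model and images from the linked blog post):
outputs = model(images)
print(outputs.requires_grad)  # True, because trainable parameters were used in the forward pass
print(outputs.grad_fn)        # e.g. <AddmmBackward0 ...> - the last recorded operation of the model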