
The criterion is independent of the model; they “communicate” through the training process. That is, the criterion calculates the loss, which is then used for the gradient calculation. The optimizer will then update the passed parameters such that the model reduces the loss.
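To make the flow concrete, here is a minimal sketch of one training step (the layer sizes, learning rate, and data are made up for illustration):

```
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(8, 4)
targets = torch.randn(8, 2)

optimizer.zero_grad()               # clear old gradients
outputs = model(inputs)             # forward pass builds the graph
loss = criterion(outputs, targets)  # criterion only computes the loss
loss.backward()                     # gradients land in each parameter's .grad
optimizer.step()                    # optimizer updates the params it was given
```

Note that the optimizer only ever sees `model.parameters()` and the criterion only ever sees `outputs` and `targets`; neither needs a handle to the other.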


Why would the model or optimizer need a handle to the criterion?

The `loss.backward()` call will calculate the gradients and will assign (or accumulate) these to the `.grad` attributes of all parameters. The loss is connected to the model via the computation graph and thus the `backward` pass has access to all used parameters. There is no need to pass the criterion handle around, as it doesn’t contain anything and just provides the loss calculation.


Yes, this is done via Autograd, which creates a computation graph and assigns valid backward functions to the `.grad_fn` attribute of activation tensors.

Thanks. Here is the original code laid out…

https://blog.paperspace.com/writing-cnns-from-scratch-in-pytorch/

Which line there does this “Autograd” piece and creates the bindings between the “criterion”/“loss” and the “model”?

cs

PyTorch will track all differentiable operations on tensors requiring gradients:

```
import torch

x = torch.randn(1, 1, requires_grad=True)
y = x * 2
print(y.grad_fn)
# <MulBackward0 object at 0x7f161e967490>
# loss can be anything
loss = y**2
loss.backward()
print(x.grad)
# tensor([[-8.4696]])
```

This doc and this tutorial might be good starters.

Sorry to press this issue. If I can humbly ask one more question…

Yes, that makes perfect sense, because your line “*y = x * 2*” and then “*loss = y**2*” associates *x* with *loss*.

In my aforementioned link to the code and tutorial there is nothing equivalent to that! That tutorial uses these 2 lines…

```
criterion = nn.CrossEntropyLoss()
..
loss = criterion(outputs, labels)
```

Where does the analog to your *x* get introduced!?

The model’s `outputs` tensor is attached to a computation graph because trainable parameters were used. Instead of the `x` tensor, the `.weight` and/or `.bias` of the layers were used, which also creates a computation graph:

```
import torch
import torch.nn as nn

x = torch.randn(1, 10)
lin = nn.Linear(10, 10)
out = lin(x)
loss = out.mean()
loss.backward()
```

Yes that’s another one that is readily understood. The *loss* object is tied to *x* through the *out = lin(x)* line.

My example…

```
criterion = nn.CrossEntropyLoss()
…
loss = criterion(outputs, labels)
```

had the loss set to some generic instance of `nn.CrossEntropyLoss` **without** any relation to the model or my equivalent of *x*, to use your nomenclature.

Thanks!

Chris

`x` won’t receive any gradients in my latest example, and you can verify it by accessing its `.grad` attribute, which will return `None`. It’s just the input to the model, and the `lin.weight` and `lin.bias` attributes are now the leaf tensors receiving gradients.
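Running the previous snippet again and inspecting the `.grad` attributes shows this directly:

```
import torch
import torch.nn as nn

x = torch.randn(1, 10)  # plain input, requires_grad=False by default
lin = nn.Linear(10, 10)
out = lin(x)
out.mean().backward()

print(x.grad)                 # None: x is not a trainable leaf
print(lin.weight.grad.shape)  # torch.Size([10, 10])
print(lin.bias.grad.shape)    # torch.Size([10])
```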

The analogy is that a computation graph will be created by applying differentiable operations using trainable parameters. The model’s output corresponds to the `y` tensor, and the `lin.weight` and `lin.bias` correspond to `x` from my first example.

The loss function is just applying other differentiable operations (in the same way a linear layer is performing a matmul).
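For example, `nn.CrossEntropyLoss` can be written out as a `log_softmax` followed by a negative log-likelihood reduction, i.e. just a chain of differentiable tensor ops (the logits and labels below are made up for illustration):

```
import torch
import torch.nn.functional as F

outputs = torch.randn(4, 3, requires_grad=True)  # hypothetical logits
labels = torch.tensor([0, 2, 1, 0])

# cross entropy = log_softmax + negative log-likelihood (mean reduction)
manual = -F.log_softmax(outputs, dim=1)[torch.arange(4), labels].mean()
builtin = F.cross_entropy(outputs, labels)

print(torch.allclose(manual, builtin))  # True
print(builtin.grad_fn)  # a backward node, like any other op's output
```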

Bottom line is all of your examples make sense. The only example I can’t understand is the one with

```
criterion = nn.CrossEntropyLoss()
…
loss = criterion(outputs, labels)
```

In this case the loss has no connection to any tensor anywhere?!

cs

It does, since `outputs` was created by the model, as mentioned a few times already and as seen in the linked blog post:

```
# Forward pass
outputs = model(images)
loss = criterion(outputs, labels)
```

Thanks. I think I get it now. This seems like an oddity in the way PyTorch does things…

So you’re saying that in “*outputs = model(images)*”, the *outputs* object will have more than just the numerical predictions!? *outputs* will **also** have a link/handle (whatever you want to call it) to the *model* object? I can accept that, but that seems weird. Did I get it right now?

cs

Yes, your explanation is correct. The “link” from the `outputs` tensor to the model’s parameters is created by Autograd and is reflected in the computation graph and the `outputs.grad_fn` object. A very naive point of view would be to think of Autograd as “recording” the operations in the forward pass by creating the computation graph. In the backward pass the `grad_fn` will be used to backpropagate through the entire graph. The internals might be more complicated (e.g. PyTorch is smart enough to figure out when to stop the backpropagation if no gradients are needed in previous operations).
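The early-stopping behavior can be seen by freezing a layer: once no upstream tensor needs gradients, Autograd stops propagating (the two-layer setup below is just a toy example):

```
import torch
import torch.nn as nn

lin1 = nn.Linear(10, 10)
lin2 = nn.Linear(10, 10)
for p in lin1.parameters():
    p.requires_grad_(False)  # freeze the first layer

x = torch.randn(1, 10)
out = lin2(lin1(x))
out.mean().backward()

print(lin1.weight.grad)              # None: backprop stopped here
print(lin2.weight.grad is not None)  # True
```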