How do the optimizer and the loss work together?

I read two posts here about this subject, but I still could not figure out the “software design” that makes the loss function have any impact on the model or the optimizer.

import torch.nn as nn
import torch.optim as optim

model = ...
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001)
predictedY = model(xTrain)
loss = criterion(predictedY, yTrain)
loss.backward()  # how does the output of "backward()" impact the model, if the loss does not have any reference / "pointer" to the model?
optimizer.step()
optimizer.zero_grad()
  • the optimizer receives model.parameters() in its constructor, so that makes it “aware” of the model. I get that.
  • criterion is not related to the model or to the optimizer
  • loss is not related to the model or to the optimizer

So how does the output of the loss get propagated into the model?


PyTorch’s Autograd tracks all operations performed with trainable parameters and uses this computation graph to calculate the gradients in the backward call. The loss function is treated in the same way as any differentiable operation.
The optimizer holds references to the passed parameters and uses their .grad attribute to update them.
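
To make this concrete, here is a minimal sketch (a toy nn.Linear model and random data, not from the question above) showing that backward() populates the parameters' .grad attributes and that the optimizer then reads them:

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 1)                     # toy model with trainable parameters
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001)

xTrain = torch.randn(8, 4)
yTrain = torch.randint(0, 2, (8, 1)).float()

predictedY = model(xTrain)                  # forward pass records the computation graph
loss = criterion(predictedY, yTrain)        # the loss is just another node in that graph

print(model.weight.grad)                    # None - no gradients yet
loss.backward()                             # autograd walks the graph back to the leaf parameters
print(model.weight.grad.shape)              # torch.Size([1, 4]) - .grad is now populated

optimizer.step()                            # updates each parameter in-place using its .grad
optimizer.zero_grad()                       # clears .grad for the next iteration

The loss never needs a pointer to the model, because the tensors flowing out of the model already carry the graph with them.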

“Autograd tracks all operations performed with trainable parameters”
How does it track them? I am trying to technically understand the implicit mechanism.
I mean, I would expect the code to look something like this:

model = ...
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001)
predictedY = model(xTrain)
loss = criterion(predictedY, yTrain)
loss.backward(model) # **now it would be obvious** 
optimizer.step()
optimizer.zero_grad()  

These docs might be helpful.
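
If it helps, you can also see the tracking mechanism directly: every tensor produced by a differentiable operation carries a grad_fn node, and following these nodes leads back to the model's parameters. A small sketch (again with a toy model, just for illustration):

import torch
import torch.nn as nn

model = nn.Linear(4, 1)
xTrain = torch.randn(8, 4)
yTrain = torch.rand(8, 1)

predictedY = torch.sigmoid(model(xTrain))
loss = nn.BCELoss()(predictedY, yTrain)

print(loss.grad_fn)                  # e.g. <BinaryCrossEntropyBackward0 object ...>
print(loss.grad_fn.next_functions)   # the nodes that produced the loss's inputs
# Following next_functions repeatedly ends in AccumulateGrad nodes, one per
# model parameter (the graph leaves). That is why loss.backward() needs no
# explicit reference to the model: the graph stored on the tensors already
# points back to the parameters.

So the “pointer” you are looking for lives in the output tensors themselves, not in the loss module or the optimizer.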
