How are the handles passed around during autograd?

This is a beginner question. I come from C++ and am a Python beginner. I'm confused about how the handles are passed around.
Here is the sample code:
import torch
import torch.nn.functional as F

x_data = torch.Tensor([[1.0], [2.0], [3.0], [4.0]])
y_data = torch.Tensor([[0.0], [0.0], [1.0], [1.0]])

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        y_pred = F.sigmoid(self.linear(x))
        return y_pred

model = Model()

criterion = torch.nn.BCELoss(reduction='mean')
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(1000):
    y_pred = model(x_data)
    loss = criterion(y_pred, y_data)
    print(epoch, loss.data, type(loss.data))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

hour_var = torch.Tensor([[1.0]])
print(hour_var)

My question is: when loss.backward() is called, how does Python know it needs to look into model? criterion(y_pred, y_data) receives two tensors and calls BCELoss.forward(), but how does loss know about the calculation inside Model (e.g., Model.forward())? Is this related to the computational graph? I have learned the basics, but I don't know how the computational graph is implemented in this sample code. For example, does the tensor y_pred preserve the information about how it was calculated?

Thanks in advance!

The computation graph is created dynamically during the forward pass, and you can traverse it backwards via the .grad_fn attribute of the output.
Based on this graph, the backward() call can backpropagate through all operations and calculate the gradients for all parameters.
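To make that concrete, here is a minimal sketch continuing from the code in your post; the grad_fn class names shown in the comments are just what recent PyTorch versions report and may differ in yours:

# Inspect the graph that was recorded while running the forward pass.
y_pred = model(x_data)
loss = criterion(y_pred, y_data)

# Every tensor produced by an operation stores that operation in .grad_fn,
# and .next_functions links back to the functions that produced its inputs.
print(loss.grad_fn)                   # e.g. BinaryCrossEntropyBackward0
print(y_pred.grad_fn)                 # e.g. SigmoidBackward0
print(y_pred.grad_fn.next_functions)  # links back to the Linear layer's backward node

# backward() walks this graph from loss back to the leaf tensors
# (the parameters created inside model.linear) and fills in their .grad fields.
model.zero_grad()
loss.backward()
print(model.linear.weight.grad)       # gradient of loss w.r.t. the weight
print(model.linear.bias.grad)         # gradient of loss w.r.t. the bias

So yes, y_pred itself carries the information about how it was computed (via its grad_fn chain), which is how loss.backward() finds its way back into the model's parameters even though you never pass model to it explicitly.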