Hey everyone!
I’ve been experimenting with a custom training loop in PyTorch and hit a conceptual snag. I understand the basics, but I’d like a clearer picture of how the optimizer interacts with the gradients during training.
Here’s the basic structure I’m using:
```python
for data, target in loader:
    optimizer.zero_grad()             # reset gradients left over from the previous step
    output = model(data)              # forward pass
    loss = criterion(output, target)  # compute the loss
    loss.backward()                   # backprop: fills param.grad for every parameter
    optimizer.step()                  # update the weights using those gradients
```
What I’m curious about: does optimizer.step() immediately update the model weights using the gradients that loss.backward() computed? And if I modify the gradients in between (for example, by editing each param.grad directly), will optimizer.step() use those modified values?
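To make the second question concrete, here is a rough sketch of the kind of modification I mean. It slots into the loop above between loss.backward() and optimizer.step(), and the mul_(0.5) scaling is just a placeholder for whatever edit I might actually want to make:

```python
loss.backward()  # gradients now live in param.grad for each parameter

# hypothetical tweak: scale the gradients in place before the optimizer reads them
for param in model.parameters():
    if param.grad is not None:
        param.grad.mul_(0.5)  # placeholder: halve every gradient

optimizer.step()  # question: does this step see the modified values?
```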
I’ve come across some posts here and there but would appreciate insights from those who have built custom training logic in more complex workflows.
Also, I’m currently learning about MLOps and automation. If you have any advice or resource recommendations on becoming a DevOps engineer with a deep learning background, I’d be grateful!
Thanks in advance for your help!
Regards
pasiho