Imagine that I have a network class called myModel, which I instantiate in the following fashion:
net = myModel()
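For context, here is a minimal sketch of the class; the layer is just a placeholder (my real architecture is more complex), but the structure is the usual nn.Module subclass with a forward method:

import torch.nn as nn

class myModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Placeholder layer; the actual architecture is more involved
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)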
In my training loop, I zero out the gradients before calling the forward pass:
for x, y in examples:
    net.zero_grad()
    loss = some_loss_function(net.forward(x), y)
    loss.backward()
    optimizer.step()
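For completeness, the optimizer is constructed over this network's parameters before the loop; the optimizer type and learning rate below are just placeholders for my actual setup:

import torch.optim as optim

# The optimizer references the parameters of the single model being trained
optimizer = optim.SGD(net.parameters(), lr=0.01)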
The previous code seems to work just fine. However, if instead of calling forward explicitly I invoke the call method, i.e. net(x), the gradients are not getting registered. Look at the following code for reference:
for x, y in examples:
    net.zero_grad()
    loss = some_loss_function(net(x), y)
    loss.backward()
    optimizer.step()
The second snippet does not work: backpropagation runs with zero gradients, so the network parameters never change. It is worth mentioning that I am training only one model and that the optimizer references this network's parameters.
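To see the issue, I inspect the gradients right after loss.backward(), along these lines:

for name, param in net.named_parameters():
    # In the second loop, these all come out as zero (or None)
    print(name, None if param.grad is None else param.grad.abs().sum().item())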
Is this behavior intended?