I came across some code on GitHub with a strange placement of the call to zero_grad(), and I couldn't explain why it was there or whether placing it a few lines earlier would make any difference.
I would expect this call to come at the beginning of the for loop, before the model makes a prediction. Is there any different effect from putting it where it is in the linked code?
If your code does not do anything unusual that modifies the gradients outside of backward(), you can put zero_grad() anywhere between two backward() calls. You could even put it just after backward() (or after optimizer.step()) if you want.