Are two different ways to modify grad manually equivalent?

I think there are two different ways to modify gradients manually:

  1. use a hook
  2. set weight.grad.data directly

A demo script looks like this:

import torch
import torch.nn as nn
batch_size = 8

class ModifyGrad(nn.Module):
    def __init__(self):
        super(ModifyGrad, self).__init__()
        self.fc = nn.Linear(100, 10)

    def forward(self, x):
        return self.fc(x)


model = ModifyGrad()
optimizer = torch.optim.SGD(model.parameters(), 0.1,
    momentum=0.09, weight_decay=0.0001)

x = torch.randn(batch_size, 100)


# modify grad via a hook
h = model.fc.weight.register_hook(lambda grad: grad + 0.1)
model.zero_grad()
out = model(x).mean()  # fresh forward pass for this variant
out.backward()
optimizer.step()

# modify grad by setting weight.grad.data directly
h.remove()  # remove the hook so it does not also fire during this backward
model.zero_grad()
out = model(x).mean()  # fresh forward pass, so backward can be called again
out.backward()
model.fc.weight.grad.data = model.fc.weight.grad.data + 0.1
optimizer.step()

I wonder if the above two methods are equivalent?

In addition, the hook method seems to be officially recommended. If the two methods are equivalent, what is the advantage of the hook method?

Use the hook approach, as the manipulation of the .data attribute is not recommended and might yield unwanted side effects.
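
For what it's worth, if you want to adjust a gradient after backward() without going through .data, you can also modify .grad itself in place. This is just an illustrative sketch (not from the original thread), using a plain linear model:

import torch
import torch.nn as nn

model = nn.Linear(100, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

out = model(torch.randn(8, 100)).mean()
model.zero_grad()
out.backward()

# adjust the accumulated gradient directly; no_grad() just makes the intent
# explicit, since the .grad tensor is not tracked by autograd anyway
with torch.no_grad():
    model.weight.grad += 0.1

optimizer.step()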

Can you give me a more detailed explanation of the side effects?

At least in my example above, both methods make optimizer.step yield the same update on model.fc.weight.
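
One way to sanity-check that claim is to run the two variants on two identical copies of a model and compare the resulting weights. A minimal sketch (not from the thread; simplified optimizer settings) could look like this:

import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
model_a = nn.Linear(100, 10)
model_b = copy.deepcopy(model_a)  # identical starting weights
opt_a = torch.optim.SGD(model_a.parameters(), lr=0.1)
opt_b = torch.optim.SGD(model_b.parameters(), lr=0.1)

x = torch.randn(8, 100)

# variant 1: modify grad via a hook
h = model_a.weight.register_hook(lambda grad: grad + 0.1)
model_a(x).mean().backward()
opt_a.step()
h.remove()

# variant 2: set weight.grad.data directly
model_b(x).mean().backward()
model_b.weight.grad.data = model_b.weight.grad.data + 0.1
opt_b.step()

print(torch.allclose(model_a.weight, model_b.weight))  # expected: True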

@albanD explains it here better than I could. :slight_smile: