I think there are two different ways to modify gradients manually:
- use a hook (`register_hook`)
- set `weight.grad` directly after `backward()`
A demo script:
```python
import torch
import torch.nn as nn

batch_size = 8

class ModifyGrad(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(100, 10)

    def forward(self, x):
        return self.fc(x)

model = ModifyGrad()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.09, weight_decay=0.0001)
x = torch.randn(batch_size, 100)

# 1. modify the gradient with a hook
h = model.fc.weight.register_hook(lambda grad: grad + 0.1)
model.zero_grad()
model(x).mean().backward()  # recompute the loss for each backward pass
optimizer.step()
h.remove()  # remove the hook so it does not affect the second demo

# 2. modify weight.grad directly after backward()
model.zero_grad()
model(x).mean().backward()
with torch.no_grad():       # .data is deprecated; mutate grad under no_grad
    model.fc.weight.grad += 0.1
optimizer.step()
```
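As a quick sanity check (my own sketch, not part of the question), one can compare the gradient each approach produces on a fresh linear layer; with a single backward pass per approach, both simply add 0.1 element-wise to the same gradient:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(100, 10)
x = torch.randn(8, 100)

# gradient produced via a hook
h = model.weight.register_hook(lambda g: g + 0.1)
model.zero_grad()
model(x).mean().backward()
grad_hook = model.weight.grad.clone()
h.remove()

# gradient produced by mutating weight.grad after backward()
model.zero_grad()
model(x).mean().backward()
with torch.no_grad():
    model.weight.grad += 0.1
grad_direct = model.weight.grad.clone()

print(torch.allclose(grad_hook, grad_direct))
```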
I wonder whether the two methods above are equivalent?
In addition, the hook method seems to be officially recommended. If the two methods are equivalent, what is the advantage of the hook method?