I want to update the BatchNorm2d layers after `loss.backward()`, but I cannot solve this problem. The code is shown below.

```
def updateBN():
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.data.add_(0.001 * torch.sign(m.weight.data))  # L1

# training step:
model.zero_grad()
output = model(input)
loss = nn.BCELoss()(output, target)
loss.backward()
updateBN()
optim_m.step()
```

It seems the `.grad` attribute wasn’t populated, so you might have accidentally detached some tensors from the computation graph. Could you check the `.grad` attribute of other layers and make sure you see valid values?
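A quick way to run that check (a minimal sketch with a stand-in model; replace `model` with your own network) is to iterate over `named_parameters()` after `backward()`:

```python
import torch
import torch.nn as nn

# stand-in model for illustration; replace with your own network
model = nn.Sequential(nn.Conv2d(1, 3, 3, 1, 1), nn.BatchNorm2d(3))

out = model(torch.randn(2, 1, 8, 8))
out.mean().backward()

# map each parameter name to whether its .grad was populated
grad_status = {name: p.grad is not None for name, p in model.named_parameters()}
print(grad_status)
```

If any entry is `False`, the corresponding parameter did not receive a gradient and was likely detached from the graph somewhere in the forward pass.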

Also, don’t use the `.data` attribute, as it may yield unwanted side effects. Alternatively to your current workflow, you could also use hooks via `model.bn.weight.register_hook` to manipulate the gradients, but this wouldn’t solve your current issue.
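For completeness, the hook approach could look like the sketch below (the two-layer model and the `0.001` penalty value are just placeholders):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 3, 3, 1, 1), nn.BatchNorm2d(3))
bn = model[1]

# the hook receives the gradient during backward() and returns the
# modified version, which replaces the original gradient
bn.weight.register_hook(lambda grad: grad + 0.001 * torch.sign(bn.weight.detach()))

out = model(torch.randn(2, 1, 8, 8))
out.sum().backward()
```

The hook fires automatically inside `backward()`, so no extra call after it is needed.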

Thanks. There are no gradients for specific layers, and I’m trying to solve it. I am a newcomer to deep learning and am trying to reproduce the results of others.

I don’t quite understand the use case. You are currently trying to manipulate the gradient of the batchnorm `weight` parameter, which wasn’t calculated. Could you explain your use case a bit?

Sorry, I have been busy doing experiments recently and did not see your reply in time. The paper is . The code is used for sparse training: the weight of BatchNorm2d is used as the penalty factor for sparse training. The following code is used to update the weight of BatchNorm2d.

```
def updateBN():
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.data.add_(args.s * torch.sign(m.weight.data))  # L1
```
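For what it’s worth, the `args.s * sign(weight)` term added here is exactly the (sub)gradient of the L1 penalty `s * sum(|weight|)` on the BN scale factors, so autograd produces the same term if the penalty is added to the loss instead (a small check with illustrative values):

```python
import torch

s = 0.001
gamma = torch.tensor([0.5, -2.0, 3.0], requires_grad=True)  # toy BN scale factors

# L1 sparsity penalty on the scale factors
penalty = s * gamma.abs().sum()
penalty.backward()

# d/dgamma of s*|gamma| is s*sign(gamma) for gamma != 0
print(gamma.grad)
```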

```
def BN_grad_zero():
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            mask = (m.weight.data != 0)
            mask = mask.float().cuda()
            m.weight.grad.data.mul_(mask)
            m.bias.grad.data.mul_(mask)
```

```
def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output, target)
        pred = output.data.max(1, keepdim=True)[1]
        loss.backward()
        if args.sr:
            updateBN()
            BN_grad_zero()
        optimizer.step()
```

This dummy code snippet seems to work, if I use `updateBN` and `BN_grad_zero` after the `backward` call:

```
import torch
import torch.nn as nn
import torch.nn.functional as F

def updateBN():
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.add_(torch.sign(m.weight))

def BN_grad_zero():
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            mask = (m.weight != 0)
            mask = mask.float()
            m.weight.grad.mul_(mask)
            m.bias.grad.mul_(mask)

model = nn.Sequential(
    nn.Conv2d(1, 3, 3, 1, 1),
    nn.BatchNorm2d(3))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

optimizer.zero_grad()
output = model(torch.randn(1, 1, 24, 24))
loss = F.cross_entropy(output, torch.randint(0, 3, (1, 24, 24)))
pred = output.data.max(1, keepdim=True)[1]
loss.backward()
updateBN()
BN_grad_zero()
optimizer.step()
```

PS: I’ve also removed the `.data` attribute, as we don’t recommend using it, since it might yield unwanted side effects.
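As a small illustration of the kind of side effect `.data` can cause: in-place changes through `.data` bypass autograd’s version tracking, so `backward()` silently uses the mutated values:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = (x * x).sum()   # autograd saves x for the backward pass

x.data.mul_(10.)    # bypasses the version counter, so no error is raised
y.backward()

# the gradient of sum(x*x) at the original x=1 would be 2, but the saved
# tensor was silently mutated, so autograd computes 2*10 = 20 instead
print(x.grad)
```

Doing the same in-place change without `.data` (i.e. `x.mul_(10.)`) would at least raise an error during `backward()` instead of silently returning a wrong gradient.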

Thank you. I have optimized my code according to your suggestion. Maybe I should learn the updated PyTorch.

I hope PyTorch will be promoted more widely. TensorFlow’s code looks too painful.