module.weight.grad.data raises AttributeError: 'NoneType' object has no attribute 'data'

I want to update the BatchNorm2d weights after loss.backward(), but I cannot solve this error. The code is shown below.
def updateBN():
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.data.add_(0.001 * torch.sign(m.weight.data))  # L1

train:
    model.zero_grad()
    output = model(input)
    loss = nn.BCELoss()(output, target)
    loss.backward()
    updateBN()
    optim_m.step()

It seems the .grad attribute wasn’t populated, so you might have accidentally detached some tensors from the computation graph.
Could you check the .grad attribute of other layers and make sure you see valid values?
Also, don’t use the .data attribute, as it may yield unwanted side effects.
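
For example, a quick check along these lines (just a minimal sketch, assuming model is your model instance) would show which parameters actually received a gradient after loss.backward():

    # minimal sketch: list which parameters received a gradient after loss.backward()
    for name, param in model.named_parameters():
        if param.grad is None:
            print(name, 'has no gradient')
        else:
            print(name, param.grad.abs().sum().item())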

Alternatively to your current workflow you could also use hooks via model.bn.weight.register_hook to manipulate the gradients, but this wouldn’t solve your current issue.
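
If you wanted to try the hook approach anyway, a rough sketch (the 0.001 factor is just the scale from your snippet) could look like this; the tensor returned by the hook replaces the gradient during backward():

    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            # the returned tensor replaces m.weight's gradient during backward()
            m.weight.register_hook(
                lambda grad, w=m.weight: grad + 0.001 * torch.sign(w.detach()))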

Thanks. There are no gradients for some specific layers, and I’m trying to figure out why. I am a newcomer to deep learning and am trying to reproduce others’ results.

I don’t quite understand the use case. You are currently trying to manipulate the gradient of the batchnorm weight parameter, which wasn’t calculated.
Could you explain your use case a bit?

Sorry, I have been busy doing experiments recently and did not see your reply in time. The paper is . The code is used for sparse training: the weight of BatchNorm2d is used as the penalty factor for the sparsity regularization. The following code is used to update the weights of the BatchNorm2d layers.

def updateBN():
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            # add the subgradient of the L1 sparsity penalty on the BN scale
            m.weight.grad.data.add_(args.s*torch.sign(m.weight.data))  # L1
def BN_grad_zero():
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            # zero the gradients of channels whose BN weight is already 0 (pruned)
            mask = (m.weight.data != 0)
            mask = mask.float().cuda()
            m.weight.grad.data.mul_(mask)
            m.bias.grad.data.mul_(mask)
def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output, target)
        pred = output.data.max(1, keepdim=True)[1]
        loss.backward()
        if args.sr:
            updateBN()
        BN_grad_zero()
        optimizer.step()

This dummy code snippet seems to work if I use updateBN and BN_grad_zero after the backward call:

import torch
import torch.nn as nn
import torch.nn.functional as F

def updateBN():
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.add_(torch.sign(m.weight))

def BN_grad_zero():
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            mask = (m.weight != 0)
            mask = mask.float()
            m.weight.grad.mul_(mask)
            m.bias.grad.mul_(mask)
            

model = nn.Sequential(
    nn.Conv2d(1, 3, 3, 1, 1),
    nn.BatchNorm2d(3))

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
optimizer.zero_grad()
output = model(torch.randn(1, 1, 24, 24))
loss = F.cross_entropy(output, torch.randint(0, 3, (1, 24, 24)))
pred = output.detach().max(1, keepdim=True)[1]
loss.backward()

updateBN()
BN_grad_zero()
optimizer.step()

PS: I’ve also removed the .data attribute, as we don’t recommend using it, since it might yield unwanted side effects.
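
If you ever do need in-place updates on parameters or gradients, wrapping the loop in torch.no_grad() is the usual replacement for the old .data idiom; a minimal sketch of updateBN in that style:

    def updateBN():
        # minimal sketch: torch.no_grad() replaces the old .data idiom for in-place updates
        with torch.no_grad():
            for m in model.modules():
                if isinstance(m, nn.BatchNorm2d):
                    m.weight.grad.add_(torch.sign(m.weight))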

Thank you. I have updated my code according to your suggestion. Maybe I should learn the newer PyTorch APIs.

I hope PyTorch will be promoted more widely. TensorFlow’s code looks too painful.