I want to update the BatchNorm2d layers after `loss.backward()`, but I cannot solve this problem. The code is shown below.

```
def updateBN():
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.data.add_(0.001 * torch.sign(m.weight.data))  # L1

# training step:
model.zero_grad()
output = model(input)
loss = nn.BCELoss()(output, target)
loss.backward()
updateBN()
optim_m.step()
```

It seems the `.grad` attribute wasn’t populated, so you might have accidentally detached some tensors from the computation graph. Could you check the `.grad` attribute of other layers and make sure you see valid values?
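A quick way to run that check (a minimal sketch with a stand-in model; replace `model` with your own network) is to iterate over `named_parameters()` after `backward()`:

```python
import torch
import torch.nn as nn

# stand-in model for illustration; replace with your own network
model = nn.Sequential(nn.Conv2d(1, 3, 3, 1, 1), nn.BatchNorm2d(3))

out = model(torch.randn(2, 1, 8, 8))
out.mean().backward()

# map each parameter name to whether its .grad was populated
grad_status = {name: p.grad is not None for name, p in model.named_parameters()}
print(grad_status)
```

If any entry is `False`, the corresponding parameter did not receive a gradient and was likely detached from the graph somewhere in the forward pass.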

Also, don’t use the `.data` attribute, as it may yield unwanted side effects. Alternatively to your current workflow, you could also use hooks via `model.bn.weight.register_hook` to manipulate the gradients, but this wouldn’t solve your current issue.
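For completeness, the hook approach could look like the sketch below (the two-layer model and the `0.001` penalty value are just placeholders):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 3, 3, 1, 1), nn.BatchNorm2d(3))
bn = model[1]

# the hook receives the gradient during backward() and returns the
# modified version, which replaces the original gradient
bn.weight.register_hook(lambda grad: grad + 0.001 * torch.sign(bn.weight.detach()))

out = model(torch.randn(2, 1, 8, 8))
out.sum().backward()
```

The hook fires automatically inside `backward()`, so no extra call after it is needed.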

Thanks. There are no gradients for specific layers, and I’m trying to solve it. I am a newcomer to deep learning and am trying to reproduce the results of others.

I don’t quite understand the use case. You are currently trying to manipulate the gradient of the batchnorm `weight` parameter, which wasn’t calculated. Could you explain your use case a bit?

Sorry, I have been busy doing experiments recently and did not see your reply in time. The paper is . The code is used for sparse training: the weight of BatchNorm2d is used as the penalty factor for sparse training. The following code is used to update the weight of BatchNorm2d.

```
def updateBN():
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.data.add_(args.s * torch.sign(m.weight.data))  # L1
```
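For what it’s worth, the `args.s * sign(weight)` term added here is exactly the (sub)gradient of the L1 penalty `s * sum(|weight|)` on the BN scale factors, so autograd produces the same term if the penalty is added to the loss instead (a small check with illustrative values):

```python
import torch

s = 0.001
gamma = torch.tensor([0.5, -2.0, 3.0], requires_grad=True)  # toy BN scale factors

# L1 sparsity penalty on the scale factors
penalty = s * gamma.abs().sum()
penalty.backward()

# d/dgamma of s*|gamma| is s*sign(gamma) for gamma != 0
print(gamma.grad)
```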

```
def BN_grad_zero():
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            mask = (m.weight.data != 0)
            mask = mask.float().cuda()
            m.weight.grad.data.mul_(mask)
            m.bias.grad.data.mul_(mask)
```

```
def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output, target)
        pred = output.data.max(1, keepdim=True)[1]
        loss.backward()
        if args.sr:
            updateBN()
            BN_grad_zero()
        optimizer.step()
```

This dummy code snippet seems to work, if I use `updateBN` and `BN_grad_zero` after the `backward` call:

```
import torch
import torch.nn as nn
import torch.nn.functional as F

def updateBN():
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.add_(torch.sign(m.weight))

def BN_grad_zero():
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            mask = (m.weight != 0)
            mask = mask.float()
            m.weight.grad.mul_(mask)
            m.bias.grad.mul_(mask)

model = nn.Sequential(
    nn.Conv2d(1, 3, 3, 1, 1),
    nn.BatchNorm2d(3))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

optimizer.zero_grad()
output = model(torch.randn(1, 1, 24, 24))
loss = F.cross_entropy(output, torch.randint(0, 3, (1, 24, 24)))
pred = output.data.max(1, keepdim=True)[1]
loss.backward()
updateBN()
BN_grad_zero()
optimizer.step()
```

PS: I’ve also removed the `.data` attribute, as we don’t recommend using it, since it might yield unwanted side effects.
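As a small illustration of the kind of side effect `.data` can cause: in-place changes through `.data` bypass autograd’s version tracking, so `backward()` silently uses the mutated values:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = (x * x).sum()   # autograd saves x for the backward pass

x.data.mul_(10.)    # bypasses the version counter, so no error is raised
y.backward()

# the gradient of sum(x*x) at the original x=1 would be 2, but the saved
# tensor was silently mutated, so autograd computes 2*10 = 20 instead
print(x.grad)
```

Doing the same in-place change without `.data` (i.e. `x.mul_(10.)`) would at least raise an error during `backward()` instead of silently returning a wrong gradient.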

Thank you. I have optimized my code according to your suggestion. Maybe I should learn the updated PyTorch.

I hope PyTorch will be promoted more widely. TensorFlow’s code looks too painful.