AttributeError: 'NoneType' object has no attribute 'data'

I built a custom layer into a traditional CNN. However, when I try to train the new model, this error is thrown:

AttributeError: 'NoneType' object has no attribute 'data'

I guess it is caused by my custom layer. Maybe it makes some weights non-trainable?

I am not quite sure what is going on, and I would really appreciate any suggestions.

Thank you very much

Can you add some context? What part of your code causes this error?

My code is a little messy, but here it is:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DFR(torch.nn.Module):
    def __init__(self, in_features, out_features, bias=False, name='DFR'):
        super(DFR, self).__init__()

        self.weight = nn.Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)

        self.weight.data = torch.zeros(in_features, out_features, device=torch.device('cuda'))
        self.weight.require_grad = True

    def forward(self, input):
        # the received input includes two parts: (1) the input (2) the delay_res value
        r = torch.chunk(input, 2)
        delay_res_local = r[1]
        input = r[0]

        input = F.relu(input)
        num = torch.numel(input)  # the number of elements in the input tensor
        input = input.view(num)

        delay_res_local = delay_res_local.view(num)
        delay_res_local = torch.roll(delay_res_local, 1, -1)
        delay_res_local = 0.1 * delay_res_local + input

        h1 = delay_res_local.view(200, 512, 4, 4)
        output = F.relu(h1)

        return output, delay_res_local

In the training part, the error occurs when the code tries to access param.grad.data:

param.grad.data = wage_quantizer.QG(param.grad.data, args.wl_grad, grad_scale)
AttributeError: 'NoneType' object has no attribute 'data'

Thank you very much

Sorry, I am not really familiar with anything autograd-related... You'll have to wait for someone else to answer.

(Maybe the bias == False case is not handled properly somewhere?
But I don't really see where self.weight and self.bias are used in your code.)


You are not using self.weight or any other parameters in your forward method.
If you pass a simple tensor into this module, you won’t be able to call backward on the two outputs.

Also, the usage of .data is not recommended, so if you want to manipulate a parameter, please refer to this post.

PS: You also have a typo: you are using require_grad instead of requires_grad (which won’t fix the issue however).
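A minimal sketch of the failure mode described above, using a hypothetical toy module (UnusedWeight is not from the thread): the parameter is registered in __init__ but never read in forward(), so when a plain input tensor is passed in, the output has no grad_fn and backward() raises a RuntimeError.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnusedWeight(nn.Module):
    """Hypothetical toy module: registers a parameter but never uses it."""
    def __init__(self, n):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(n))

    def forward(self, x):
        return F.relu(x)  # self.weight never enters the computation graph

module = UnusedWeight(4)
out = module(torch.randn(4))   # plain input tensor with requires_grad=False
print(out.grad_fn)             # None: nothing in the graph requires grad

try:
    out.sum().backward()
    backward_error = None
except RuntimeError as err:    # "element 0 of tensors does not require grad ..."
    backward_error = str(err)
print(backward_error)
```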

Dear ptrblck, thanks for your reply. I fixed the problem by using the weights; it was a silly mistake. But I am still not entirely clear on this: if I use the weights in the forward method, will PyTorch include them in backpropagation and update their values?

Yes. The forward pass will create the computation graph, which will be used for backpropagation to compute the gradients for all parameters which require gradients.
If you don’t use any such parameters, .backward() will raise an error.
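To illustrate the answer, here is a hypothetical toy module (UsedWeight, not the poster's DFR layer) whose forward() multiplies the input by self.weight. Because the multiplication is recorded in the graph, backward() fills weight.grad and optimizer.step() updates the parameter; the printed numbers assume plain SGD with lr=0.1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UsedWeight(nn.Module):
    """Hypothetical toy module: the parameter now takes part in forward()."""
    def __init__(self, n):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(n))

    def forward(self, x):
        return F.relu(x * self.weight)  # the multiply records weight in the graph

module = UsedWeight(4)
optimizer = torch.optim.SGD(module.parameters(), lr=0.1)

x = torch.tensor([1.0, 2.0, -1.0, 3.0])
loss = module(x).sum()
loss.backward()               # d(loss)/d(weight) = x where x * weight > 0, else 0
print(module.weight.grad)     # tensor([1., 2., 0., 3.])

optimizer.step()              # weight <- weight - lr * grad
print(module.weight.detach()) # tensor([0.9000, 0.8000, 1.0000, 0.7000])
```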


OK thank you very much!

Hi all.
I am going through a neural network tutorial and I am stuck on a very similar problem. Could you take a look? I have compared my code about ten times with other tutorial code that works, but it still returns this error:

w.grad.data.zero_()
AttributeError: 'NoneType' object has no attribute 'data'

I would appreciate any clue as to what I am missing here. Thanks a lot!

import torch
from torch.utils.data import Dataset, DataLoader

class Data(Dataset):
    def __init__(self, X):
        self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
        #self.y = 1 * X - 1
        self.f = 1 * X - 1
        self.y = self.f + 0.1 * torch.randn(self.x.size())
        self.len = self.x.shape[0]

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return self.len

def forward(x):
    return w * x + b

def criterion(yhat, y):
    return torch.mean((yhat - y) ** 2)

w = torch.tensor(-10.0, requires_grad=True)
b = torch.tensor(-10.0, requires_grad=True)
# init input for loss function testing
X = torch.arange(-3, 3, 0.1).view(-1, 1)
# prepare input data
dataset = Data(X)
trainloader = DataLoader(dataset=dataset, batch_size=4)

lr = 0.1
for epoch in range(4):                 # learning process x-times
    for x, y in trainloader:           #   for every minibatch do:
        #print(x, y)
        yhat = forward(x)              #     compute current output of function
        loss = criterion(yhat, y)      #     compute difference between current function and wanted function
        loss.backward()                #     examine slope of the loss function
        w = w.data - lr * w.grad.data  #     prepare parameters for next model
        b = b.data - lr * b.grad.data
        w.grad.data.zero_()
        b.grad.data.zero_()
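The thread does not answer this question directly, but the traceback points at the update lines: w = w.data - lr*w.grad.data rebinds w to a brand-new tensor that does not require gradients, so its .grad is None on the very next line. A minimal sketch of the usual fix, assuming the same toy data and hyperparameters, updates the original leaf tensors in place under torch.no_grad() (which also avoids .data):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class Data(Dataset):
    """Noisy samples of y = x - 1, as in the question."""
    def __init__(self):
        self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
        self.y = 1 * self.x - 1 + 0.1 * torch.randn(self.x.size())

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return self.x.shape[0]

def criterion(yhat, y):
    return torch.mean((yhat - y) ** 2)

torch.manual_seed(0)
dataset = Data()
trainloader = DataLoader(dataset=dataset, batch_size=4)

w = torch.tensor(-10.0, requires_grad=True)
b = torch.tensor(-10.0, requires_grad=True)
lr = 0.1

with torch.no_grad():
    initial_loss = criterion(w * dataset.x + b, dataset.y).item()

for epoch in range(4):
    for x, y in trainloader:
        loss = criterion(w * x + b, y)
        loss.backward()
        with torch.no_grad():  # update in place: w and b stay the same leaf tensors
            w -= lr * w.grad
            b -= lr * b.grad
        w.grad.zero_()         # the grads exist now, so zeroing them is safe
        b.grad.zero_()

with torch.no_grad():
    final_loss = criterion(w * dataset.x + b, dataset.y).item()
print(initial_loss, final_loss)
```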

Hey, I have the same error.

AttributeError                            Traceback (most recent call last)
in
35 device=device,
36 img_scale=0.5,
---> 37 val_percent=0.1)
38
39 except KeyboardInterrupt:

in train_net(net, device, epochs, batch_size, lr, img_scale, val_percent, save_cp)
87 tag = tag.replace('.', '/')
88 writer.add_histogram('weights/' + tag, value.data.cpu().numpy(), global_step)
---> 89 writer.add_histogram('grads/' + tag, value.grad.data.cpu().numpy(), global_step)
90 val_score = eval_net(net, val_loader, device)
91 scheduler.step(val_score)

AttributeError: 'NoneType' object has no attribute 'data'

In my model, I used nn.Parameter to initialize the weight and bias. According to your explanation above, self.weight and any other parameters should be used in the forward method. Could you explain how this applies in my case, since it seems to be the key to this error? As far as I understand, there is no need (or way) to use them in the forward method, but I may be wrong since I am new to PyTorch. Thank you.

import torch
import torch.nn as nn

class ResDown(nn.Module):
    """Downscaling"""

    def __init__(self,
                 in_channels,
                 out_channels,
                 scale=4):
        super().__init__()
        self.scale = scale

        self.down = nn.Sequential(
            nn.ZeroPad2d(1),
            nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=scale, stride=scale),
            nn.LeakyReLU(inplace=True))
        self.down.weight = nn.Parameter(
            nn.init.normal_(torch.empty((out_channels, in_channels, scale, scale), dtype=torch.float32)))
        self.down.weight.requires_grad = True

    def forward(self, x):
        #print('input feature size', x.shape)
        x = self.down(x)
        #print('before partial padding: output of first convolutional layer', x.shape)
        in_c, out_c, h, w = x.shape
        #print('the mask_ratio', partial_padding(3, h, w).shape)
        # partial_padding is the poster's own helper (not shown in the thread)
        x = x * partial_padding(3, h, w)  # kernel_size = 4 instead of self.scale
        #print('the output of down sampling', x.shape)
        return x

I'm not sure how you are using the code that yields the error message, but I guess you are trying to print the gradients of model.down.weight.
If that's the case, note that you are creating a new parameter as self.down.weight, while it seems you would like to replace the filter kernels of the conv layer.
If my guess is right, you should use self.down[1].weight = nn.Parameter(...).
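A small sketch of the difference pointed out above, assuming the same layer configuration as ResDown (make_down is a hypothetical helper, not from the thread). Assigning to down.weight registers a new parameter on the nn.Sequential container, whose forward never touches it, whereas down[1].weight replaces the kernel of the Conv2d itself:

```python
import torch
import torch.nn as nn

def make_down(in_channels, out_channels, scale, replace_conv_weight):
    """Builds a ResDown-style block; this helper is hypothetical."""
    down = nn.Sequential(
        nn.ZeroPad2d(1),
        nn.Conv2d(in_channels, out_channels, kernel_size=scale, stride=scale),
        nn.LeakyReLU(inplace=True),
    )
    new_weight = nn.Parameter(torch.randn(out_channels, in_channels, scale, scale))
    if replace_conv_weight:
        down[1].weight = new_weight  # swaps the kernel of the Conv2d at index 1
    else:
        down.weight = new_weight     # registers an extra parameter on the Sequential itself
    return down

missing = {}
for replace_conv_weight in (False, True):
    down = make_down(1, 1, 4, replace_conv_weight)
    down(torch.randn(1, 1, 24, 24)).mean().backward()
    # Sequential.forward only calls its child modules, so down.weight is never used
    missing[replace_conv_weight] = [n for n, p in down.named_parameters() if p.grad is None]

print(missing)  # {False: ['weight'], True: []}
```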

Thanks for the reply. Yes, you're right: I'm trying to replace the filter kernels of the conv layer. Earlier, I had no zero padding before the conv layer, and this way of replacing the filter kernels worked correctly. But the error still occurs even after I corrected the filter kernel replacement, and I don't know what else could be wrong. By the way, the AttributeError: 'NoneType' object has no attribute 'data' only appears after training has run for a while, e.g. at: Epoch 1/5: 1%| | 50/4580 [00:10<15:18, 4.93img/s, loss (batch)=0.693]. My only guess is this initialization loop:

for m in self.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight)
        nn.init.constant_(m.bias, 0)

I have a few identical conv layers wrapped in nn.Sequential. I tested this before, and the filter kernel replacement works.

I’m not sure where the error comes from, as your code seems to work with the correction:

import torch
import torch.nn as nn

class ResDown(nn.Module):
    def __init__(self, 
                 in_channels, 
                 out_channels,
                 scale=4): 
        super().__init__()
        self.down = nn.Sequential(
            nn.ZeroPad2d(1),
            nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=scale, stride=scale),
            nn.LeakyReLU(inplace=True) )
        self.down[1].weight = nn.Parameter(
            nn.init.normal_(torch.empty((out_channels, in_channels,scale,scale),dtype=torch.float32)))  
       
    def forward(self, x):
        x = self.down(x)       
        return x 

model = ResDown(1, 1)
x = torch.randn(1, 1, 24, 24)
out = model(x)
out.mean().backward()

for name, param in model.named_parameters():
    print(name, param.grad.abs().sum())
Output:
down.1.weight tensor(1.8962)
down.1.bias tensor(0.5600)

Where are you trying to print the gradients? Did you make sure to call backward before printing them?

Sorry for posting such long code. Yes, I called loss.backward(). This code (at least the training part) is from an open-source project, so it should be correct.

dir_img = './imgs/'
dir_mask = './masks/'

def train_net(net,
              device,
              epochs=5,
              batch_size=1,
              lr=0.001,
              img_scale=0.5,
              val_percent=0.1,
              save_cp=True):

    dataset = BasicDataset(dir_img, dir_mask, img_scale)

    n_val = int(len(dataset) * 0.1)
    n_train = len(dataset) - n_val

    train, val = random_split(dataset, [n_train, n_val])

    train_loader = DataLoader(train, batch_size=1, shuffle=True, num_workers=0, pin_memory=False)
    val_loader = DataLoader(val, batch_size=1, shuffle=False, num_workers=0, pin_memory=False, drop_last=True)

    writer = SummaryWriter(comment=f'LR_{lr}_BS_{batch_size}')
    global_step = 0

    logging.info(f'''Starting training:
        Epochs:          {epochs}
        Batch size:      {batch_size}
        Learning rate:   {lr}
        Training size:   {n_train}
        Validation size: {n_val}
        Checkpoints:     {save_cp}
        Device:          {device.type}
    ''')

    optimizer = optim.RMSprop(net.parameters(), lr=lr, weight_decay=1e-8, momentum=0.9)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min' if net.n_classes > 1 else 'max', patience=2)

    if net.n_classes > 1:
        criterion = nn.CrossEntropyLoss()
    else:
        criterion = nn.BCEWithLogitsLoss()

    for epoch in range(epochs):
        net.train()

        epoch_loss = 0
        with tqdm(total=n_train, desc=f'Epoch {epoch + 1}/{epochs}', unit='img') as pbar:
            for batch in train_loader:
                imgs = batch['image']
                true_masks = batch['mask']

                imgs = imgs.to(device=device, dtype=torch.float32)
                mask_type = torch.float32 if net.n_classes == 1 else torch.long
                true_masks = true_masks.to(device=device, dtype=mask_type)

                logits, probs, masks_pred = net(imgs)  # logits, probas, preds

                masks_pred = masks_pred.double()  # converting tensor type from long to double only works in train
                masks_pred = masks_pred.squeeze()
                #print('masks_pred', masks_pred.shape)

                true_masks = true_masks.squeeze()
                #labels = np.argmax(labels, axis=1)

                #print('true_masks', true_masks.shape)
                loss = criterion(masks_pred, true_masks)
                loss.requires_grad = True
                #loss = Variable(loss, requires_grad=True)
                epoch_loss += loss.item()
                writer.add_scalar('Loss/train', loss.item(), global_step)

                pbar.set_postfix(**{'loss (batch)': loss.item()})

                optimizer.zero_grad()
                loss.backward()
                nn.utils.clip_grad_value_(net.parameters(), 0.1)
                optimizer.step()

                pbar.update(imgs.shape[0])
                global_step += 1
                if global_step % (len(dataset) // (10 * batch_size)) == 0:
                    for tag, value in net.named_parameters():
                        tag = tag.replace('.', '/')
                        writer.add_histogram('weights/' + tag, value.data.cpu().numpy(), global_step)
                        writer.add_histogram('grads/' + tag, value.grad.data.cpu().numpy(), global_step)
                    val_score = eval_net(net, val_loader, device)
                    scheduler.step(val_score)
                    writer.add_scalar('learning_rate', optimizer.param_groups[0]['lr'], global_step)

                    if net.n_classes > 1:
                        logging.info('Validation cross entropy: {}'.format(val_score))
                        writer.add_scalar('Loss/test', val_score, global_step)
                    else:
                        logging.info('Validation Dice Coeff: {}'.format(val_score))
                        writer.add_scalar('Dice/test', val_score, global_step)

                    writer.add_images('images', imgs, global_step)
                    if net.n_classes == 1:
                        writer.add_images('masks/true', true_masks, global_step)
                        writer.add_images('masks/pred', torch.sigmoid(masks_pred) > 0.5, global_step)

        if save_cp:
            try:
                os.mkdir(dir_checkpoint)
                logging.info('Created checkpoint directory')
            except OSError:
                pass
            torch.save(net.state_dict(),
                       dir_checkpoint + f'CP_epoch{epoch + 1}.pth')
            logging.info(f'Checkpoint {epoch + 1} saved !')

    writer.close()

The code looks generally OK.
What happens if you remove loss.requires_grad = True? You should generally only be able to change the requires_grad attribute of leaf variables. Does the loss tensor contain a valid grad_fn?
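The following sketch reproduces the chain of events in this thread under simplifying assumptions: a single Conv2d stands in for the network, and argmax stands in for the third ("preds") output, since the actual model is not shown. argmax returns integer indices without a grad_fn, so the loss is already detached; setting loss.requires_grad = True is then allowed (the loss has become a leaf), backward() runs, and the model parameters are left with grad None. That would explain the later AttributeError in the histogram logging.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Conv2d(1, 2, kernel_size=3, padding=1)  # stand-in: 2-class logits per pixel
imgs = torch.randn(1, 1, 8, 8)
true_masks = torch.randint(0, 2, (1, 8, 8)).double()

logits = net(imgs)
masks_pred = logits.argmax(dim=1)   # integer class indices, shape [1, 8, 8]
print(logits.grad_fn is not None)   # True: logits are still attached to the graph
print(masks_pred.grad_fn)           # None: argmax is not differentiable

loss = nn.BCEWithLogitsLoss()(masks_pred.double(), true_masks)
print(loss.grad_fn)                 # None: the graph was cut before the loss

loss.requires_grad = True           # allowed only because loss is now a leaf
loss.backward()                     # runs, but only populates loss.grad
print(net.weight.grad)              # None: value.grad.data would raise AttributeError
```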

Thank you very much. Yes, after commenting out 'loss.requires_grad = True', I found the real error: 'element 0 of tensors does not require grad and does not have a grad_fn'. I had run into this error before and naively added 'loss.requires_grad = True'. I compared my code with the original again and found that it has no 'loss.requires_grad = True', just:

loss = criterion(masks_pred, true_masks)
epoch_loss += loss.item()

I guess this is what you meant by only being able to change the requires_grad attribute of leaf variables. So the problem is that I passed the wrong output to the loss function: my new network architecture returns three outputs instead of just one, as the original architecture did. Now I have a new error, but I am heading in the right direction. Many thanks. @ptrblck

Did you solve the new error?
It usually means that you’ve detached the computation graph at some point.
Note that .item() would be such an operation, but if loss doesn't contain a grad_fn, it was already detached before that point.
Let me know, if you get stuck.

Yes, the new errors are solved now. Thanks again, @ptrblck. It's interesting that the loss was detached somewhere earlier in a way that was not obvious.

I have run into a similar problem when running my main file. Here is the error that is raised:

  File "D:\Code\Pconv\utils.py", line 72, in clip_gradient
    param.grad.data.clamp_(-grad_clip, grad_clip)
AttributeError: 'NoneType' object has no attribute 'data'

It comes from this function I defined:

def clip_gradient(optimizer, grad_clip):
    for group in optimizer.param_groups:
        for param in group['params']:
            param.grad.data.clamp_(-grad_clip, grad_clip)
            print(param.grad)
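If some parameters legitimately receive no gradient, a defensive variant of clip_gradient can skip them. This is a sketch that hides the symptom rather than a fix for the underlying issue; the used/unused Linear layers in the usage part are hypothetical:

```python
import torch
import torch.nn as nn

def clip_gradient(optimizer, grad_clip):
    """Clamp each existing gradient to [-grad_clip, grad_clip], skipping None grads."""
    for group in optimizer.param_groups:
        for param in group["params"]:
            if param.grad is None:  # this parameter took no part in the backward pass
                continue
            param.grad.clamp_(-grad_clip, grad_clip)

# Usage: one layer that is used in the loss and one that is not (both hypothetical)
used, unused = nn.Linear(3, 1), nn.Linear(3, 1)
optimizer = torch.optim.SGD(list(used.parameters()) + list(unused.parameters()), lr=0.1)
(100 * used(torch.randn(8, 3))).sum().backward()
clip_gradient(optimizer, 0.5)
print(used.weight.grad.abs().max().item() <= 0.5, unused.weight.grad)  # True None
```

torch.nn.utils.clip_grad_value_, used in the training loop earlier in this thread, already skips parameters whose .grad is None, so it could replace the hand-written loop. Either way, it is still worth finding out which parameters are unused.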

Here is the network (model) I used, taken from [1]:

import math

import torch
import torch.nn as nn


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def norm_angle(angle):
    norm_angle = sigmoid(10 * (abs(angle) / 0.7853975 - 1))
    return norm_angle


def conv3x3(in_planes, out_planes, stride=1):
    "3x3 convolution with padding"
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False)


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU()
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU()
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out = out + residual
        out = self.relu(out)

        return out

###''' self-attention; relation-attention '''

class ResNet_AT(nn.Module):
    def __init__(self, block, layers, num_classes=7, end2end=True, at_type='self-attention'):
        self.inplanes = 64
        self.end2end = end2end
        super(ResNet_AT, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.dropout = nn.Dropout(0.5)
        self.dropout2 = nn.Dropout(0.6)
        self.alpha = nn.Sequential(nn.Linear(512, 1),
                                   nn.Sigmoid())

        self.beta = nn.Sequential(nn.Linear(1024, 1),
                                  nn.Sigmoid())

        self.pred_fc1 = nn.Linear(512, 7)
        self.pred_fc2 = nn.Linear(1024, 7)
        self.at_type = at_type

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x='', phrase='train', AT_level='first_level',vectors='',vm='',alphas_from1='',index_matrix=''):
        global pred_score
        vs = []
        alphas = []

        assert phrase == 'train' or phrase == 'eval'
        assert AT_level == 'first_level' or AT_level == 'second_level' or AT_level == 'pred'
        if phrase == 'train':
            f = x[:, :, :, :]

            f = self.conv1(f)
            f = self.bn1(f)
            f = self.relu(f)
            f = self.maxpool(f)

            f = self.layer1(f)
            f = self.layer2(f)
            f = self.layer3(f)
            f = self.layer4(f)
            f = self.avgpool(f)

            f = f.squeeze(3).squeeze(2)  # f[1, 512, 1, 1] ---> f[1, 512]

            # MN_MODEL(first Level)
            vs.append(f)
            alphas.append(self.alpha(self.dropout(f)))

            vs_stack = torch.stack(vs, dim=2)
            alphas_stack = torch.stack(alphas, dim=2)

            if self.at_type == 'self-attention':
                vm1 = vs_stack.mul(alphas_stack).sum(2).div(alphas_stack.sum(2))
            if self.at_type == 'self_relation-attention':
                vm1 = vs_stack.mul(alphas_stack).sum(2).div(alphas_stack.sum(2))
                betas = []
                for i in range(len(vs)):
                    vs[i] = torch.cat([vs[i], vm1], dim=1)
                    betas.append(self.beta(self.dropout(vs[i])))

                cascadeVs_stack = torch.stack(vs, dim=2)
                betas_stack = torch.stack(betas, dim=2)
                output = cascadeVs_stack.mul(betas_stack * alphas_stack).sum(2).div((betas_stack * alphas_stack).sum(2))

            if self.at_type == 'self-attention':
                vm1 = self.dropout(vm1)
                pred_score = self.pred_fc1(vm1)

            if self.at_type == 'self_relation-attention':
                output = self.dropout2(output)
                pred_score = self.pred_fc2(output)

            return pred_score

        if phrase == 'eval':
            if AT_level == 'first_level':
                f = self.conv1(x)
                f = self.bn1(f)
                f = self.relu(f)
                f = self.maxpool(f)

                f = self.layer1(f)
                f = self.layer2(f)
                f = self.layer3(f)
                f = self.layer4(f)
                f = self.avgpool(f)

                f = f.squeeze(3).squeeze(2)  # f[1, 512, 1, 1] ---> f[1, 512]
                # MN_MODEL(first Level)
                alphas = self.alpha(self.dropout(f))

                return f, alphas

            if AT_level == 'second_level':
                assert self.at_type == 'self_relation-attention'
                vms = index_matrix.permute(1, 0).mm(vm)  # [381, 21783] -> [21783,381] * [381,512] --> [21783, 512]
                vs_cate = torch.cat([vectors, vms], dim=1)

                betas = self.beta(self.dropout(vs_cate))
                ''' keywords: mean_fc ; weight_sourcefc; sum_alpha; weightmean_sourcefc '''
                ''' alpha * beta '''
                weight_catefc = vs_cate.mul(alphas_from1)  # [21570,512] * [21570,1] --->[21570,512]
                alpha_beta = alphas_from1.mul(betas)
                sum_alphabetas = index_matrix.mm(alpha_beta)  # [380,21570] * [21570,1] -> [380,1]
                weightmean_catefc = index_matrix.mm(weight_catefc).div(sum_alphabetas)

                weightmean_catefc = self.dropout2(weightmean_catefc)
                pred_score = self.pred_fc2(weightmean_catefc)

                return pred_score

            if AT_level == 'pred':
                if self.at_type == 'self-attention':
                    pred_score = self.pred_fc1(self.dropout(vm))

                return pred_score

''' self-attention; relation-attention '''
def resnet18_FAN(pretrained=False, **kwargs):
    # Constructs base a ResNet-18 model.
    model = ResNet_AT(BasicBlock, [2, 2, 2, 2], **kwargs)
    return model

I think some layers in my network are not used, but I cannot figure out which ones. Any help is very much appreciated!

[1] The original paper is "Frame Attention Networks for Facial Expression Recognition in Videos": https://arxiv.org/pdf/1907.00193.pdf.

You could print the name and the gradient of all parameters after the backward call and check which parameters do not receive a valid gradient via:

for name, param in model.named_parameters():
    print(name, param.grad)

Some (or all) of these parameters will probably show a None gradient. Based on their names, you can check whether these parameters are indeed unused in the model or whether you've accidentally detached the computation graph.
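Reading the posted forward(), with phrase='train' and the default at_type='self-attention', self.beta and self.pred_fc2 are never called, so their parameters should be the ones that come back with a None gradient and trip clip_gradient. The sketch below applies the check above to a hypothetical stand-in for ResNet_AT's heads (TwoHeads is not the paper's code) and collects the offending names:

```python
import torch
import torch.nn as nn

class TwoHeads(nn.Module):
    """Hypothetical stand-in for ResNet_AT's heads; only the self-attention path is sketched."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Sequential(nn.Linear(512, 1), nn.Sigmoid())
        self.beta = nn.Sequential(nn.Linear(1024, 1), nn.Sigmoid())
        self.pred_fc1 = nn.Linear(512, 7)
        self.pred_fc2 = nn.Linear(1024, 7)

    def forward(self, f):
        # mirrors at_type == 'self-attention': beta and pred_fc2 are never called
        return self.pred_fc1(f * self.alpha(f))

def params_without_grad(model):
    """Names of trainable parameters whose .grad is still None after backward()."""
    return [name for name, p in model.named_parameters()
            if p.requires_grad and p.grad is None]

model = TwoHeads()
model(torch.randn(3, 512)).sum().backward()  # 3 frame features of size 512
print(params_without_grad(model))
# ['beta.0.weight', 'beta.0.bias', 'pred_fc2.weight', 'pred_fc2.bias']
```

Once the unused names are known, one option is to leave those modules out of the optimizer; another is to skip None gradients when clipping.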
