Error while using NLLLoss2D

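import torch.nn as nn
import torch.nn.functional as nnFunctions  # assumed alias; the import was not shown in the post

class convNet(nn.Module):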
    #constructor
    def __init__(self):
        super(convNet, self).__init__()
        #defining layers in convnet
        #input size=1*657*1625
        self.conv1 = nn.Conv2d(1,16, kernel_size=3,stride=1,padding=1)
        self.conv2 = nn.Conv2d(16,32, kernel_size=3,stride=1,padding=1)
        self.conv3 = nn.Conv2d(32,64, kernel_size=3,stride=1,padding=1)
        self.conv4 = nn.Conv2d(64,64, kernel_size=3,stride=1,padding=1)
        self.conv5 = nn.Conv2d(32,16, kernel_size=3,stride=1,padding=1) 
    
        #Parallel rectangle and square convolution
        self.Pconv1=nn.Conv2d(64,32, kernel_size=(3,3),stride=1,padding=(1,1))
        self.Pconv2=nn.Conv2d(64,32, kernel_size=(3,7),stride=1,padding=(1,3))
        self.Pconv3=nn.Conv2d(64,32, kernel_size=(7,3),stride=1,padding=(3,1))
        
        #auxiliary convolution
        
        self.conv6 = nn.Conv2d(16,8, kernel_size=3,stride=1,padding=1)
        self.conv7 = nn.Conv2d(8,1, kernel_size=3,stride=1,padding=1)
            
    def forward(self, x):
        x = nnFunctions.leaky_relu(self.conv1(x))
        x = nnFunctions.leaky_relu(self.conv2(x))
        x = nnFunctions.leaky_relu(self.conv3(x))
        x = nnFunctions.leaky_relu(self.conv4(x))
        # sum of the three parallel branches (square and rectangular kernels)
        x = (nnFunctions.leaky_relu(self.Pconv1(x))
             + nnFunctions.leaky_relu(self.Pconv2(x))
             + nnFunctions.leaky_relu(self.Pconv3(x)))
        x=nnFunctions.leaky_relu(self.conv5(x))
        x=nnFunctions.leaky_relu(self.conv6(x))
        
        x=nnFunctions.leaky_relu(self.conv7(x))
        return x

The above is my convNet class, which takes input data of dimension 410x1x512x1024 and outputs data of the same dimension.
The data is 410 grayscale images, so there is 1 channel, and each image is 512x1024.
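
As a quick check of why the spatial size is preserved, here is a small sketch of the standard convolution output-size formula, floor((n + 2p - k) / s) + 1, applied to the kernel and padding pairs used in the class above. The helper name `conv_out_size` is made up for this illustration and is not part of PyTorch.

```python
def conv_out_size(n, kernel, padding, stride=1):
    """Output length along one dimension for a convolution (no dilation)."""
    return (n + 2 * padding - kernel) // stride + 1


height, width = 512, 1024

# 3x3 kernels with padding 1 keep both dimensions unchanged.
assert conv_out_size(height, kernel=3, padding=1) == height
assert conv_out_size(width, kernel=3, padding=1) == width

# The rectangular (3,7) kernel with padding (1,3) keeps them unchanged too:
# 3 with padding 1 along the height, 7 with padding 3 along the width.
assert conv_out_size(height, kernel=3, padding=1) == height
assert conv_out_size(width, kernel=7, padding=3) == width
```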

Now I use an NLL loss function:
criterion = nn.NLLLoss2d()

And then I train the network using the following training function:

def train(train_loader,net,criterion,epochs,total_samples,learning_rate):
    prev_loss=0
    optimizer = optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9)
    
    for epoch in range(int(epochs)): # loop over the dataset multiple times
        running_loss = 0.0
        for i,data in enumerate(train_loader):
            inputs,labels=data
            # wrap them in Variable
            inputs, labels = Variable(inputs).cuda(), Variable(labels).cuda()
            
            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            
            loss = criterion(outputs, labels)
            loss.backward()        
            optimizer.step()
            if i==0:
                print(loss)
            # print statistics
            running_loss += loss.data[0]
        print(running_loss/total_samples)
    print('Finished Training')
    return net

net=train(train_loader,net,criterion,1,410,0.01)

But I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-25-e228ad25a2e4> in <module>()
----> 1 net=train(train_loader,net,criterion,1,410,0.01)

<ipython-input-23-15ac57a260e6> in train(train_loader, net, criterion, epochs, total_samples, learning_rate)
     16             outputs = net(inputs)
     17 
---> 18             loss = criterion(outputs, labels)
     19             loss.backward()
     20             optimizer.step()

/home/sarthak/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
    208 
    209     def __call__(self, *input, **kwargs):
--> 210         result = self.forward(*input, **kwargs)
    211         for hook in self._forward_hooks.values():
    212             hook_result = hook(self, input, result)

/home/sarthak/anaconda2/lib/python2.7/site-packages/torch/nn/modules/loss.pyc in forward(self, input, target)
     21         _assert_no_grad(target)
     22         backend_fn = getattr(self._backend, type(self).__name__)
---> 23         return backend_fn(self.size_average)(input, target)
     24 
     25 

/home/sarthak/anaconda2/lib/python2.7/site-packages/torch/nn/_functions/thnn/auto.pyc in forward(self, input, target)
     39         output = input.new(1)
     40         getattr(self._backend, update_output.name)(self._backend.library_state, input, target,
---> 41             output, *self.additional_args)
     42         return output
     43 

TypeError: CudaSpatialClassNLLCriterion_updateOutput received an invalid combination of arguments - got (int, torch.cuda.FloatTensor, torch.cuda.FloatTensor, torch.cuda.FloatTensor, bool, NoneType, torch.cuda.FloatTensor), but expected (int state, torch.cuda.FloatTensor input, torch.cuda.LongTensor target, torch.cuda.FloatTensor output, bool sizeAverage, [torch.cuda.FloatTensor weights or None], torch.cuda.FloatTensor total_weight)

Could someone please help?

Did you try working through the error message? It seems fairly informative: you are passing FloatTensor targets where it expects LongTensor targets.
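
A minimal sketch of that fix, assuming the labels hold integer class indices that happen to be stored as floats. It uses `nn.NLLLoss`, which in recent PyTorch releases accepts the same spatial inputs as `NLLLoss2d`. The tensor sizes and the class count `num_classes` are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 3  # hypothetical class count, for illustration only

# Stand-in for a network output: batch x classes x height x width.
scores = torch.randn(2, num_classes, 4, 5)
log_probs = F.log_softmax(scores, dim=1)

# Labels loaded as floats (the situation that triggers the TypeError).
float_labels = torch.randint(0, num_classes, (2, 4, 5)).float()

# Cast to a LongTensor before passing them to the criterion.
long_labels = float_labels.long()

criterion = nn.NLLLoss()
loss = criterion(log_probs, long_labels)
```

In the training loop from the question, the equivalent change would be calling `.long()` on `labels` before handing them to `criterion`.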

I corrected the above error, but now I get another one:

RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4 at /data/users/soumith/miniconda2/conda-bld/pytorch-0.1.7_1485444530918/work/torch/lib/THCUNN/generic/SpatialClassNLLCriterion.cu:17

I understand what the error means, but I need to use the above function with my image labels, which form a 4D tensor of size 410x1x512x1024.
Is there a workaround?

Check out piwise. For context, my targets have shape batch x height x width, with pixel values equal to the class indices of the corresponding segmentation class. The output passed to NLLLoss2d()(output, target) has shape batch x classes x height x width, where the classes dimension contains the per-class log-probability map, as NLLLoss2d expects.
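
The shape convention described here can be sketched as follows. This is an illustration under assumptions, not the piwise code: the sizes are tiny stand-ins, `num_classes` is hypothetical, and `nn.NLLLoss` is used because recent PyTorch releases accept spatial inputs there. A target stored as batch x 1 x height x width can be brought to the expected 3D shape by dropping the singleton channel dimension.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

batch, num_classes, height, width = 2, 4, 8, 16  # illustrative sizes

# Stand-in for a network output: one score map per class.
output = torch.randn(batch, num_classes, height, width)
# NLLLoss expects log-probabilities along the class dimension.
log_probs = F.log_softmax(output, dim=1)

# Labels stored with a singleton channel dimension (4D), as in the question.
target_4d = torch.randint(0, num_classes, (batch, 1, height, width))

# Drop the channel dimension to get batch x height x width, and make sure
# the dtype is long, since the values are class indices.
target = target_4d.squeeze(1).long()

loss = nn.NLLLoss()(log_probs, target)
print(target.shape, loss.item())
```

Note that the output needs one channel per class. A single-channel output, as produced by the network at the top of the thread, leaves the loss with only one class to choose from.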

My model works with both CUDA and CPU, which should also fix your problem. However, it does not converge (I still need to find out why, but that is not necessarily relevant for you).


I’m using your code but encounter the same error. I’m not using your transform.py; I’m using my own. Do you have any suggestions?

I’m seeing exactly the same error with @bodokaiser’s code (piwise). Has anyone found a solution?

Thanks

Maybe it is because the label type is wrong; I ran into this problem five minutes ago.