[solved] Fast loss function for fully convolutional network

Hi all, I’m making a network which predicts objects in output grids, so the output shape is [class_size, grid_width, grid_height]. The loss function currently looks like this: output is [batch_size, class_size, grid_width, grid_height] and target is [1, grid_width, grid_height], containing the class label if there is an object in a grid and 0 otherwise.

for b, w, h in torch.nonzero(target.data):
    loss += F.cross_entropy(output[b, :, w, h].contiguous().view(1, -1), target[b, w, h])

As you might guess, this loss function is very slow and is the bottleneck, so I’d like to change it if possible.

The reason the target is a tensor is that I use DataLoader, and it does not accept arbitrary objects, for example a list storing the grid and corresponding class label for each image, like list(dict(grid=(x,y), cls=c), dict(grid=(x,y), cls=c), ...).

Does anyone have a good idea for improving the performance? Thanks in advance.
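For context, here is a self-contained version of the setup and the per-grid loop (the sizes are made up for illustration; only the shapes matter):

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes, just for illustration
batch_size, class_size, grid_w, grid_h = 2, 4, 3, 3

output = torch.randn(batch_size, class_size, grid_w, grid_h)
target = torch.zeros(batch_size, grid_w, grid_h, dtype=torch.long)
target[0, 1, 2] = 3  # one object with class label 3 in grid (1, 2)

# Per-grid loop: one cross_entropy call per occupied cell
loss = 0.0
for b, w, h in torch.nonzero(target):
    loss = loss + F.cross_entropy(output[b, :, w, h].view(1, -1),
                                  target[b, w, h].view(1))
```

Each iteration launches a separate cross_entropy call, which is why this is slow when there are many objects.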

Hello @moskomule,

Would the following two steps achieve what you want:

  • Initialize CrossEntropyLoss with a weight of ones except the first class (which gets weight 0). So if you use cuda
loss_class_weights = torch.ones(num_classes).cuda()
loss_class_weights[0] = 0.0
ce_loss = torch.nn.CrossEntropyLoss(weight=loss_class_weights)
  • Use loss = ce_loss(output.transpose(0,2,3,1).contiguous().view(-1, num_classes), target.view(-1))

The only thing you would need to take care of is the scaling of the loss: I think that by default it would now average over target.numel() instead of (target != 0).sum(), so you would need to scale by the quotient.
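A sketch of both steps together, using permute for the dimension reordering (the sizes are made up). Note that in recent PyTorch versions, CrossEntropyLoss with a weight tensor divides by the sum of the per-target weights, so zero-weight background cells drop out of the average automatically and no extra rescaling is needed; on older versions you may need the quotient described above:

```python
import torch

num_classes = 4
weights = torch.ones(num_classes)
weights[0] = 0.0  # class 0 is background: weight 0 removes it from the loss
ce_loss = torch.nn.CrossEntropyLoss(weight=weights)

# Hypothetical data: batch of 2, 3x3 grid, one object of class 3
scores = torch.randn(2, num_classes, 3, 3)
target = torch.zeros(2, 3, 3, dtype=torch.long)
target[0, 1, 2] = 3

# Flatten [N, C, H, W] -> [N*H*W, C] so every grid cell becomes one sample
flat_scores = scores.permute(0, 2, 3, 1).contiguous().view(-1, num_classes)
flat_target = target.view(-1)

loss = ce_loss(flat_scores, flat_target)
```

With a single object, the result equals the plain cross entropy of that one cell, since all background cells carry weight 0.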

Best regards


Edit: The transpose above is incorrect, I had numpy’s transpose in mind. See below for what I actually wanted.

Thank you Thomas.

I’m afraid my explanation was not clear enough and confused you. Suppose grid_size=3 and there is only one object, with label 3, in grid [1,2]; then the target is

0 0 0
0 0 3
0 0 0

The cross entropy loss is ce(output[:, 1, 2], [3]). I used nonzero to get the indices of all grids containing an object, but that was a bit slow.

Indeed, you confused me.
My impression was that you had data similar to

scores = Variable(torch.randn(2,4,3,3))
targets = torch.zeros(1,3,3).long()
targets[:,2,1] = 3
targets = Variable(targets)
targets = targets.expand(scores.size(0),*targets.size()[1:]) # if you only have one target per batch

and wanted

ce_fn_1 = torch.nn.CrossEntropyLoss()
ce_fn_1(scores[:,:,2,1], targets[:,2,1])

which is very similar to

ce_fn_2 = torch.nn.CrossEntropyLoss(weight=torch.Tensor([0.0,1.0,1.0,1.0]))
ce_fn_2(scores.transpose(1,2).transpose(2,3).contiguous().view(-1, scores.size(1)),
        targets.view(-1))

Best regards


Thank you Thomas, and I have a question about your solution: I don’t know why you put weight in CrossEntropyLoss, though it may be trivial. -> Finally I understand: 0 is the background class, so it needs to be ignored. Thank you again, it’s solved.

I’ve just noticed that the order is [batch, channel, height, width].
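As a follow-up note: in recent PyTorch versions the transpose/view dance is no longer necessary, since CrossEntropyLoss accepts spatial targets directly ([N, C, H, W] scores with [N, H, W] targets). A sketch, with shapes following the [batch, channel, height, width] order noted above:

```python
import torch

scores = torch.randn(2, 4, 3, 3)                  # [batch, class, height, width]
targets = torch.zeros(2, 3, 3, dtype=torch.long)  # [batch, height, width]
targets[:, 2, 1] = 3

w = torch.tensor([0.0, 1.0, 1.0, 1.0])  # zero weight for the background class

# Spatial form: no transpose/view needed
loss_spatial = torch.nn.CrossEntropyLoss(weight=w)(scores, targets)

# Equivalent flattened form, for comparison
loss_flat = torch.nn.CrossEntropyLoss(weight=w)(
    scores.permute(0, 2, 3, 1).contiguous().view(-1, 4),
    targets.view(-1))
```

An alternative to the zero-weight trick is torch.nn.CrossEntropyLoss(ignore_index=0), which skips background cells entirely when computing the loss.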