I have two networks, both of which output a tensor of shape batch_size x num_classes x height x width, with num_classes = 256 (there are actually only 21 classes in VOC12, but they chose to give the background the label 255 - I will improve on this later).
Since the labels have the shape batch_size x 1 x height x width, I can calculate the cross entropy loss with:
criterion = nn.CrossEntropyLoss()
# shape: batch_size x 1 x height x width = 1 x 1 x 256 x 256
inputs = autograd.Variable(images)
# shape: batch_size x 1 x height x width = 1 x 1 x 256 x 256
targets = autograd.Variable(labels)
# shape: batch_size x num_classes x height x width = 1 x 256 x 256 x 256
outputs = model(inputs)
optimizer.zero_grad()
loss = criterion(outputs.view(-1, 256), targets.view(-1))
loss.backward()
optimizer.step()
I know that outputs[0][0][i][j] corresponds to the probability that the pixel at (i, j) belongs to the first class. So if I want to transform the outputs from shape 1 x 256 x 256 x 256 to 1 x 1 x 256 x 256, I would need to find, for every pixel, the class with the maximum probability and assign that class index to the pixel.
I could do this manually by iterating over every class and pixel with numpy, but I wonder if there is a better way using tensor operations?
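To illustrate, the manual version I have in mind would be something like this (just a sketch of the loop, not something I would want to keep):
import numpy as np
# outputs: 1 x 256 x 256 x 256 (batch x classes x height x width)
scores = outputs.data.numpy()[0]                   # num_classes x height x width
pred = np.zeros(scores.shape[1:], dtype=np.int64)  # height x width class map
for i in range(scores.shape[1]):                   # every row
    for j in range(scores.shape[2]):               # every column
        pred[i, j] = scores[:, i, j].argmax()      # class with the highest score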
Note that some losses accept 2D (spatial) inputs (and CrossEntropyLoss will soon be updated to support them as well). So a more efficient way of computing the loss would be something like:
nllcrit = nn.NLLLoss2d(size_average=True)  # need to write a functional interface for it

def criterion(input, target):
    return nllcrit(F.log_softmax(input), target)
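To use it with the shapes above: NLLLoss2d expects the target as a batch_size x height x width LongTensor of class indices, so the singleton channel dimension of the labels has to be squeezed out first. A quick sketch, reusing the variables from your snippet:
# labels: batch_size x 1 x height x width LongTensor of class indices
targets = autograd.Variable(labels.squeeze(1))  # -> batch_size x height x width
loss = criterion(outputs, targets)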
Now, if you want to compute the confusion matrix for your predictions, you can use a variant of ConfusionMeter from tnt, replacing the input with something like the following.
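A rough sketch (the exact names depend on the tnt version; the idea is that every pixel becomes one sample by taking the per-pixel argmax and flattening):
from torchnet import meter
num_classes = 256                          # as in the snippet above
conf = meter.ConfusionMeter(num_classes)
_, preds = torch.max(outputs.data, 1)      # per-pixel predicted class
conf.add(preds.view(-1), labels.view(-1))  # flatten: one entry per pixel
print(conf.value())                        # num_classes x num_classes matrix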
@fmassa Does F.log_softmax take care of the fact that we need to take the softmax along the class axis (dim 1)?
For example, the input would be a tensor of size (batch_size, n_classes, H, W), and we need to apply the softmax along the n_classes dimension for every pixel of the H x W image.
From what I read here, I’m not sure if F.log_softmax does that.
I’m currently permuting and reshaping the outputs to get a tensor of shape (batch_size, H, W, n_classes) and then using F.cross_entropy directly. Is this approach correct?
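Concretely, what I am doing is roughly this (a sketch; the variable names are just placeholders):
import torch.nn.functional as F
n_classes = outputs.size(1)
# move the class dim last, then flatten so each row holds one pixel's class scores
flat_outputs = outputs.permute(0, 2, 3, 1).contiguous().view(-1, n_classes)
flat_targets = targets.view(-1)            # one class index per pixel
loss = F.cross_entropy(flat_outputs, flat_targets)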