I’m using the following two functions to find the accuracy of my semantic segmentation network. I found this code on GitHub and it seems to work, but I don’t know exactly how. I am trying to understand what each line is doing.
I have commented each line with what I think is going on; if my understanding is wrong, could you please correct me?
The lines I need help understanding are: values, indices = tensor.cpu().max(1) and incorrect = preds.ne(targets).cpu().sum()
def get_predictions(output_batch):
bs, c, h, w = output_batch.size() # size returns [batchsize, channels, rows, columns]
tensor = output_batch.data
values, indices = tensor.cpu().max(1) # get the values and indices of the max values in every channel (dim=1), why are we finding the maximum value in RGB channels?
indices = indices.view(bs, h, w) # reshape it to this, as this is how 'targets' is shaped
return indices
def error(preds, targets):
assert preds.size() == targets.size()
bs, h, w = preds.size()
n_pixels = bs*h*w
incorrect = preds.ne(targets).cpu().sum() # I cannot find out what 'ne' is doing here and what are we summing?
err = incorrect.numpy()/n_pixels # converted this tensor to numpy as the tensor was int and division was giving 0 every time
# return err
return round(err, 5)
Let’s walk through the code using your explanations:
def get_predictions(output_batch):
bs, c, h, w = output_batch.size() # size returns [batchsize, channels, rows, columns]
# Gets the underlying data. I would prefer to use .detach(), but that shouldn't be a problem here.
tensor = output_batch.data
# Gets the maximal value across the channel dimension for every pixel, right.
# Since this will most likely be your model's prediction, the channels correspond to the classes, i.e.
# channel0 holds the logits of class0; these are not RGB channels. indices will therefore contain the predicted class for each pixel location.
values, indices = tensor.cpu().max(1) # get the values and indices of the max values in every channel (dim=1), why are we finding the maximum value in RGB channels?
# .squeeze(1) would do the same.
# Basically you want to get rid of dim1, which is now a single channel holding the class predictions.
# Note that in older PyTorch versions max(dim) kept the reduced dimension, so indices had shape [bs, 1, h, w];
# in recent versions max(1) already returns shape [bs, h, w], making this reshape a no-op.
indices = indices.view(bs, h, w) # reshape it to this, as this is how 'targets' is shaped
return indices
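To make the max(1) step concrete, here is a minimal sketch with a made-up mini-batch (1 image, 3 classes, 2x2 pixels). The channel with the largest logit at each pixel location becomes that pixel's predicted class:

```python
import torch

# Hypothetical logits: shape [batch=1, classes=3, height=2, width=2]
logits = torch.tensor([[[[0.1, 2.0],
                         [0.3, 0.2]],   # channel 0: class 0 logits
                        [[1.5, 0.1],
                         [0.4, 0.9]],   # channel 1: class 1 logits
                        [[0.2, 0.3],
                         [2.5, 0.1]]]]) # channel 2: class 2 logits

# max over dim=1 (the class dimension) returns, per pixel,
# the largest logit and the channel index it came from
values, indices = logits.max(1)

print(indices)  # tensor([[[1, 0], [2, 1]]]) -- predicted class per pixel
```

For example, at pixel (0, 0) the logits across channels are 0.1, 1.5, 0.2, so the predicted class is 1.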
def error(preds, targets):
assert preds.size() == targets.size()
bs, h, w = preds.size()
n_pixels = bs*h*w
# You are comparing the predictions of your model with the target tensor element-wise
# using the "not equal" operation. In other words, you'll get a BoolTensor (a ByteTensor in older
# versions) with 1s at all pixel locations where the prediction does not equal the target.
# Summing it gives you the number of falsely predicted pixels.
incorrect = preds.ne(targets).cpu().sum() # I cannot find out what 'ne' is doing here and what are we summing?
# Divide the number of incorrectly classified pixels by the number of all pixels.
err = incorrect.numpy()/n_pixels # converted this tensor to numpy as the tensor was int and division was giving 0 every time
# return err
return round(err, 5)
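Here is a small sketch of the ne/sum step with made-up prediction and target maps, so you can see what is being summed. Note that calling .item() on the resulting 0-dim tensor is an alternative to .numpy() for getting a plain Python number before the division:

```python
import torch

preds   = torch.tensor([[[1, 0], [2, 1]]])  # predicted class per pixel
targets = torch.tensor([[[1, 2], [2, 1]]])  # ground-truth class per pixel

mismatch = preds.ne(targets)   # True wherever prediction != target
incorrect = mismatch.sum()     # count of wrongly classified pixels (0-dim tensor)
err = incorrect.item() / preds.numel()

print(incorrect.item(), err)   # 1 0.25 -- one of the four pixels is wrong
```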