Trying to understand this small piece of code

Hi,

I’m using the following two functions to compute the accuracy of my semantic segmentation network. I found this code on GitHub and it seems to work, but I don’t know exactly how, so I am trying to understand what each line is doing.

I have commented each line with what I think is going on. If my understanding is wrong, could you please correct me?

The lines I need help understanding are `values, indices = tensor.cpu().max(1)` and `incorrect = preds.ne(targets).cpu().sum()`.

def get_predictions(output_batch):
    bs, c, h, w = output_batch.size()           # size returns [batchsize, channels, rows, columns]
    tensor = output_batch.data
    values, indices = tensor.cpu().max(1)       # get the values and indices of the max values in every channel (dim=1),  why are we finding the maximum value in RGB channels? 
    indices = indices.view(bs, h, w)            # reshape it to this, as this is how 'targets' is shaped
    return indices


def error(preds, targets):
    assert preds.size() == targets.size()
    bs, h, w = preds.size()
    n_pixels = bs*h*w
    incorrect = preds.ne(targets).cpu().sum()       # I cannot find out what 'ne' is doing here and what are we summing?
    err = incorrect.numpy()/n_pixels                # converted this tensor to numpy as the tensor was int and division was giving 0 everytime
    # return err
    return round(err, 5)

Many Thanks

Let’s walk through the code using your explanations:

def get_predictions(output_batch):
    bs, c, h, w = output_batch.size()           # size returns [batchsize, channels, rows, columns]
    # Gets the underlying data. I would prefer to use .detach(), but that shouldn't be a problem here.
    tensor = output_batch.data
    # Gets the maximal value in every channel, right.
    # As this will most likely be your model's output, the channels correspond to the classes, i.e.
    # channel0 represents the logits of class0. indices will therefore contain the predicted class for each pixel location.
    values, indices = tensor.cpu().max(1)       # get the values and indices of the max values in every channel (dim=1),  why are we finding the maximum value in RGB channels? 
    # .squeeze() would probably do the same.
    # Basically you want to get rid of dim1, which is now a single channel holding the class predictions.
    indices = indices.view(bs, h, w)            # reshape it to this, as this is how 'targets' is shaped
    return indices
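
As a quick illustration with made-up numbers (not from your model), here is what `.max(1)` does on a tiny logit tensor. Note that in recent PyTorch versions the reduced dimension is dropped by default (`keepdim=False`), so `indices` already has shape `(bs, h, w)`; the `.view(bs, h, w)` was needed in older versions that kept the singleton dimension:

```python
import torch

# Toy batch: 2 images, 3 classes, 2x2 pixels.
# Each "channel" along dim=1 holds the logits for one class.
logits = torch.tensor([
    [[[0.1, 2.0], [0.3, 0.0]],   # class 0 logits
     [[1.5, 0.2], [0.1, 0.9]],   # class 1 logits
     [[0.0, 0.1], [2.2, 0.4]]],  # class 2 logits
    [[[1.0, 0.0], [0.0, 0.0]],
     [[0.0, 1.0], [0.0, 0.0]],
     [[0.0, 0.0], [1.0, 1.0]]],
])

# Reduce over the class dimension: for each pixel, pick the class
# with the highest logit.
values, indices = logits.max(1)
print(indices)
# tensor([[[1, 0],
#          [2, 1]],
#         [[0, 1],
#          [2, 2]]])
```

So `indices[b, h, w]` is the predicted class id at that pixel location, which is exactly the shape and meaning of a segmentation target.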


def error(preds, targets):
    assert preds.size() == targets.size()
    bs, h, w = preds.size()
    n_pixels = bs*h*w
    # You are comparing the predictions of your model with the target tensor element-wise
    # using the "not equal" operation. In other words, you'll get a ByteTensor with 1s at all pixel locations
    # where the predictions do not equal the target. Summing it gives you the number of falsely predicted pixels.
    incorrect = preds.ne(targets).cpu().sum()       # I cannot find out what 'ne' is doing here and what are we summing?
    # Divide the number of incorrectly classified pixels by the total number of pixels.
    err = incorrect.numpy()/n_pixels                # converted this tensor to numpy as the tensor was int and division was giving 0 everytime
    # return err
    return round(err, 5)

Let me know, if some aspects are still unclear.


Extremely helpful, as always.

Many thanks @ptrblck.
