# Measuring Accuracy/Loss when fine tuning fcn_resnet101 segmentation model

I want to fine-tune the fcn_resnet101 segmentation model, and I am following this beginner's tutorial and this intermediate tutorial, which has some parts more pertinent to segmentation.

In the beginner's tutorial, the problem it's trying to solve is classifying images as either bees or ants, so accuracy is measured by checking how many of the predicted labels match the actual labels.

In segmentation I imagine accuracy is measured at the pixel level (i.e. for each pixel in the image, check whether the predicted class matches the actual class in the annotated image). So my question is: how can I adapt the code below, taken from the classification example, to measure the accuracy and loss for segmentation?

Are there any built-in functions that can help track accuracy at the pixel level? Or, if not, could you share some sample code that does this, so I can adapt it to the code below?

```python
def train_model(model, dataloaders, criterion, optimizer, num_epochs=25, is_inception=False):
    since = time.time()

    val_acc_history = []

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history only if in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    # Special case for inception because in training it has an auxiliary output. In train
                    #   mode we calculate the loss by summing the final output and the auxiliary output
                    #   but in testing we only consider the final output.
                    if is_inception and phase == 'train':
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4*loss2
                    else:
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    return model, val_acc_history
```

In the intermediate tutorial it seems that accuracy is measured by the `train_one_epoch` function called in the lines below, but I don't know whether that function is only relevant to the Mask R-CNN model or whether it can be used with the FCN segmentation model as well.

```python
    for epoch in range(num_epochs):
        # train for one epoch, printing every 10 iterations
        train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
        # update the learning rate
        lr_scheduler.step()
        # evaluate on the test dataset
```

The code should work for a segmentation model output of `[N, nb_classes, H, W]` and the corresponding target of `[N, H, W]` as this code snippet shows:

```python
N, H, W = 2, 224, 224
nb_classes = 10
output = torch.randn(N, nb_classes, H, W)
preds = torch.argmax(output, 1)

labels = torch.randint(0, nb_classes, (N, H, W))

print(torch.sum(preds == labels).float() / labels.nelement())
> tensor(0.0989)
```
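
As a side note, for the loss no reshaping should be needed either: `nn.CrossEntropyLoss` accepts exactly these shapes for segmentation, i.e. a model output of `[N, nb_classes, H, W]` (raw logits) and a target of `[N, H, W]` containing class indices as a `LongTensor`. A minimal sketch, reusing `output` and `labels` from the snippet above:

```python
criterion = torch.nn.CrossEntropyLoss()

# output: [N, nb_classes, H, W] logits, labels: [N, H, W] class indices (long)
loss = criterion(output, labels)
print(loss)
```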

Thanks!

`N` here is the batch size, right?

Yes, I used `N` to specify the batch size.

Thanks!

I have another question to make sure I understand this correctly.

In the `# statistics` section I currently have the following, which is giving me an accuracy value of 2.3, which doesn't make sense.

```python
            total_train += labels.nelement()
            correct_train += preds.eq(labels.data).sum().item()
            train_accuracy = 100 * correct_train / total_train
```

So should I replace the three lines above with the line below?

`train_accuracy = torch.sum(preds == labels).float() / labels.nelement()`

The full code of the `train_model` function is below, with the old accuracy calculation commented out in the statistics section. Would this be the correct way then, or am I still missing something?

```python
def train_model(model, dataloaders, criterion, optimizer, num_epochs=25, is_inception=False):
    since = time.time()

    val_acc_history = []

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0
            total_train = 0
            correct_train = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)  # original image
                labels = labels.to(device)  # annotated mask

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history only if in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    # Special case for inception because in training it has an auxiliary output. In train
                    #   mode we calculate the loss by summing the final output and the auxiliary output
                    #   but in testing we only consider the final output.
                    if is_inception and phase == 'train':
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4*loss2
                    else:
                        outputs = model(inputs)['out']
                        labels = labels.long()
                        outputs = outputs.squeeze(1)
                        loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item()  # * inputs.size(0)
                ##total_train += labels.nelement()
                ##correct_train += preds.eq(labels.data).sum().item()
                ##train_accuracy = 100 * correct_train / total_train
                train_accuracy = torch.sum(preds == labels).float() / labels.nelement()

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    return model, val_acc_history
```

The result is a bit weird. Could you run some sanity checks and see what values `total_train` and `correct_train` have after a couple of iterations?
The second approach would only calculate the accuracy of the last batch, if I'm not mistaken.
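
For reference, a minimal sketch of what the per-epoch accumulation could look like (loss and backward pass omitted for brevity; it assumes `dataloaders[phase]` yields image/mask pairs and reuses the `total_train` / `correct_train` names from the commented-out lines above):

```python
total_train = 0
correct_train = 0

for inputs, labels in dataloaders[phase]:
    inputs = inputs.to(device)
    labels = labels.to(device).long()       # [N, H, W] class indices

    outputs = model(inputs)['out']          # [N, nb_classes, H, W]
    preds = torch.argmax(outputs, dim=1)    # [N, H, W]

    # accumulate over the whole epoch instead of overwriting per batch
    total_train += labels.nelement()
    correct_train += (preds == labels).sum().item()

# pixel accuracy for the whole epoch
epoch_acc = correct_train / total_train
```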

After doing some debugging I think I know what I did wrong. Like you said, I was only getting the accuracy of the last batch, let's say 97, and dividing that number by the number of items in my validation dataset (in my case around 40), which is why I was getting 2.33 as the accuracy. So now what I'm going to do is sum all the accuracies and divide that sum by the length of my dataset, and that should give me the correct accuracy for the epoch.

But I do have one question: how can I get the number `0.8875` out of the tensor `tensor(0.8875, device='cuda:0')`?

You could call `tensor.item()` to get the Python scalar value.
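
For example:

```python
import torch

acc = torch.tensor(0.8875)  # same idea for a CUDA tensor such as tensor(0.8875, device='cuda:0')
print(acc.item())           # a plain Python float instead of a tensor
```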

I have a question.
Is my code correct for computing the accuracy of a single label (class)?

```python
N, H, W = 2, 224, 224
nb_classes = 10
output = torch.randn(N, nb_classes, H, W)
preds = torch.argmax(output, 1)
labels = torch.randint(0, nb_classes, (N, H, W))
labels1 = labels.eq(5).float()
preds1 = preds.eq(5).float()
print(torch.sum(preds1 == labels1).float() / labels1.nelement())
>  tensor(0.8196)
```

Your code snippet will calculate the accuracy for an imbalanced binary classification use case for class 5 only.
Note that by selecting class 5, you will get a lot of "non-class-5" outputs, which will create a high true-negative count.
E.g. if your current segmentation output contains only a single pixel predicted as class 5, and even that pixel is wrongly placed, you'll still get a very high accuracy, as shown in this example:

```python
N, H, W = 2, 20, 20
nb_classes = 6
output = torch.randn(N, nb_classes, H, W)
# Predict only a single pixel as class5
output[:, 5] = -1000000
output[0, 5, 0, 0] = 100000

preds = torch.argmax(output, 1)

# Create only single target as class5
labels = torch.randint(0, nb_classes-1, (N, H, W))
labels[0, 1, 1] = 5

labels1 = labels.eq(5).float()
preds1 = preds.eq(5).float()
print(torch.sum(preds1 == labels1).float() / labels1.nelement())
> tensor(0.9975)
```
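
If you want a per-class score that is not dominated by those true negatives, a common alternative (not used in the snippets above) is a per-class IoU, which only considers pixels that are predicted or labeled as the class in question. A rough sketch for class 5 on dummy data:

```python
import torch

N, H, W = 2, 20, 20
nb_classes = 6
preds = torch.randint(0, nb_classes, (N, H, W))   # dummy predictions
labels = torch.randint(0, nb_classes, (N, H, W))  # dummy targets

cls = 5
pred_mask = preds == cls
target_mask = labels == cls

# IoU = TP / (TP + FP + FN); true negatives never enter the score
intersection = (pred_mask & target_mask).sum().float()
union = (pred_mask | target_mask).sum().float()
iou = intersection / union if union > 0 else torch.tensor(float('nan'))
print(iou)
```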

@ptrblck Thanks a lot.