Compute validation loss for Faster RCNN

Hi, I’m doing object detection on a custom dataset using transfer learning from a pretrained Faster RCNN model.
I would like to compute validation loss at the end of each epoch. How can this be done?

If I run the code below (model in training mode) I get losses, but dropout isn’t deactivated, so I am wondering how ‘valid’ these loss values are. And running the model in eval mode only returns the predictions.

model.train()  # keep train mode so the model returns the loss dict
for images, targets in data_loader_val:
    images = [image.to(device) for image in images]
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

    with torch.no_grad():  # no gradients needed for validation
        val_loss_dict = model(images, targets)
        print(val_loss_dict)
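(For completeness: the dict returned in train mode contains the individual loss components — loss_classifier, loss_box_reg, loss_objectness and loss_rpn_box_reg — so if a single scalar is needed it can simply be summed, something like:)

val_loss = sum(loss for loss in val_loss_dict.values()).item()  # one scalar for logging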

I’m wondering the same thing. Did you find a solution? I was thinking of forcing the training mode on only some submodules (the ones that output losses).

I thought it through and came to the conclusion that validation loss is only meaningful relative to training loss. Training loss is computed with dropout active too, so the two are comparable.

I guess dropout might be OK, but in general wouldn’t that mess up modules like batch norm that keep running estimates?
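One option I was considering (just a sketch of the idea, not tested on every torchvision version): keep the model in train mode so the loss dict is still returned, but switch the Dropout and BatchNorm submodules to eval so dropout is off and the running estimates are not updated.

import torch.nn as nn

def train_mode_without_dropout_and_bn_updates(model):
    # hypothetical helper: overall train mode (so Faster RCNN returns losses),
    # but Dropout/BatchNorm layers in eval mode (no dropout, no running-stat updates)
    model.train()
    for m in model.modules():
        if isinstance(m, (nn.Dropout, nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()
    return model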

Hello,

Did you reach any conclusion? I am also working on object detection with my custom dataset and would like to track how the validation and training losses evolve, but I’m not sure whether it is good practice to use .train() mode during evaluation.

@mapostig No, I guess it’s not good practice to use model.train() mode during evaluation. You can use the same custom dataset class to create a separate data loader for your evaluation dataset.

for phase in ['train', 'val']:
    if phase == 'train':
        model.train()
        # training part with backprop
    else:
        model.eval()
        # just a forward pass

Some layers, like Dropout and BatchNorm, behave differently under model.eval().
For more details, you can also look at this discussion.
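Note that in the 'val' phase the torchvision detection models don’t return losses at all; with model.eval() a forward pass gives you the predictions instead, roughly like this (a sketch, assuming the same data_loader_val and device):

model.eval()
with torch.no_grad():
    for images, _ in data_loader_val:
        images = [img.to(device) for img in images]
        outputs = model(images)  # list of dicts with 'boxes', 'labels' and 'scores'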


I am stuck at the same place as well. Is it possible to calculate validation loss properly?

@loicdtx, did you find a solution to this problem?

@Arun_Mohan, validation loss is just there to control for overfitting during training; it has no analytical value in itself. It’s therefore completely fine to compute it like I did in the original post (model in train mode and gradients deactivated).
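To make that concrete, here is a sketch of a per-epoch validation loss (same assumptions as the original post: model, data_loader_val and device already exist):

import torch

model.train()  # train mode so the loss dict is returned
val_loss = 0.0
with torch.no_grad():  # but no gradients and no weight updates
    for images, targets in data_loader_val:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)
        val_loss += sum(loss for loss in loss_dict.values()).item()
val_loss /= len(data_loader_val)
print(f"validation loss: {val_loss:.4f}")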


@loicdtx thanks… I tried it the same way as well. I think it is not an issue.

While I agree with those above arguing that computing the validation loss in train mode is fine, there is still a serious efficiency problem here.

If you also want the model outputs (for tracking IoU, accuracy, etc., which is often the case), then you need to run inference twice: training mode for the loss, and eval mode for the outputs. It would be better to get both in a single pass!
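For reference, the two-pass version I mean looks roughly like this (a sketch only, same assumptions as earlier posts):

with torch.no_grad():
    for images, targets in data_loader_val:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        model.train()              # pass 1: loss dict
        loss_dict = model(images, targets)

        model.eval()               # pass 2: detections for IoU / mAP
        detections = model(images)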


I posted a hacky solution on StackOverflow to get both outputs and losses in a single pass: https://stackoverflow.com/questions/60339336/validation-loss-for-pytorch-faster-rcnn/65347721#65347721

Hi @Usama_Hasan,

Thanks for your answer. I am using the pretrained Faster RCNN model, and I see that the batch normalization layers are frozen:

FasterRCNN(
  (transform): GeneralizedRCNNTransform()
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d()
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): FrozenBatchNorm2d()
          )
        )

So it should not affect the running stats, right?
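(This is how I checked it, in case it helps — a small sketch assuming model is the torchvision Faster RCNN:)

import torch.nn as nn
from torchvision.ops.misc import FrozenBatchNorm2d

# FrozenBatchNorm2d has fixed statistics, so train() cannot change them
frozen = sum(isinstance(m, FrozenBatchNorm2d) for m in model.modules())
regular = sum(isinstance(m, nn.BatchNorm2d) for m in model.modules())
print(f"FrozenBatchNorm2d: {frozen}, BatchNorm2d: {regular}")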

Hey @coolcucumber94, you’re right, it won’t affect the running stats.
Also, you can put your code inside a code section or just wrap it in triple backticks; it helps with debugging and understanding the code.

Thanks for the reply.

Was this found to be appropriate? I’ve been told that batch norm and dropout layers need to be in eval() mode; however, I’m only interested in calculating the validation loss to save the “best” model.
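What I had in mind for the “best” model part is something like the sketch below (compute_validation_loss is a placeholder for whatever loop you use, e.g. the train-mode no_grad loop from earlier in this thread; the checkpoint path is arbitrary):

best_val_loss = float("inf")

for epoch in range(num_epochs):
    # ... train for one epoch ...
    val_loss = compute_validation_loss(model, data_loader_val, device)  # placeholder helper
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), "best_model.pth")  # keep the best checkpoint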
