Evaluation loss on the training set is higher than the loss logged during training

    def forward(self, image, proj, proj_inv):
        return self.predict_2d_joint_locations(image, proj, proj_inv)

    def criterion(self, predicted, gt):
        return self.mse(predicted, gt)

    def training_step(self, batch, batch_idx):
        player_images, j2d, j3d, proj, proj_inv, is_synth = batch
        predicted_2d_joint_locations = self.predict_2d_joint_locations(player_images, proj, proj_inv)
        train_loss = self.criterion(predicted_2d_joint_locations, j2d)
        self.log('train_loss', train_loss)
        return train_loss

    def validation_step(self, batch, batch_idx):
        player_images, j2d, j3d, proj, proj_inv, is_synth = batch
        predicted_2d_joint_locations = self.predict_2d_joint_locations(player_images, proj, proj_inv)
        val_loss = self.criterion(predicted_2d_joint_locations, j2d)
        self.log('val_loss', val_loss)
        return val_loss

I have this simple code for training_step() and forward(). Both functions do essentially the same thing.
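
For context, self.mse and predict_2d_joint_locations are defined elsewhere in the module. A minimal sketch of what that setup is assumed to look like (the class name and structure here are illustrative, not my actual code):

    import torch.nn as nn
    import pytorch_lightning as pl

    class PosePredictor(pl.LightningModule):  # hypothetical name; the real class isn't shown
        def __init__(self):
            super().__init__()
            self.mse = nn.MSELoss()  # the criterion used by criterion() above

        def predict_2d_joint_locations(self, image, proj, proj_inv):
            # stand-in for the real network: regress joint locations from the
            # image and use the camera matrices proj / proj_inv to get 2D joints
            raise NotImplementedError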

Owing to a relatively small dataset, my model grossly overfits the training data, as is evident from the order-of-magnitude gap between the training and validation losses. But that’s fine for now; I am perfectly aware of it and will add more data soon.

What surprises me is what happens when I evaluate (infer). I don’t have a separate test set yet, only a training and a validation set. When I evaluate on the validation set, the mean squared error is, as expected, in the same range as the validation loss recorded for the checkpoint. However, when I evaluate on the training set, the mean squared error is again in the range of the validation loss, not the training loss.
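
Before evaluating, the model is restored from the checkpoint, roughly like this (the class name and path are illustrative stand-ins for my real ones):

    # hypothetical restore step using Lightning's load_from_checkpoint
    model = PosePredictor.load_from_checkpoint("checkpoints/last.ckpt")

The evaluation loop itself looks like this: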

    if args.val:
        check_dl = dataset.val_dataloader()
    else:
        check_dl = dataset.train_dataloader()

    for player_images, j2d, j3d, proj, proj_inv, is_synth in check_dl:
        if args.visualize:
            # move the images to CPU/numpy for visualization, then back to a
            # tensor for the forward pass
            player_images = player_images.cpu().numpy()
            j2d_predicted = model(torch.from_numpy(player_images), proj, proj_inv).cpu().detach().numpy()
            # print the raw MSE next to the loss computed by training_step()
            print(((j2d - j2d_predicted) ** 2).mean(),
                  model.training_step((torch.from_numpy(player_images), j2d, j3d, proj, proj_inv, is_synth), 0))

When I print ((j2d - j2d_predicted) ** 2).mean() for images in the training set after loading the model from the trained checkpoint, I get numbers in the range of the validation loss. I tried the same thing by printing the loss returned by training_step(), and again I get high losses, in the validation-loss range.
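
Stripped of the numpy round-trip, the comparison I am making boils down to this sketch (model and the batch come straight from check_dl above):

    import torch

    # raw MSE from a plain forward pass
    with torch.no_grad():
        j2d_predicted = model(player_images, proj, proj_inv)
        eval_mse = ((j2d - j2d_predicted) ** 2).mean()

    # the same criterion, computed via training_step() for comparison
    step_loss = model.training_step(
        (player_images, j2d, j3d, proj, proj_inv, is_synth), 0)
    print(eval_mse.item(), step_loss.item())

Both printed numbers land in the validation-loss range, not the training-loss range.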

Note: The mean squared errors I get on the training set at inference time are high, but not as high as at the very start of training. On a model with completely random weights, I would expect errors orders of magnitude higher, so the pre-trained checkpoint is definitely being loaded correctly.

I have been scratching my head over this. Any help would be really appreciated.