I am currently fine-tuning a pretrained DenseNet model from torchvision for an action recognition task.
During training, the model outputs values in the range [-10, 10]; however, in the validation phase, the outputs explode to a range of [-10000, 10000]. I simply pass the input through the network, and these values come from the last fully connected layer.
I use the same loop for training and validation.
What could be causing this?
Is the training data from the same distribution as the validation data, and are you using the same preprocessing?
Do you have batch norm layers, and how large are your training batches?
I am using the Mini-Kinetics-200 dataset, so the data come from the same distribution. I apply the same rescaling to all frames of all videos, resizing them to 224 x 224. Batch norm layers are present in the DenseNet-121 model from torchvision. I use a batch size of 1 to simulate online training.
Are you using a batch size of 1 during training or just validation?
In the former case, I would recommend removing the batch norm layers, as single samples will most likely produce invalid running estimates, which would explain the poor validation accuracy.
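A minimal sketch of the effect (a toy demonstration, not your model): in `train()` mode a batch norm layer normalizes with the current batch's statistics, so its output always looks well scaled, while in `eval()` mode it uses the running estimates, which single-sample batches with drifting statistics have corrupted:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(3)  # default momentum=0.1 for the running estimates
bn.train()

# Simulate "online" training: single-sample batches whose scale drifts,
# so each update pushes the running variance toward a different value.
for step in range(5):
    x = torch.randn(1, 3, 8, 8) * (step + 1)
    bn(x)

x = torch.randn(1, 3, 8, 8) * 10
train_out = bn(x)   # normalized with this batch's own statistics
bn.eval()
eval_out = bn(x)    # normalized with the noisy running estimates

print(train_out.std().item())  # close to 1
print(eval_out.std().item())   # far from 1: the running stats don't match
```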
You could try using e.g. nn.InstanceNorm2d layers instead.
Thanks for the reply. I am using a batch size of 1 for both training and validation, and I will replace the batch norm layers.