Transfer Learning input image size

Thank you for the explanation, sir. I have learned that the accuracy reported on this dataset (FER2013) by the author of the paper is around 72%, so there is probably some problem with the dataset, I think.

Take into account that ResNeSt is most likely pre-trained on ImageNet, which is a thousand-class RGB dataset, whereas emotion detection has only 7 classes and grayscale images, so you are missing several of the key representations the backbone was trained for. I suspect you wouldn't see much of a difference if you trained from scratch.
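If you do want to keep the pre-trained weights, a minimal sketch of the two adaptations usually needed (replicating the grayscale channel so the RGB stem applies, and replacing the 1000-way head) could look like this. I'm using torchvision's resnet18 as a stand-in backbone, since ResNeSt itself isn't in torchvision:

```python
import torch.nn as nn
from torchvision import models, transforms

# Replicate the single grey channel to 3 channels so the RGB-pretrained
# stem can be reused, and upscale the 48x48 FER2013 crops to ImageNet size.
preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 7)  # 1000 ImageNet classes -> 7 emotions
```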
Whether you use a pre-trained backbone or not, I'd recommend tuning the other hyperparameters. Try increasing the batch size or input resolution, or try different optimizers (SGD rather than Adam if you see too much overfitting).
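The optimizer swap is a one-line change; the values below are just illustrative starting points, not tuned settings:

```python
import torch

# SGD with momentum often generalizes better when Adam overfits;
# lr / momentum / weight_decay here are illustrative, not tuned.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)

# The Adam baseline it replaces:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```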