Transfer Learning input image size

I’m trying to build an emotion-detection model with a custom architecture, but the accuracy is not very good. My first thought was to use transfer learning, but I learned that the minimum input size for almost all deep CNNs is 224x224, while the images in my dataset are 48x48. I’ve tried many models over the last week and can’t find one that works well, even after tuning the hyperparameters. I don’t know how to feed images of this size to already-trained networks. The dataset is large: 28,000 images in 7 classes. Upsampling will blur the images, so I need another idea. Can anyone help me out here?
Thanks in advance.

Upsampling may blur the image, but you might find that the model doesn’t care about blurriness ;). On the other hand, it can indeed be a waste of computation to upsample when the original images are very low resolution.

You might want to take a look at the ImageNet example (examples/ at master · pytorch/examples) as a starting point. You do not need to modify the models for a different input resolution, as most vision models have an average pooling layer that keeps the model shapes from depending on the input size. The only thing you would need to change is the fully connected layer at the end (e.g., replace it with one that has a 7-class output rather than a 1000-class output).

Thank you for replying. As I mentioned earlier, every deep network uses pooling to reduce the size of the feature maps, so there must be some minimum input size, and as far as I know every network has a minimum of 224x224. I didn’t understand what you were trying to explain with that Git repo. Can you explain how I can feed 48x48 images into any of those pretrained networks?

I found this warning in the PyTorch documentation:

The output image might be different depending on its type: when downsampling, the interpolation of PIL images and tensors is slightly different, because PIL applies antialiasing. This may lead to significant differences in the performance of a network. Therefore, it is preferable to train and serve a model with the same input types.

Maybe this also applies to upsampling.

Yes, although you might consider simply modifying the model architecture to remove some pooling layers if you find that 48x48 breaks some input size constraint. I don’t think 224x224 is a minimum for many models as e.g., 112x112 (and below) should work fine on ResNet-50.
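To make the size argument concrete, here is a small sketch that tracks the spatial size through the downsampling stages, assuming the standard torchvision ResNet layout (a 7x7 stride-2 stem convolution, a 3x3 stride-2 max pool, then three stages that each halve the feature map). The helper names are made up for this illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a conv/pool layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet_feature_size(size):
    """Spatial size just before the average pool (assumed standard ResNet layout)."""
    size = conv_out(size, kernel=7, stride=2, padding=3)  # stem convolution
    size = conv_out(size, kernel=3, stride=2, padding=1)  # max pool
    for _ in range(3):  # layer2, layer3 and layer4 each halve the map
        size = conv_out(size, kernel=3, stride=2, padding=1)
    return size


for s in (224, 112, 48):
    print(s, "->", resnet_feature_size(s))
# 224 -> 7
# 112 -> 4
# 48 -> 2
```

Under these assumptions a 48x48 input still leaves a 2x2 feature map before the average pool, so nothing breaks, although very little spatial information survives that far.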

The warning you see is for a slightly different issue. It is saying that you have to be careful about mixing upsampling/downsampling methods (especially across different libraries or library versions) as they can impact model behavior. It isn’t a warning that upsampling or downsampling is problematic in general.
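As a rough illustration of why the resize method matters, the sketch below upsamples the same fake 48x48 image with two interpolation modes of `torch.nn.functional.interpolate` and compares the results. The random tensor is a stand-in for a real image, and the choice of modes is only an example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.rand(1, 1, 48, 48)  # one fake single-channel 48x48 image

# Same target size, two different interpolation methods.
up_nearest = F.interpolate(x, size=(224, 224), mode="nearest")
up_bilinear = F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)

# Largest per-pixel disagreement between the two upsampled images.
max_diff = (up_nearest - up_bilinear).abs().max().item()
print(up_nearest.shape, up_bilinear.shape, max_diff)
```

Both outputs have the same shape but different pixel values, so a model trained on one and served with the other sees shifted inputs. Keeping one method for both training and inference avoids that.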

I have tried that, but didn’t get better accuracy on the validation set. I also tried building a deep CNN (not residual), but it doesn’t work well either.
The dataset is here. Can you suggest which architecture would work well? I’ve tried several.

How much fine-tuning are you doing? Can you comment on ResNet-18/50 performance (those are probably the most popular models for generic vision tasks)?

import torch
import torch.nn as nn


class Block(nn.Module):
    """Bottleneck-style residual block: 7x7 conv, 5x5 conv, then 1x1 expansion."""

    def __init__(self, in_channels, out_channels, downsample=None, stride=1):
        super().__init__()  # must run before submodules are assigned
        self.expansion = 4
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=7, padding=3, stride=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=5, padding=2, stride=stride)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=1, padding=0, stride=1)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        self.elu = nn.ELU()
        self.identity_downsample = downsample

    def forward(self, x):
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.elu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.elu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        # Project the shortcut when its shape differs from the block output.
        if self.identity_downsample is not None:
            x = self.identity_downsample(x)
        out = torch.add(out, x)
        out = self.elu(out)

        return out

class Resnet(nn.Module):

    def __init__(self, block, num_blocks, image_channels, num_classes):
        super().__init__()  # must run before submodules are assigned
        self.in_channels = 64
        # Stem: stride-2 conv plus stride-2 max pool (4x downsampling in total).
        self.conv1 = nn.Conv2d(image_channels, self.in_channels, kernel_size=7, stride=2, padding=3)
        self.bn1 = nn.BatchNorm2d(self.in_channels)
        self.elu = nn.ELU()
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self.make_layer(
            block, num_blocks[0], out_channels=self.in_channels, stride=1
        )
        self.layer2 = self.make_layer(
            block, num_blocks[1], out_channels=128, stride=2
        )
        self.layer3 = self.make_layer(
            block, num_blocks[2], out_channels=256, stride=2
        )
        self.layer4 = self.make_layer(
            block, num_blocks[3], out_channels=512, stride=2
        )

        # forward() uses drop4 but its definition is missing from the post;
        # nn.Dropout() with the default p=0.5 is an assumed placeholder.
        self.drop4 = nn.Dropout()
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc1 = nn.Linear(512 * 4, 512)
        self.fc2 = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.elu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.drop4(x)
        x = self.avgpool(x)
        x = x.reshape(x.shape[0], -1)  # flatten to (batch, 2048)
        x = self.fc1(x)
        x = self.fc2(x)

        return x

    def make_layer(self, block, num_blocks, out_channels, stride):
        identity_downsample = None
        layers = []

        # The shortcut needs a projection when the stride or channel count changes.
        if stride != 1 or self.in_channels != out_channels * 4:
            identity_downsample = nn.Sequential(
                # The Conv2d arguments were lost from the post; a 1x1 conv with the
                # block's stride is assumed so the shortcut matches the block output.
                nn.Conv2d(self.in_channels, out_channels * 4, kernel_size=1, stride=stride),
                nn.BatchNorm2d(out_channels * 4),
            )

        layers.append(block(self.in_channels, out_channels, identity_downsample, stride))

        self.in_channels = out_channels * 4
        for _ in range(num_blocks - 1):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)

I tried this.

Does the reference ResNet implementation in torchvision not work for some reason?

It didn’t give me good average accuracy.

What do your loss curves look like?

Actually, the accuracy doesn’t increase above 35%, but the loss decreases almost every epoch. Is there any problem with the data?

To debug the training loop, you might want to check that your model can overfit a small training set (a few examples).

import torch


# Lines lost from the post (device transfer, optimizer steps, torch.save call)
# are reconstructed from context; the rest follows the original logic.
def train(model, dataloader, validloader, criterion, optimizer, epochs=50):
    max_valid_acc = 0
    train_acc, val_acc = 0, 0
    for e in range(epochs):
        train_loss = 0.0
        model.train()  # optional when the model has no mode-specific layers
        for data, labels in dataloader:
            # The posted output shows device='cuda:0', so batches go to the GPU.
            data, labels = data.cuda(), labels.cuda()
            optimizer.zero_grad()
            target = model(data)
            loss = criterion(target.float(), labels.long())
            loss.backward()
            optimizer.step()
            # Note: '=' keeps only the last batch's loss; '+=' would accumulate it.
            train_loss = loss.item() * data.size(0)
            train_acc += torch.sum((torch.max(target, 1)[1] == labels), 0)
        valid_loss = 0.0
        model.eval()  # optional when the model has no mode-specific layers
        for data, labels in validloader:
            data, labels = data.cuda(), labels.cuda()

            target = model(data)
            loss = criterion(target.float(), labels.long())
            valid_loss = loss.item() * data.size(0)
            val_acc += torch.sum((torch.max(target, 1)[1] == labels), 0)
        print(f'Epoch {e+1} \t\t Training Loss: {train_loss / len(dataloader)} \t\t Validation Loss: {valid_loss / len(validloader)}')
        print("Validation Accuracy ... :", val_acc / len(validloader))
        # X_train_tensor is a global defined elsewhere in the poster's code.
        print("Train Accuracy ... :", train_acc / len(X_train_tensor))
        if val_acc > max_valid_acc:
            print(f'Validation Acc Increased({max_valid_acc:.6f}--->{val_acc:.6f}) \t Saving The Model')
            max_valid_acc = val_acc
            # Saving State Dict
            torch.save(model.state_dict(), 'saved_model.pth')
        train_acc = 0
        val_acc = 0
    return model

How can I check whether the model overfits a few examples? I’m a beginner :)

I’m getting this…

Epoch 18 Training Loss: 0.004407357424497604 Validation Loss: 8.263123961403024e-08
Validation Accuracy ... : tensor(0.4572, device='cuda:0')
Train Accuracy ... : tensor(0.9526, device='cuda:0')

I’m using optimizer = optim.SGD(net.parameters(), lr = 0.0001,momentum = 0.92) and

net.fc = nn.Sequential(nn.Linear(2048,512,bias = True),
    ...

in ResNet-50. @eqy

For example, you can just load the first batch in the dataloader (then break from the loading loop) and verify that the loss goes down to basically zero after many epochs (you might want to tweak your learning rate schedule, if you have one for this experiment as the epochs have fewer examples now).

I have 28,000 training examples, so do I need to manually train on every batch of, let’s say, 128 images?

Yes, training on 128 images manually would work. Actually, you may want to just use the first epoch to grab one batch of 128 (shuffled) images and then reuse this first batch over and over.
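A minimal sketch of this sanity check, under loud assumptions: the random tensors and the tiny fully connected model below are hypothetical stand-ins, not the poster's data or network. With a real setup you would take one batch with `images, labels = next(iter(dataloader))` and use your own model, criterion and optimizer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for one batch from the real dataloader.
images = torch.randn(16, 1, 48, 48)
labels = torch.randint(0, 7, (16,))

# Tiny placeholder model; swap in the network you are debugging.
model = nn.Sequential(nn.Flatten(), nn.Linear(48 * 48, 64), nn.ReLU(), nn.Linear(64, 7))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

losses = []
for step in range(200):
    # Reuse the same batch on every step.
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

accuracy = (model(images).argmax(dim=1) == labels).float().mean().item()
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, accuracy {accuracy:.2f}")
```

If the loop is wired correctly, the loss on this one batch should fall toward zero and the accuracy on it should approach 100%. If it does not, the problem is likely in the training loop rather than in the architecture.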

But if I find the data that is overfitting, how can I drop it from the trainloader? I also have one question: if the validation loss is decreasing, why doesn’t the validation accuracy increase? It even decreases for some epochs and behaves erratically.

The purpose of this test is not to improve the model but to simply verify that the basic training loop is working properly. If it works, then it makes sense to revert the changes and start looking at other potential issues. The issue of loss going down without accuracy increasing is possible; for example if the model makes the same incorrect predictions but with lower confidence between epochs this will cause the loss to decrease without an actual increase in performance.
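The last point can be checked with a tiny worked example. The probabilities below are made up for illustration: a single sample whose true class is 0, predicted at two different epochs.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[target])


def predicted(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])


target = 0  # true class
# Hypothetical predicted class probabilities for one sample at two epochs.
epoch_a = [0.05, 0.95]  # confidently wrong
epoch_b = [0.45, 0.55]  # still wrong, but less confident

for name, probs in (("epoch A", epoch_a), ("epoch B", epoch_b)):
    loss = cross_entropy(probs, target)
    print(name, "loss:", round(loss, 3), "correct:", predicted(probs) == target)
```

The loss drops from about 3.0 to about 0.8 between the two epochs, yet the prediction is wrong both times, so accuracy stays unchanged.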