Hello,
I am working on multiclass image classification with a custom dataset of 13 classes: Alien, Predator, Terminator, Robin, Batman, Superman, Spiderman, Valkyrie, Raven, BeastBoy, DeathStroke, Deadpool, and PoisonIvy. I have around 5236 images for training and 1300 for validation, i.e. roughly 400 training and 100 validation images per class. I went through the Transfer Learning for Computer Vision Tutorial (PyTorch Tutorials 2.0.0+cu117 documentation) and PyTorch's fine-tuning tutorial. I am using a pretrained ConvNeXt model, and I have unfrozen layers 6 and 7 of the feature extractor as well as layer 2 of the classifier:
    (7): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(768, 768, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=768)
          (1): Permute()
          (2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=768, out_features=3072, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=3072, out_features=768, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.37714285714285717, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(768, 768, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=768)
          (1): Permute()
          (2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=768, out_features=3072, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=3072, out_features=768, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.3885714285714286, mode=row)
      )
      (2): CNBlock(
        (block): Sequential(
          (0): Conv2d(768, 768, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=768)
          (1): Permute()
          (2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=768, out_features=3072, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=3072, out_features=768, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.4, mode=row)
      )
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=1)
  (classifier): Sequential(
    (0): LayerNorm2d((768,), eps=1e-06, elementwise_affine=True)
    (1): Flatten(start_dim=1, end_dim=-1)
    (2): Linear(in_features=768, out_features=13, bias=True)
  )
)
Here is the model summary (columns: layer, input shape, output shape, number of parameters, trainable):
└─Sequential (6) [12, 384, 14, 14] [12, 768, 7, 7] -- True
│ │ └─LayerNorm2d (0) [12, 384, 14, 14] [12, 384, 14, 14] 768 True
│ │ └─Conv2d (1) [12, 384, 14, 14] [12, 768, 7, 7] 1,180,416 True
│ └─Sequential (7) [12, 768, 7, 7] [12, 768, 7, 7] -- True
│ │ └─CNBlock (0) [12, 768, 7, 7] [12, 768, 7, 7] 4,763,136 True
│ │ └─CNBlock (1) [12, 768, 7, 7] [12, 768, 7, 7] 4,763,136 True
│ │ └─CNBlock (2) [12, 768, 7, 7] [12, 768, 7, 7] 4,763,136 True
├─AdaptiveAvgPool2d (avgpool) [12, 768, 7, 7] [12, 768, 1, 1] -- --
├─Sequential (classifier) [12, 768, 1, 1] [12, 13] -- True
│ └─LayerNorm2d (0) [12, 768, 1, 1] [12, 768, 1, 1] 1,536 True
│ └─Flatten (1) [12, 768, 1, 1] [12, 768] -- --
│ └─Linear (2) [12, 768] [12, 13] 9,997 True
=======================================================================================================================================
Total params: 49,464,685
Trainable params: 15,482,125
Non-trainable params: 33,982,560
Total mult-adds (G): 4.93
=======================================================================================================================================
Input size (MB): 7.23
Forward/backward pass size (MB): 2485.59
Params size (MB): 197.80
Estimated Total Size (MB): 2690.62
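For reference, the summary above was produced along these lines (just a sketch, assuming the torchinfo package; the batch size and column settings are inferred from the output):

import torch.nn as nn
from torchinfo import summary
from torchvision import models

model = models.convnext_small(pretrained=True)
# Replace the 1000-class ImageNet head with a 13-class head, as in the printout above.
model.classifier[2] = nn.Linear(model.classifier[2].in_features, 13)

summary(
    model,
    input_size=(12, 3, 224, 224),  # batch size 12 inferred from the shapes shown above
    col_names=["input_size", "output_size", "num_params", "trainable"],
    row_settings=["var_names"],    # show variable names like (6), (avgpool), (classifier)
)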
Why unfreeze CNN feature extractor layers?
Because the custom dataset is completely new to the pretrained model (ConvNeXt); it has never seen data like this. So, from my understanding, it is better to unfreeze the last two layers of the feature extractor (6 and 7) so the model can learn features specific to my custom data.
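For reference, a small sketch (assuming torchvision's convnext_small) that lists the top-level children of model.features; the indices 6 and 7 mentioned above refer to these entries:

from torchvision import models

model = models.convnext_small(pretrained=True)

# Print each top-level layer of the feature extractor with its parameter count.
# Index 6 is the last downsampling layer (LayerNorm2d + Conv2d); index 7 is the final CNBlock stage.
for name, module in model.features.named_children():
    n_params = sum(p.numel() for p in module.parameters())
    print(f"features[{name}]: {module.__class__.__name__}, {n_params:,} params")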
I have a couple of questions:
- Which parameters should I pass to the optimizer? Should it be the whole model's parameters, or only the parameters of the layers I have unfrozen (layers [6, 7] of the feature extractor and the classifier layers)?
Approach 1: update only the weights of the unfrozen layers (layers [6, 7] of the feature extractor and the classifier) while the rest of the model's weights stay frozen:
from torch import optim
from torchvision import models

model = models.convnext_small(pretrained=True)

# Freeze the whole model first, then selectively unfreeze.
for param in model.parameters():
    param.requires_grad = False

# Feature-extractor layers to fine-tune (top-level indices in model.features).
finetune_features_layers = ["6", "7"]

params_to_update = []

# Unfreeze the classifier and collect its parameters for the optimizer.
for param in model.classifier.parameters():
    param.requires_grad = True
    params_to_update.append(param)

# Unfreeze the selected feature-extractor layers and collect their parameters.
for name, block in model.features.named_children():
    if name in finetune_features_layers:
        for param in block.parameters():
            param.requires_grad = True
            params_to_update.append(param)

# The optimizer only sees the unfrozen parameters.
optimizer = optim.Adam(params_to_update, lr=0.0001)
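A quick sanity check (just a sketch): the trainable parameter count should roughly match the "Trainable params" line in the summary above.

# Compare against the Total/Trainable params reported by the model summary.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} / total params: {total:,}")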
Approach 2: update the whole model's weights. Note that, as written below, the parameters of all layers will be optimized:
from torch import optim
from torchvision import models

model = models.convnext_small(pretrained=True)

# All parameters of the pretrained model are passed to the optimizer.
params_to_update = model.parameters()

optimizer = optim.Adam(params_to_update, lr=0.0001)
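A variant of Approach 2 that I have also seen (sketch only): keep passing the model's parameters, but filter them by requires_grad. If the freezing from Approach 1 has been applied first, the optimizer then receives exactly the same set of parameters as in Approach 1.

# Only hand the optimizer the parameters that are actually trainable.
params_to_update = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.Adam(params_to_update, lr=0.0001)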
- Do the size and type of the dataset matter for generalization, such that they could change how many CNN layers should stay frozen versus be unfrozen?
- I trained my model; here is the output of the final epoch:
Epoch: 49
Train Loss: 1.156555 Acc: 0.6353
Elapsed 12323.04s, 246.46 s/epoch, 3.01 s/batch, ets 0.00s
Test set: Average loss: 0.8826, Accuracy: 930/1300 (72%)
Model Improved. Saving the Model...
But when I evaluate this newly trained model on test data (completely new, unseen custom data with the same 13 classes as above), I get only around 10% accuracy. I am trying to understand what is going wrong here. I have checked both the training and validation datasets, and they have the right class labels and the right images.
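For context, this is roughly how I evaluate on the test set (a simplified sketch; test_dir, test_transforms, and the loader settings are placeholders rather than my exact code):

import torch
from torchvision import datasets

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumes test_transforms matches the validation preprocessing (resize/crop plus the
# ImageNet normalization used for the pretrained weights) and that the class-to-index
# mapping is the same as in the training dataset.
test_dataset = datasets.ImageFolder(test_dir, transform=test_transforms)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=12, shuffle=False)

model.to(device)
model.eval()  # disables training-only behavior such as stochastic depth
correct = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
print(f"Test accuracy: {correct / len(test_dataset):.2%}")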
Is my understanding of feature extraction and fine-tuning correct? Am I heading in the right direction?