Transfer Learning Feature Extraction confusion

Hello. So I have been doing transfer learning following a YouTube tutorial, but now when I look around at other documents and examples there is something confusing bugging me. I used the EfficientNet V1 B0 model, loaded with the defaults as

```python
model = timm.create_model(CFG.model_name, pretrained=True)
```

and when I print the model, the last few layers are:

```
(conv_head): Conv2d(320, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn2): BatchNormAct2d(
  1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
  (drop): Identity()
  (act): SiLU(inplace=True)
)
(global_pool): SelectAdaptivePool2d (pool_type=avg, flatten=Flatten(start_dim=1, end_dim=-1))
(classifier): Linear(in_features=1280, out_features=1000, bias=True)
)
```

Now I have a 6-class problem and I want to change the last layer's out_features from 1000 to 6.
I did this by:

```python
print(model.conv_stem)
# grab the pretrained RGB stem weights and sum them over the channel dim
# *before* replacing the stem, then copy them into the new 1-channel conv
# (assigning into model.state_dict() does not modify the model)
weight = model.conv_stem.weight.sum(dim=1, keepdim=True)
model.conv_stem = nn.Conv2d(1, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
with torch.no_grad():
    model.conv_stem.weight.copy_(weight)
print(model.conv_stem)

# freeze the pretrained backbone:
for param in model.parameters():
    param.requires_grad = False
```

```python
# originally, it was:
# (classifier): Linear(in_features=1280, out_features=1000, bias=True)
model.classifier = nn.Linear(1280, 6)
```
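For reference, the single-layer route can be sketched end to end on a toy stand-in model (the real timm B0 works the same way, with `classifier` as the head attribute); the point is that the freshly created head is trainable while everything frozen beforehand stays frozen:

```python
import torch
import torch.nn as nn

class ToyNet(nn.Module):
    """Stand-in for EfficientNet-B0: conv features -> pool -> 1000-class head."""
    def __init__(self):
        super().__init__()
        self.features = nn.Conv2d(3, 1280, kernel_size=3)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.flatten = nn.Flatten()
        self.classifier = nn.Linear(1280, 1000)

    def forward(self, x):
        x = self.pool(self.features(x))
        return self.classifier(self.flatten(x))

model = ToyNet()

# freeze everything that exists so far
for param in model.parameters():
    param.requires_grad = False

# replace the head; a newly created module is trainable by default
model.classifier = nn.Linear(model.classifier.in_features, 6)

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['classifier.weight', 'classifier.bias']
```

Only the new head shows up as trainable, which is exactly the "feature extraction" setup: the frozen backbone extracts features, and only the linear head learns your 6 classes.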

That is what I learned, but the YouTube tutorial does:
```python
model.classifier = nn.Sequential(
    nn.Linear(in_features=1792, out_features=625),  # 1792 is the original in_features
    nn.ReLU(),  # ReLU as the activation function
    nn.Dropout(p=0.3),
    nn.Linear(in_features=625, out_features=256),
    nn.ReLU(),
    nn.Linear(in_features=256, out_features=6)
)
```
I wonder which is correct, what a feature extractor is, and what significance the Sequential part has? And how can one calculate or decide on out_features=625?

Which method is correct, and what are the pros and cons?

Both are correct.
Here's my opinion:

A single nn.Linear has fewer parameters than multiple nn.Linear layers.
If the classification task is a bit more complex (difficult), you may consider the multi-layer one.

Additionally, if the output dimension of the feature extractor is much bigger than the number of classes, a multi-layer head is often used.
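To make the parameter trade-off concrete, you can count the parameters of both heads (using B0's 1280 features here; the tutorial's 1792 corresponds to a larger EfficientNet variant):

```python
import torch.nn as nn

def n_params(module):
    """Total number of parameters in a module."""
    return sum(p.numel() for p in module.parameters())

# option 1: single linear head
single = nn.Linear(1280, 6)

# option 2: the tutorial-style multi-layer head, adapted to 1280 in_features
multi = nn.Sequential(
    nn.Linear(1280, 625), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(625, 256), nn.ReLU(),
    nn.Linear(256, 6),
)

print(n_params(single))  # 7686
print(n_params(multi))   # 962423
```

The multi-layer head has over a hundred times more trainable parameters, so it can fit a harder decision boundary but also overfits more easily. The 625 itself is essentially arbitrary; intermediate widths between the feature size and the class count are a design choice, not something you calculate.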


Alright, so I'm doing the 1st method, not the multi-layer one, and got these plots for 10 epochs:


My dataset has imbalance, but is it due only to that or to some other reason?

(I had train and test folders in my dataset, so I used the splitfolders library to split train into a 90/10 train/validation set.)

The val loss seems to fluctuate hard, but the accuracy is not that weird.
Decrease the learning rate and see how it changes over more epochs.
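If you want that decrease to happen automatically once the val loss plateaus, here's a minimal sketch (assuming an Adam optimizer like yours; the model and val loss here are stand-ins for your training loop):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 6)  # stand-in for your network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lowered from 1e-3

# halve the LR when the val loss stops improving for 2 epochs
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2)

for epoch in range(5):
    val_loss = 1.0  # placeholder: compute this on your validation set
    scheduler.step(val_loss)

# lower than the initial 1e-4 once the plateau triggers
print(optimizer.param_groups[0]["lr"])
```

With a flat val loss like this placeholder, the scheduler reduces the LR after the patience window runs out; in a real run it only fires when the curve actually stalls.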

So I actually trained EfficientNet B1 from version 1, which resulted in the above graphs, but for B0 I got


I only ran 10 epochs for both.

Looks better. The accuracy is even higher than in the previous experiment.
Try more :)

I think you didn't get my point. The plots I posted at first were of EfficientNet B0 at size 64x64, and the plots I just posted are of EfficientNet B1 at size 64x64. Both were trained for 10 epochs. I have 6 classes, out of which 2 are imbalanced; besides, my training data is 89k, val is 9k, and test is 24k. My confusion is: am I doing this right? I will deal with the imbalance later on, but right now, are both my plots fine? If not, why? As you said, the val loss in the first posted plots is fluctuating; why does that happen? BTW, the learning rate is 0.001, batch size 16, and the optimizer is Adam.

I recommend you read

BTW, I understood your message:

> So I actually trained EfficientNet B1 from version 1, which resulted in the above graphs, but for B0 I got

Doesn't that mean the second post is B0?

Yes, the 2nd post is B0.
Alright, I read both of them, but reading learning curves is still confusing. For the imbalance, I'm thinking of just oversampling the 2 classes that are too imbalanced.
But I read about WeightedRandomSampler. I was reading its documentation, and ptrblck mentioned some sources and code snippets, but I wonder how I'm going to apply that. I don't feel like I understood how to implement it in PyTorch, since I'm a newbie with PyTorch, although I understood the intuition.

My class imbalance looks like this:

I wonder, can I just copy a class folder, say class 4, and duplicate the images in it? That way it would get balanced, i.e., each image would have 1 duplicate of itself.