Hi there,
I am using a transfer learning approach for my audio data classification. I have converted the audio into spectrograms (2D structures).
My queries are:
Do I need to run the previous model for training, or should I directly load the pretrained model? I think there is no gain in training the old model again. The shape of the spectrogram input is (1, 384, 118); I think I need to reshape it. Any suggestions?
I am using AlexNet as follows:
model = models.alexnet(pretrained=True)  # torchvision.models
model.classifier[6] = nn.Linear(4096, 4)  # as I have four classes
# ... followed by the criterion, the optimizer, and the training loop.
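For reference, a minimal sketch of the abbreviated part; the loss and optimizer here are assumptions (cross-entropy and Adam), and train_loader is a hypothetical DataLoader yielding (inputs, labels) batches:

import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()                    # assumed multi-class loss
optimizer = optim.Adam(model.parameters(), lr=1e-4)  # assumed optimizer and learning rate

for inputs, labels in train_loader:                  # hypothetical DataLoader
    optimizer.zero_grad()
    outputs = model(inputs)                          # AlexNet expects inputs of shape (N, 3, H, W)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()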
The error I am getting is:
Given groups=1, weight of size [64, 3, 11, 11], expected input[64, 1, 384, 118] to have 3 channels, but got 1 channels instead
Could someone help? I am trying transfer learning for the first time.
My intuition is that I should change this 2D tensor into a 3D tensor. For example, can I concatenate the spectrogram matrix with two other matrices of the same shape and finally make a 3D, image-like structure to use as input?
How do I reshape it? Say I have:
A=(128,118)
B=(128,118)
C=(128,118)
Now I want to make an image out of these, so that the number of channels becomes 3, something like:
Out=(3,128,118)
I don't know whether this is the right way or not.
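In code, what I am imagining is something like the following sketch (assuming A, B, and C are torch tensors of shape (128, 118)); I am not sure whether this is correct:

import torch

A = torch.randn(128, 118)  # the spectrogram (random data here just for illustration)
B = torch.randn(128, 118)
C = torch.randn(128, 118)

out = torch.stack([A, B, C], dim=0)  # stack along a new channel dimension
print(out.shape)  # torch.Size([3, 128, 118])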
1. Turn the tensor of shape (C=1, H, W) into (C=3, H, W) by concatenating the tensor 3 times along the channel dimension.
2. Use a Conv2d layer with 1 input channel, 3 output channels, and kernel size 1 before passing the data to the model, so that you get the 3-channel input the model expects.
Thanks for the reply @user_123454321. Could you please elaborate with code?
Theoretically, I find the second suggestion to be the solution, but I am not able to implement it. Where do I need to put this Conv2d layer?
import numpy as np
import torch

# img is a numpy array of shape (H, W)
img = img[..., None]  # add an extra channel dimension at the end to get shape (H, W, 1)
img = np.concatenate([img, img, img], 2)  # concatenate 3 times along the 3rd dimension to get shape (H, W, 3)
img = torch.from_numpy(img.transpose([2, 0, 1]))  # transpose to get shape (3, H, W)
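(If img is already a torch tensor of shape (1, H, W), the same effect can be had with img.repeat(3, 1, 1), or with img.expand(3, -1, -1) for a copy-free view, without going through numpy.)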
For the second suggestion, one can update the model slightly like this:
import torch.nn as nn
import torchvision

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Conv layer with 1 input channel, 3 output channels, and kernel size (1, 1).
        # This would give the same result as the earlier approach if the bias is 0 and the
        # kernel weights (of shape (3, 1, 1, 1)) are all ones, but here the model has more
        # flexibility in coming up with the weights.
        self.conv = nn.Conv2d(1, 3, (1, 1))
        self.base_model = torchvision.models.alexnet(pretrained=True)
        self.base_model.classifier[6] = nn.Linear(4096, 4)

    def forward(self, x):
        return self.base_model(self.conv(x))
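As a quick sanity check, one could run a dummy batch of single-channel spectrograms through it (the batch size below is just an example; recent torchvision AlexNet has an adaptive average pool, so the non-square 384x118 input is accepted):

import torch

model = Model()
dummy = torch.randn(4, 1, 384, 118)  # batch of 4 single-channel spectrograms
out = model(dummy)
print(out.shape)  # torch.Size([4, 4]), one logit per class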