Hi there,
I am using a transfer learning approach for my audio data classification. I have converted the audio into spectrograms (2D structures).
My queries are:
Do I need to run the previous model for training, or should I directly load the pretrained model? I think there is no gain in training the old model again. The shape of the spectrogram input is (1, 384, 118); I think I need to reshape it. Any suggestions?
I am using AlexNet as follows:
model = models.alexnet(pretrained=True)  # torchvision.models
model.classifier[6] = nn.Linear(4096, 4)  # as I have four classes
# ... followed by the criterion, the optimizer, and the training loop.
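For reference, a minimal sketch of the abbreviated part; the loss and optimizer here are assumptions (cross-entropy and Adam), and train_loader is a hypothetical DataLoader yielding (inputs, labels) batches:

import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()                    # assumed multi-class loss
optimizer = optim.Adam(model.parameters(), lr=1e-4)  # assumed optimizer and learning rate

for inputs, labels in train_loader:                  # hypothetical DataLoader
    optimizer.zero_grad()
    outputs = model(inputs)                          # AlexNet expects inputs of shape (N, 3, H, W)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()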
The error I am getting is:
Given groups=1, weight of size [64, 3, 11, 11], expected input[64, 1, 384, 118] to have 3 channels, but got 1 channels instead
Could someone help? I am trying transfer learning for the first time.
My intuition is that I should change this 2D tensor into a 3D tensor. For example, can I concatenate the spectrogram matrix with two other matrices of the same shape and finally make a 3D, image-like structure to use as input?
How do I reshape it? Say I have:
A=(128,118)
B=(128,118)
C=(128,118)
Now I want to make an image out of these, so that the number of channels becomes 3, something like:
Out=(3,128,118)
I don't know whether this is the right way or not.
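In code, what I am imagining is something like the following sketch (assuming A, B, and C are torch tensors of shape (128, 118)); I am not sure whether this is correct:

import torch

A = torch.randn(128, 118)  # the spectrogram (random data here just for illustration)
B = torch.randn(128, 118)
C = torch.randn(128, 118)

out = torch.stack([A, B, C], dim=0)  # stack along a new channel dimension
print(out.shape)  # torch.Size([3, 128, 118])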
1. Turn the tensor of shape (C=1, H, W) into (C=3, H, W) by concatenating the tensor 3 times along the channel dimension.
2. Use a Conv2d layer with 1 input channel, 3 output channels, and kernel size 1 before passing the data to the model, so that you get the 3-channel input the model expects.
Thanks for the reply @user_123454321. Could you please elaborate with code?
Theoretically, I find the second suggestion to be the solution, but I am not able to implement it. Where do I need to put this Conv2d layer?
import numpy as np
import torch

# img is a numpy array of shape (H, W)
img = img[..., None]  # add an extra channel dimension at the end to get shape (H, W, 1)
img = np.concatenate([img, img, img], 2)  # concatenate 3 times along the 3rd dimension to get shape (H, W, 3)
img = torch.from_numpy(img.transpose([2, 0, 1]))  # transpose to get shape (3, H, W)
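(If img is already a torch tensor of shape (1, H, W), the same effect can be had with img.repeat(3, 1, 1), or with img.expand(3, -1, -1) for a copy-free view, without going through numpy.)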
For the second suggestion, one can update the model slightly like this:
import torch.nn as nn
import torchvision

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Conv layer with 1 input channel, 3 output channels, and kernel size (1, 1).
        # This would give the same result as the earlier approach if the bias is 0 and the
        # kernel weights (of shape (3, 1, 1, 1)) are all ones, but here the model has more
        # flexibility in coming up with the weights.
        self.conv = nn.Conv2d(1, 3, (1, 1))
        self.base_model = torchvision.models.alexnet(pretrained=True)
        self.base_model.classifier[6] = nn.Linear(4096, 4)

    def forward(self, x):
        return self.base_model(self.conv(x))
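As a quick sanity check, one could run a dummy batch of single-channel spectrograms through it (the batch size below is just an example; recent torchvision AlexNet has an adaptive average pool, so the non-square 384x118 input is accepted):

import torch

model = Model()
dummy = torch.randn(4, 1, 384, 118)  # batch of 4 single-channel spectrograms
out = model(dummy)
print(out.shape)  # torch.Size([4, 4]), one logit per class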