In an audio classification problem, I first load a pretrained model and then run my own data through it.
In audio problems, I am searching for the optimal parameters (hop length, window size, etc.) for transforming the features into mel spectrograms. However, changing these parameters changes the size of the original inputs. Thus, there is an error when attempting to load_state_dict on the model:
RuntimeError: Error(s) in loading state_dict for Cnn14Extractor:
size mismatch for spectrogram_extractor.stft.conv_real.weight: copying a param with shape torch.Size([513, 1, 1024]) from checkpoint, the shape in current model is torch.Size([552, 1, 1102]).
size mismatch for spectrogram_extractor.stft.conv_imag.weight: copying a param with shape torch.Size([513, 1, 1024]) from checkpoint, the shape in current model is torch.Size([552, 1, 1102]).
size mismatch for logmel_extractor.melW: copying a param with shape torch.Size([513, 64]) from checkpoint, the shape in current model is torch.Size([552, 64]).
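For context on where these shapes come from: in a torchlibrosa-style front end (as used by the PANNs Cnn14 models), the STFT is implemented as a 1-D convolution with one filter per one-sided frequency bin, so the first dimension is n_fft // 2 + 1 and the kernel size is n_fft. A minimal sketch, assuming that convention:

```python
# Sketch of how the STFT window size determines the mismatched parameter
# shapes in the error above, assuming a torchlibrosa-style front end.

def stft_conv_shape(n_fft):
    """Shape of the STFT conv weight: one filter per one-sided frequency
    bin (n_fft // 2 + 1 bins), each of length n_fft."""
    n_bins = n_fft // 2 + 1
    return (n_bins, 1, n_fft)  # (out_channels, in_channels, kernel_size)

print(stft_conv_shape(1024))  # checkpoint:    (513, 1, 1024)
print(stft_conv_shape(1102))  # current model: (552, 1, 1102)
```

This matches the error message: the checkpoint was created with n_fft = 1024 (513 bins), while the current model was built with n_fft = 1102 (552 bins), which also changes the mel filterbank from (513, 64) to (552, 64).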
All other discussions I have found concern image problems, where the parameter shapes don't change every time you change a preprocessing parameter.
My main question is:
How can I change the weights of these 3 layers?
Can I make the model accept the new weight shapes? What are my options besides:
Training from scratch (led to poorer results)
Resizing the features to fit the expected shape (loss of data?)
Changing the input shape shouldn't change the parameter shapes, and the inputs would also not be stored in model.state_dict(). It seems you've changed the model's parameters (via the spectrogram settings) instead.
You would have to apply the same changes to the original model before loading the state_dict.
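One common workaround, if you want to keep the new spectrogram settings, is to load only the checkpoint parameters whose shapes still match and leave the resized front-end layers freshly initialized. This is a hedged sketch, not the original poster's code: the helper name and the dummy modules below are made up to mirror the shapes in the error message.

```python
import torch
import torch.nn as nn

def load_matching(model, checkpoint_state):
    """Copy checkpoint parameters into the model, skipping any whose shape
    no longer matches; returns the names of the skipped entries."""
    model_state = model.state_dict()
    filtered = {
        k: v for k, v in checkpoint_state.items()
        if k in model_state and v.shape == model_state[k].shape
    }
    skipped = sorted(set(checkpoint_state) - set(filtered))
    # strict=False tolerates the entries we deliberately left out
    model.load_state_dict(filtered, strict=False)
    return skipped

# Demo with dummy conv layers mirroring the mismatched STFT shapes:
old = nn.Conv1d(1, 513, kernel_size=1024, bias=False)  # checkpoint front end
new = nn.Conv1d(1, 552, kernel_size=1102, bias=False)  # current front end

skipped = load_matching(new, old.state_dict())
print(skipped)  # ['weight'] -- the shape-mismatched STFT conv is skipped
```

Note that in the PANNs models the STFT convolutions and the mel filterbank are not learned during training; they are computed deterministically from the DFT basis and the mel filter definitions. So skipping them and letting the model rebuild them for the new hop length / window size should not discard any learned information.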