In an audio classification problem, I am first loading a pretrained model, then running my own data through it.
For audio problems, I am searching for optimal parameters (hop length, window size, etc.) for transforming features into mel spectrograms. However, this changes the size of the original inputs, so there is an error when attempting to load_state_dict of the model:
RuntimeError: Error(s) in loading state_dict for Cnn14Extractor:
size mismatch for spectrogram_extractor.stft.conv_real.weight: copying a param with shape torch.Size([513, 1, 1024]) from checkpoint, the shape in current model is torch.Size([552, 1, 1102]).
size mismatch for spectrogram_extractor.stft.conv_imag.weight: copying a param with shape torch.Size([513, 1, 1024]) from checkpoint, the shape in current model is torch.Size([552, 1, 1102]).
size mismatch for logmel_extractor.melW: copying a param with shape torch.Size([513, 64]) from checkpoint, the shape in current model is torch.Size([552, 64]).
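For context on where these shapes come from (a sketch, assuming a torchlibrosa-style conv STFT front end, where the kernel length equals the window/FFT size): the number of frequency bins is n_fft // 2 + 1, which reproduces both the checkpoint shapes and the new model's shapes above.

```python
def stft_param_shapes(n_fft, mel_bins=64):
    """Shapes of the STFT conv kernels and the mel filter bank,
    assuming a conv-based STFT with kernel length n_fft."""
    freq_bins = n_fft // 2 + 1
    conv_shape = (freq_bins, 1, n_fft)   # conv_real / conv_imag weight
    mel_shape = (freq_bins, mel_bins)    # logmel_extractor.melW
    return conv_shape, mel_shape

# Checkpoint trained with window_size=1024 -> (513, 1, 1024) and (513, 64)
print(stft_param_shapes(1024))
# New model with window_size=1102 -> (552, 1, 1102) and (552, 64)
print(stft_param_shapes(1102))
```

So every mismatched tensor in the traceback is a direct function of the window size you chose, not of the learned CNN weights.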
All other discussions I have found deal with image problems, where the input size doesn’t change each time you change a parameter.
My main question is:
How can I change these 3 layers’ weights?
Can I make the model accept the new weight shapes? What are my options besides:
- Training from scratch (led to poorer results)
- Resizing features to fit the expected shape (loss of data?)
Changing the input shape shouldn’t change the parameter shapes, and the inputs would also not be stored in the
model.state_dict(). It seems you’ve manipulated the model parameters instead.
You would have to apply these manipulations to the original model before loading the state_dict.
Is it possible to expand on this a bit? It seems the problem is:
- I loaded a pretrained model & weights with set parameters
- I then created a new model and attempted to load the weights into it, but the parameter shapes don’t match
- So the answer would seem to be to change the parameters of the state_dict
Per the following code example, I attempted to overwrite the first layers of the
state_dict() from scratch. However, the problem persisted.
import torch
import torch.nn as nn
from torch.hub import load_state_dict_from_url

backbone = FeatureExtractor(sample_rate, window_size, hop_size, mel_bins)
state_dict = load_state_dict_from_url(model_urls['cnn_url'])
conv1 = nn.Conv1d(1, 552, kernel_size=1102)
weight = torch.rand(552, 1, 1102)
state_dict["model"]["spectrogram_extractor.stft.conv_real.weight"] = weight
The approach should work and you can manipulate the parameters in the
state_dict so that they match the new architecture. Here is a small example:
import torch.nn as nn
from torchvision import models

modelA = models.resnet18()
modelB = models.resnet18()
# Change modelB's classification head from 1000 to 10 classes
modelB.fc = nn.Linear(512, 10)
# Try to load the pretrained state_dict
state_dict = modelA.state_dict()
modelB.load_state_dict(state_dict) # error: size mismatch for fc.weight / fc.bias
# Manipulate state_dict to match parameter shape
state_dict['fc.weight'] = state_dict['fc.weight'][:10]
state_dict['fc.bias'] = state_dict['fc.bias'][:10]
# Load again
modelB.load_state_dict(state_dict) # works
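One caveat on the slicing trick (my observation, not from the example above): slicing only works when every new dimension is smaller than or equal to the checkpoint dimension. In the audio case the new kernels are larger (1024 -> 1102, 513 -> 552), so the pretrained tensor would have to be padded instead, e.g. with zeros, though whether zero-padding is meaningful for DFT-basis kernels is questionable:

```python
import torch

old = torch.randn(513, 1, 1024)   # checkpoint shape
new_shape = (552, 1, 1102)        # current model's shape

# Grow the tensor by zero-padding and copy over what fits.
padded = torch.zeros(new_shape)
padded[:old.shape[0], :, :old.shape[2]] = old
print(padded.shape)  # torch.Size([552, 1, 1102])
```

The remaining rows and columns stay zero, so those frequency bins contribute nothing until fine-tuned.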