In an audio classification problem, I am first loading a pretrained model, then running my own data through it.
For audio problems, I am searching for optimal parameters (hop length, window size, etc.) for transforming features into mel spectrograms. However, this changes the size of the original inputs, so there is an error when attempting to load_state_dict of the model:
RuntimeError: Error(s) in loading state_dict for Cnn14Extractor:
size mismatch for spectrogram_extractor.stft.conv_real.weight: copying a param with shape torch.Size([513, 1, 1024]) from checkpoint, the shape in current model is torch.Size([552, 1, 1102]).
size mismatch for spectrogram_extractor.stft.conv_imag.weight: copying a param with shape torch.Size([513, 1, 1024]) from checkpoint, the shape in current model is torch.Size([552, 1, 1102]).
size mismatch for logmel_extractor.melW: copying a param with shape torch.Size([513, 64]) from checkpoint, the shape in current model is torch.Size([552, 64]).
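For context on where these shapes come from (a sketch, assuming a torchlibrosa-style conv STFT front end, where the kernel length equals the window/FFT size): the number of frequency bins is n_fft // 2 + 1, which reproduces both the checkpoint shapes and the new model's shapes above.

```python
def stft_param_shapes(n_fft, mel_bins=64):
    """Shapes of the STFT conv kernels and the mel filter bank,
    assuming a conv-based STFT with kernel length n_fft."""
    freq_bins = n_fft // 2 + 1
    conv_shape = (freq_bins, 1, n_fft)   # conv_real / conv_imag weight
    mel_shape = (freq_bins, mel_bins)    # logmel_extractor.melW
    return conv_shape, mel_shape

# Checkpoint trained with window_size=1024 -> (513, 1, 1024) and (513, 64)
print(stft_param_shapes(1024))
# New model with window_size=1102 -> (552, 1, 1102) and (552, 64)
print(stft_param_shapes(1102))
```

So every mismatched tensor in the traceback is a direct function of the window size you chose, not of the learned CNN weights.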
All other discussions I have found deal with image problems, where the input size doesn’t change each time you change a parameter.
My main question is:
How can I change these 3 layers’ weights?
Can I make the model accept the new weight shapes? What are my options besides:
- Training from scratch (led to poorer results)
- Resizing features to fit the expected shape (loss of data?)
Changing the input shape shouldn’t change the parameter shapes, and the inputs would also not be stored in the
model.state_dict(). It seems you’ve manipulated the model parameters instead.
You would have to apply these manipulations to the original model before loading the state_dict.
Is it possible to expand on this a bit? It seems the problem is:
- I loaded a pretrained model & weights with set parameters
- I then created a new model and attempted to load the weights into it, but the parameter shapes don’t match
- So the answer would seem to be to change the parameters of the state_dict
Per the following code example, I attempted to overwrite the first layers of the
state_dict() from scratch. However, the problem persisted.
import torch
import torch.nn as nn
from torch.hub import load_state_dict_from_url

backbone = FeatureExtractor(sample_rate, window_size, hop_size, mel_bins)
state_dict = load_state_dict_from_url(model_urls['cnn_url'])
conv1 = nn.Conv1d(1, 552, kernel_size=1102)
weight = torch.rand(552, 1, 1102)
state_dict["model"]["spectrogram_extractor.stft.conv_real.weight"] = weight
The approach should work and you can manipulate the parameters in the
state_dict so that they match the new architecture. Here is a small example:
import torch.nn as nn
from torchvision import models

modelA = models.resnet18()
modelB = models.resnet18()
# Change modelB's classification head from 1000 to 10 classes
modelB.fc = nn.Linear(512, 10)
# Try to load the pretrained state_dict
state_dict = modelA.state_dict()
modelB.load_state_dict(state_dict) # error: size mismatch for fc.weight / fc.bias
# Manipulate state_dict to match parameter shape
state_dict['fc.weight'] = state_dict['fc.weight'][:10]
state_dict['fc.bias'] = state_dict['fc.bias'][:10]
# Load again
modelB.load_state_dict(state_dict) # works
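One caveat on the slicing trick (my observation, not from the example above): slicing only works when every new dimension is smaller than or equal to the checkpoint dimension. In the audio case the new kernels are larger (1024 -> 1102, 513 -> 552), so the pretrained tensor would have to be padded instead, e.g. with zeros, though whether zero-padding is meaningful for DFT-basis kernels is questionable:

```python
import torch

old = torch.randn(513, 1, 1024)   # checkpoint shape
new_shape = (552, 1, 1102)        # current model's shape

# Grow the tensor by zero-padding and copy over what fits.
padded = torch.zeros(new_shape)
padded[:old.shape[0], :, :old.shape[2]] = old
print(padded.shape)  # torch.Size([552, 1, 1102])
```

The remaining rows and columns stay zero, so those frequency bins contribute nothing until fine-tuned.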