Loading weights into a changed model

I pickled my entire model into a PKL file. Then I changed the model code by adding another trainable weight.

Original `__init__`:

    def __init__(self,
                 num_heads,
                 embedding_dim,
                 attention_type="additive"):
        self.save_init_params(locals())
        super().__init__()
        self.debug_attn = False
        self.attention_type = attention_type
        assert attention_type in ["dot_product", "additive"], attention_type

        # Create heads for query
        self.fc0 = nn.Linear(embedding_dim, num_heads * embedding_dim)
        # Project integration into logits
        self.fc1 = nn.Linear(embedding_dim, 1)

New `__init__`:

    def __init__(self,
                 num_heads,
                 embedding_dim,
                 attention_type="additive"):
        self.save_init_params(locals())
        super().__init__()
        self.debug_attn = False
        self.attention_type = attention_type
        assert attention_type in ["dot_product", "additive"], attention_type

        # Create heads for query
        self.fc0 = nn.Linear(embedding_dim, num_heads * embedding_dim)
        # Project integration into logits
        self.fc1 = nn.Linear(embedding_dim, 1)
        self.new_weight = Parameter(torch.Tensor([400]))

I only added this one line, and it doesn’t affect the other weights at all. However, trying to load the pickle now, I get:
    size mismatch for fc0.weight: copying a param with shape torch.Size([64, 28]) from checkpoint, the shape in current model is torch.Size([64, 32]).

`fc0` is not related to `self.new_weight` at all.

Are you sure the variable `embedding_dim` is the same across both executions? The error message you've posted indicates that `embedding_dim = 28` in the first execution and `embedding_dim = 32` in the second: since `fc0 = nn.Linear(embedding_dim, num_heads * embedding_dim)`, the second dimension of `fc0.weight` is exactly `embedding_dim`.
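You can confirm this by inspecting the shapes stored in the checkpoint before loading anything. A minimal sketch, assuming the PKL can be read with `torch.load` and is either the pickled module itself or a plain state dict (the file name is just a placeholder):

    import torch

    # Placeholder path; point this at your actual PKL file.
    checkpoint = torch.load("model.pkl", map_location="cpu")

    # If the whole model was pickled, pull out its state dict;
    # if the file already holds a state dict, use it directly.
    state_dict = checkpoint.state_dict() if hasattr(checkpoint, "state_dict") else checkpoint

    # The second dimension of fc0.weight is the embedding_dim the
    # checkpoint was built with.
    print(state_dict["fc0.weight"].shape)  # e.g. torch.Size([64, 28])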

I don’t know how you load the weights into the model after initialization, but you could also try calling it this way: `model.load_state_dict(checkpoint, strict=False)`. That makes keys that are missing from (or unexpected in) the checkpoint be skipped instead of raising an error. However, it has the side effect of failing silently: if a layer’s weights are not in the checkpoint, it simply keeps its default initialization, which is not ideal for fine-tuning or inference. It also only skips keys whose names don’t match; a key such as `fc0.weight` that exists in both models but with different shapes will still raise a size-mismatch error, so it will probably not be useful in your case.
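That said, once `embedding_dim` matches the checkpoint again, `strict=False` is the usual way to load an old state dict into a model that has gained a new parameter, and the call returns the keys it skipped, so you can check that only the new parameter was left at its default instead of relying on the silent behaviour. A minimal sketch (assuming `model` is an instance of the new class and `state_dict` is the old checkpoint’s state dict):

    # Load every entry whose name and shape match; report the rest
    # instead of raising for missing/unexpected keys.
    result = model.load_state_dict(state_dict, strict=False)

    # Parameters the new model has but the old checkpoint does not
    # (your freshly added new_weight should be the only one here).
    print("missing keys:", result.missing_keys)

    # Checkpoint entries the new model has no parameter for.
    print("unexpected keys:", result.unexpected_keys)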

You can also try commenting out the line `self.fc0 = nn.Linear(...)` to see whether the next one (with `fc1`) also crashes because `embedding_dim` differs from the one in the loaded model.