I’d like to ask for advice on a piece of functionality I would expect from `load_state_dict`.
The current implementation fails on a size mismatch or a key mismatch, and specifying
strict=False merely ignores and skips such entries rather than making use of them.
My desired functionality is to handle a size mismatch in a way that lets the user still benefit from the pre-trained tensors, despite a structural change in the network:
- In case the new tensor is larger, copy the overlapping region and randomly initialize (or zero-initialize) the residual.
- In case the new tensor is smaller, retain the overlapping region and throw away the rest.
Is it possible to do this?
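For concreteness, here is a minimal sketch of the behavior I’m after, written against PyTorch’s `state_dict()` / `load_state_dict()` API. The helper name `load_partial_state_dict`, its `init` argument, and the `std=0.02` re-init scheme are just placeholders of my own, not anything PyTorch provides:

```python
import torch
import torch.nn as nn


def load_partial_state_dict(model: nn.Module, pretrained_state: dict,
                            init: str = "keep") -> None:
    """Copy pretrained tensors into `model`, tolerating shape mismatches.

    For each key present in both state dicts:
      - identical shape: copy the tensor as-is;
      - mismatched shape: copy the overlapping slice, and either keep the
        model's fresh initialization for the residual ("keep"), zero it
        ("zero"), or re-randomize it ("random").
    Keys present on only one side are skipped, as with strict=False.
    """
    # state_dict() returns detached tensors that share storage with the
    # parameters, so in-place copies below write through to the model.
    own_state = model.state_dict()
    with torch.no_grad():
        for name, src in pretrained_state.items():
            if name not in own_state:
                continue  # unexpected key: skip
            dst = own_state[name]
            if dst.shape == src.shape:
                dst.copy_(src)
                continue
            if dst.dim() != src.dim():
                continue  # rank mismatch: no sensible overlap, skip
            if init == "zero":
                dst.zero_()
            elif init == "random":
                dst.normal_(mean=0.0, std=0.02)  # arbitrary re-init choice
            # copy the overlapping region along every dimension
            overlap = tuple(slice(0, min(d, s))
                            for d, s in zip(dst.shape, src.shape))
            dst[overlap].copy_(src[overlap])


# Example: carry a narrower pre-trained layer into a wider one.
small = nn.Linear(128, 10)
big = nn.Linear(256, 10)
load_partial_state_dict(big, small.state_dict(), init="zero")
```

The idea is that the first 128 input weights of `big` come from `small`, and the remaining 128 are zeroed (or left at their fresh initialization), so the wider model starts out computing roughly what the smaller one did.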
I often find myself with this need: I pre-train a smaller, shallower model until the loss plateaus, and then I’d like to carry that experience over to the next model generation, a broader and/or deeper model, saving myself some training time.