I think the cleanest approach would be to create a mapping between the layer names of `modelA` and `modelB` and load the `state_dict` of each of these layers separately, as described here.
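A minimal sketch of that idea, assuming two hypothetical models (`ModelA`, `ModelB`) whose shared layers have different attribute names. The `layer_map` dict and all layer names are made up for illustration. Each target submodule is looked up with `nn.Module.get_submodule` (available in recent PyTorch releases) and loaded with the default `strict=True`, so any name or shape mismatch raises an error instead of being silently skipped:

```python
import torch
import torch.nn as nn

# Hypothetical models: some layers are shared, but under different names.
class ModelA(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 3)
        self.fc = nn.Linear(8, 4)

class ModelB(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Conv2d(3, 8, 3)
        self.classifier = nn.Linear(8, 4)
        self.extra = nn.Linear(4, 2)  # no counterpart in modelA, left untouched

modelA = ModelA()
modelB = ModelB()

# Hypothetical explicit mapping: layer name in modelA -> layer name in modelB.
layer_map = {
    "conv1": "features",
    "fc": "classifier",
}

for name_a, name_b in layer_map.items():
    src = modelA.get_submodule(name_a)
    dst = modelB.get_submodule(name_b)
    # strict=True (the default): missing/unexpected keys raise a RuntimeError,
    # and so does a shape mismatch.
    dst.load_state_dict(src.state_dict())
```

Because every transferred layer is listed in `layer_map`, anything not listed (like `extra` above) is clearly left at its initial values.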
This approach requires an explicit definition of which layers should be loaded and does not rely on, e.g., the `strict=False` argument, which might yield unexpected results if its output (the missing and unexpected keys returned by `load_state_dict`) is ignored.
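To illustrate the `strict=False` pitfall, here is a small sketch with two hypothetical modules whose only layer has the same shape but a different name. Loading with `strict=False` raises no error, yet nothing is actually copied; the only hint is the `missing_keys` / `unexpected_keys` fields of the object returned by `load_state_dict`:

```python
import torch.nn as nn

# Hypothetical modules: same layer shape, different attribute names.
class Src(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

class Dst(nn.Module):
    def __init__(self):
        super().__init__()
        self.classifier = nn.Linear(4, 2)

src, dst = Src(), Dst()

# No exception here, even though no parameter names match.
result = dst.load_state_dict(src.state_dict(), strict=False)

print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)

# If these lists are not checked, dst silently keeps its random init.
if result.missing_keys or result.unexpected_keys:
    print("warning: state_dict was only partially loaded")
```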