- You could just concatenate the parameter lists:
optimizer = optim.SGD(list(modelA.parameters()) + list(modelB.parameters()), lr=1e-3)
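For reference, a minimal runnable sketch of a single optimizer covering both models. The two `nn.Linear` layers and their sizes are made-up stand-ins for `modelA` and `modelB`, not your actual architecture, and `itertools.chain` is just an alternative to adding the lists:

```python
import itertools

import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-ins for modelA and modelB; the layer sizes are made up.
modelA = nn.Linear(8, 4)
modelB = nn.Linear(4, 2)

# One optimizer over the parameters of both models.
# itertools.chain avoids building intermediate lists;
# list(modelA.parameters()) + list(modelB.parameters()) works just as well.
optimizer = optim.SGD(
    itertools.chain(modelA.parameters(), modelB.parameters()), lr=1e-3
)

# A single backward pass and optimizer step updates both models.
x = torch.randn(16, 8)
loss = modelB(modelA(x)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Count the parameter tensors the optimizer is tracking (weight + bias per layer).
num_params = sum(len(group["params"]) for group in optimizer.param_groups)
```

If the two models need different learning rates, passing a list of dicts (one per model, each with its own `"params"` and `"lr"`) to the optimizer should also work.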
How are you transferring the parameters from layer A2 to B1? If you are copying them directly, the weight matrix will have a size mismatch ([30, 8] vs. [40, 8]).
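A small sketch of what a size mismatch like this looks like in practice. I'm assuming here that A2 is the layer with the [30, 8] weight and B1 the one with [40, 8]; the variable names are hypothetical and the real layers may differ:

```python
import torch.nn as nn

# Hypothetical layers; which layer has which shape is an assumption.
# nn.Linear stores its weight as [out_features, in_features].
layer_a2 = nn.Linear(8, 30)  # weight shape [30, 8]
layer_b1 = nn.Linear(8, 40)  # weight shape [40, 8]

shapes = (tuple(layer_a2.weight.shape), tuple(layer_b1.weight.shape))

# Loading A2's parameters into B1 fails because the shapes differ.
try:
    layer_b1.load_state_dict(layer_a2.state_dict())
    mismatch = False
except RuntimeError:
    mismatch = True
```

So a direct copy only works if both layers have the same `in_features` and `out_features`.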