Problem: merging the parameters of two models into another model with the same network structure

zs963048949 · February 23, 2021, 6:05am

The code is as follows:

if epoch >= self.epoch_merge:  # ">=" so that the merge branch below is reachable
    if epoch == self.epoch_merge:
        # Write the element-wise sum of both decoders' parameters into decoder_merge.
        with torch.no_grad():
            for p1, p2, p_merge in zip(self.decoder1.parameters(), self.decoder2.parameters(), self.decoder_merge.parameters()):
                p_merge.copy_(p1 + p2)
    out_merge = self.decoder_merge(feature_merge_list)

My goal is to merge all the parameters of the two decoders into a new decoder (named decoder_merge) with the same network structure; the new decoder will then continue to be trained. However, when the epoch equals epoch_merge, I get a much lower accuracy than in the previous epoch. After that, the model's accuracy increases normally.
Therefore, I want to know what causes this sudden drop. Is my way of merging correct? Thanks in advance.
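
One thing worth checking in the snippet above is that it sums the two parameter sets rather than averaging them. The sketch below is only an illustration under simplifying assumptions: the decoders are hypothetical single `nn.Linear` layers, and `decoder2` is given the same weights as `decoder1` so the effect is easy to see. In that setting, summing doubles the output, while averaging reproduces it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for decoder1, decoder2 and the merged decoder.
decoder1 = nn.Linear(4, 2)
decoder2 = nn.Linear(4, 2)
decoder2.load_state_dict(decoder1.state_dict())  # identical weights, for illustration only
decoder_sum = nn.Linear(4, 2)
decoder_mean = nn.Linear(4, 2)

with torch.no_grad():
    for p1, p2, p_sum, p_mean in zip(decoder1.parameters(), decoder2.parameters(),
                                     decoder_sum.parameters(), decoder_mean.parameters()):
        p_sum.copy_(p1 + p2)         # element-wise sum, as in the question
        p_mean.copy_((p1 + p2) / 2)  # element-wise average

x = torch.randn(3, 4)
out_ref = decoder1(x)

# For a linear layer, summed weights and biases scale the output by 2.
sum_matches = torch.allclose(decoder_sum(x), 2 * out_ref, atol=1e-6)
mean_matches = torch.allclose(decoder_mean(x), out_ref, atol=1e-6)
print(sum_matches, mean_matches)
```

With two differently trained decoders the effect is less clean, but a change in weight scale of this kind could plausibly contribute to a drop in accuracy right after the merge.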

CedricLy · February 23, 2021, 7:46am

Maybe the optimizer uses some momentum values, which were optimised for the original parameter values?

zs963048949 · February 23, 2021, 8:07am

The optimizer in my code is Adam.
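
To make the momentum point concrete: `torch.optim.Adam` keeps running statistics (`exp_avg`, `exp_avg_sq`) per parameter, and overwriting a parameter in place does not touch them. The sketch below assumes a hypothetical single-layer `decoder_merge` that has already taken one optimizer step. Whether this applies to the original code depends on whether decoder_merge received gradient updates before the merge.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

decoder_merge = nn.Linear(4, 2)  # hypothetical stand-in for the real decoder
optimizer = torch.optim.Adam(decoder_merge.parameters(), lr=1e-3)

# One training step so that Adam creates its per-parameter state.
loss = decoder_merge(torch.randn(8, 4)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()

weight = decoder_merge.weight
state_keys = sorted(optimizer.state[weight].keys())
old_exp_avg = optimizer.state[weight]["exp_avg"].clone()

# Overwrite the weights in place, as a merge would.
with torch.no_grad():
    weight.copy_(torch.randn_like(weight))

# The moment estimates still describe the old weights.
state_unchanged = torch.equal(optimizer.state[weight]["exp_avg"], old_exp_avg)

# One possible way to start from clean statistics: build a fresh optimizer.
optimizer = torch.optim.Adam(decoder_merge.parameters(), lr=1e-3)
fresh_state_empty = len(optimizer.state) == 0
print(state_keys, state_unchanged, fresh_state_empty)
```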

zs963048949 · February 24, 2021, 6:02am

Do you know of any other methods for loading model parameters? Thanks a lot.
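
One alternative is to go through `state_dict()` and `load_state_dict()`. Unlike `parameters()`, the state dict also contains buffers such as BatchNorm running statistics, which a loop over `parameters()` leaves untouched. The following is a minimal sketch, assuming a hypothetical `make_decoder()` and using an element-wise average as the merge rule; averaging the running statistics is itself only an approximation.

```python
import torch
import torch.nn as nn

def make_decoder():
    # Hypothetical decoder; the real architecture is not shown in the thread.
    return nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU(), nn.Linear(8, 2))

torch.manual_seed(0)
decoder1, decoder2, decoder_merge = make_decoder(), make_decoder(), make_decoder()

sd1, sd2 = decoder1.state_dict(), decoder2.state_dict()
merged = {}
for key, v1 in sd1.items():
    v2 = sd2[key]
    if v1.is_floating_point():
        merged[key] = (v1 + v2) / 2  # weights, biases and running statistics
    else:
        merged[key] = v1.clone()     # e.g. BatchNorm's integer num_batches_tracked

# Copies every entry into decoder_merge; raises if keys or shapes do not match.
decoder_merge.load_state_dict(merged)
```

Because `load_state_dict` checks keys and shapes, it also acts as a sanity check that the three decoders really share the same structure.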

ptrblck · February 25, 2021, 6:46am

Was the accuracy of the two separate decoders high, while the new “merged” decoder yields a low accuracy?
If so, I think this might be expected if the parameters are not “close” to each other (whatever this means in the high-dimensional space). There are methods such as stochastic weight averaging, which might help, but I don’t think that calculating the “mean” model of two trained models generally yields a better accuracy.
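
For reference, PyTorch ships stochastic weight averaging utilities in `torch.optim.swa_utils`. The sketch below only shows the averaging mechanics of `AveragedModel` on two hypothetical single-layer decoders. SWA is normally applied to checkpoints from one training run, so using it on two independently trained models is an assumption made here for illustration, not a recommendation.

```python
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel

torch.manual_seed(0)

# Hypothetical stand-ins for the two trained decoders.
decoder1 = nn.Linear(4, 2)
decoder2 = nn.Linear(4, 2)

# AveragedModel keeps a running equal-weight average of the models passed to it.
swa_model = AveragedModel(decoder1)
swa_model.update_parameters(decoder1)  # first call copies the parameters
swa_model.update_parameters(decoder2)  # second call averages them in

averaged_weight = swa_model.module.weight
expected_weight = (decoder1.weight + decoder2.weight) / 2
print(torch.allclose(averaged_weight, expected_weight, atol=1e-6))
```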

zs963048949 · February 25, 2021, 9:09am

Thank you, you are right. I just wanted to make sure the code is correct.