How to implement moving average of parameters

JaniceXiong · April 23, 2022, 3:21am

Hi everyone! I want to implement the model in the paper “Sequence Level Contrastive Learning for Text Summarization”.

As the paper describes, the two encoders f share the same architecture, but only the parameters θ of fθ are updated by gradient descent; the parameters ξ of fξ are a moving average of θ.

I want to know:

Should the parameters ξ be recomputed after every backward pass that updates the parameters θ?
How do I write the code for the moving average update of ξ after I set requires_grad=False for ξ?

mxahan · April 23, 2022, 4:07am

You may find something useful here.

Let me know if it works.
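
For reference, here is a minimal PyTorch sketch of the moving average update asked about above. It assumes a BYOL-style rule, ξ ← τ·ξ + (1 − τ)·θ, applied once after each optimizer step; whether the paper uses exactly this schedule should be checked against its text. The names make_target and ema_update, the decay tau=0.99, and the toy nn.Linear model and loss are illustrative assumptions, not taken from the paper.

```python
import copy

import torch
import torch.nn as nn


def make_target(online: nn.Module) -> nn.Module:
    """Clone the online network f_theta into a frozen target network f_xi."""
    target = copy.deepcopy(online)
    for p in target.parameters():
        p.requires_grad_(False)  # xi is never updated by the optimizer
    return target


@torch.no_grad()
def ema_update(online: nn.Module, target: nn.Module, tau: float = 0.99) -> None:
    """In-place update: xi <- tau * xi + (1 - tau) * theta."""
    for p_xi, p_theta in zip(target.parameters(), online.parameters()):
        p_xi.mul_(tau).add_(p_theta, alpha=1.0 - tau)


# Toy stand-ins for f_theta and f_xi (assumption: any nn.Module works the same way).
online = nn.Linear(4, 2)
target = make_target(online)
optimizer = torch.optim.SGD(online.parameters(), lr=0.1)  # only theta is optimized

for _ in range(3):
    x = torch.randn(8, 4)
    loss = online(x).pow(2).mean()  # placeholder loss, not the paper's objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()             # gradient step on theta
    ema_update(online, target)   # then move xi toward the new theta
```

Because ξ never receives gradients, the update runs under torch.no_grad() and is called right after optimizer.step(), so ξ tracks the freshly updated θ.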