How to implement moving average of parameters

Hi everyone! I want to implement the model in the paper “Sequence Level Contrastive Learning for Text Summarization”.

As the paper mentions, the two networks f_θ and f_ξ share the same architecture, but only the parameters θ of f_θ are updated by gradient descent; the parameters ξ of f_ξ are a moving average of θ.

I want to know:

  1. whether the parameters ξ should be recomputed after every backward pass that updates the parameters θ
  2. how to write the code for the moving-average update of ξ after I set requires_grad=False for ξ

You may find something useful here.
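
In case it helps with both questions, here is a minimal PyTorch sketch. It assumes a BYOL-style rule, ξ ← τξ + (1 − τ)θ, applied once after each optimizer step; please check the paper for the exact decay and schedule. The names `make_target`, `ema_update`, and `tau`, the `nn.Linear` stand-in model, and the placeholder loss are my own assumptions for illustration, not taken from the paper.

```python
import copy

import torch
import torch.nn as nn


def make_target(online: nn.Module) -> nn.Module:
    """Copy the online network (theta) to create the target network (xi)."""
    target = copy.deepcopy(online)
    for p in target.parameters():
        p.requires_grad_(False)  # xi is never updated by the optimizer
    return target


@torch.no_grad()
def ema_update(online: nn.Module, target: nn.Module, tau: float = 0.99) -> None:
    """In place: xi <- tau * xi + (1 - tau) * theta (tau is an assumed decay)."""
    for p_online, p_target in zip(online.parameters(), target.parameters()):
        p_target.mul_(tau).add_(p_online, alpha=1.0 - tau)


torch.manual_seed(0)
online = nn.Linear(4, 2)  # stand-in for f_theta
target = make_target(online)  # stand-in for f_xi
optimizer = torch.optim.SGD(online.parameters(), lr=0.1)

for _ in range(3):
    x, y = torch.randn(8, 4), torch.randn(8, 2)
    loss = (online(x) - y).pow(2).mean()  # placeholder loss, not the paper's
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # gradient update of theta only
    ema_update(online, target)  # then refresh xi from the new theta
```

Since ξ has `requires_grad=False` and the update runs under `torch.no_grad()`, autograd never tracks it. Note that `parameters()` does not include buffers such as BatchNorm running statistics, so those would need to be copied separately if your model has them.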

Let me know if it works.