Hi everyone! I want to implement the model in the paper “Sequence Level Contrastive Learning for Text Summarization”.
As the paper describes, the two networks $f_\theta$ and $f_\xi$ share the same architecture, but only the parameters $\theta$ of $f_\theta$ are updated by gradient descent; the parameters $\xi$ of $f_\xi$ are a moving average of $\theta$.
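Written out, and assuming the BYOL-style exponential moving average that this kind of online/target setup usually follows (the decay rate $\tau$, typically close to 1, is an assumption and is not defined above), the update of the target parameters would look like:

$$
\xi \leftarrow \tau\,\xi + (1 - \tau)\,\theta, \qquad \tau \in [0, 1)
$$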
I want to know:
- whether the parameters $\xi$ should be recomputed after every backward pass that updates the parameters $\theta$;
- how to write code that implements the moving-average update of $\xi$ after I set `requires_grad=False` for $\xi$.
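A minimal PyTorch sketch of one way to do this, assuming the BYOL-style rule $\xi \leftarrow \tau\xi + (1-\tau)\theta$ is applied once after every optimizer step. The helper names `make_target` and `ema_update`, the decay `tau=0.99`, the toy `nn.Linear` model and the placeholder loss are all hypothetical stand-ins, not taken from the paper.

```python
import copy

import torch
import torch.nn as nn


def make_target(online: nn.Module) -> nn.Module:
    """Create f_xi as a frozen copy of f_theta (same architecture, same initial weights)."""
    target = copy.deepcopy(online)
    for p in target.parameters():
        p.requires_grad_(False)  # xi is never touched by backprop or the optimizer
    return target


@torch.no_grad()
def ema_update(online: nn.Module, target: nn.Module, tau: float = 0.99) -> None:
    """In-place moving average: xi <- tau * xi + (1 - tau) * theta."""
    for p_online, p_target in zip(online.parameters(), target.parameters()):
        p_target.mul_(tau).add_(p_online, alpha=1.0 - tau)


# Hypothetical toy setup: a linear layer stands in for the real encoder.
torch.manual_seed(0)
online = nn.Linear(4, 2)            # f_theta
target = make_target(online)        # f_xi
optimizer = torch.optim.SGD(online.parameters(), lr=0.1)  # only theta is optimized

x = torch.randn(8, 4)
for step in range(3):
    with torch.no_grad():
        z_target = target(x)                           # target branch: no graph is built
    loss = (online(x) - z_target - 1.0).pow(2).mean()  # placeholder loss, not the paper's
    optimizer.zero_grad()
    loss.backward()                       # gradients flow into theta only
    optimizer.step()                      # 1) update theta
    ema_update(online, target, tau=0.99)  # 2) then update xi from the new theta
```

Because the target's parameters have `requires_grad=False` and the update runs under `torch.no_grad()`, the optimizer only ever sees $\theta$, and $\xi$ changes solely through the in-place moving average. This sketch ignores buffers (e.g. BatchNorm running statistics); if the real model has any, you would also need to decide whether to copy or average them.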