I just stumbled upon the new Data2Vec paper from Meta AI.
In section 3.3 the authors describe how the parameters of the so-called teacher model are updated. If I understand it correctly, the parameters are updated according to
W_t = (1-k) * W_t + k * W_s,
with W_t being the teacher model’s parameters, W_s being the student model’s parameters and k being a constant factor.
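To make the arithmetic of this rule concrete, here is a minimal sketch on plain Python floats. It assumes each model is just a flat list of scalar weights, and `ema_update` is a hypothetical helper name. Each teacher weight moves a fraction k of the way toward the matching student weight:

```python
def ema_update(teacher, student, k):
    """Return the teacher weights after one step of W_t = (1 - k) * W_t + k * W_s."""
    return [(1 - k) * w_t + k * w_s for w_t, w_s in zip(teacher, student)]


teacher = [1.0, 2.0]
student = [3.0, 0.0]
# k = 0.25 is an arbitrary value chosen for illustration.
print(ema_update(teacher, student, 0.25))  # [1.5, 1.5]
```

With k = 0.25 the first weight becomes 0.75 * 1.0 + 0.25 * 3.0 = 1.5 and the second becomes 0.75 * 2.0 + 0.25 * 0.0 = 1.5.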
I tried to reimplement this parameter update. To do this, I have to multiply all parameters of the teacher model (W_t) by the constant factor (1-k). The only solution I found was the following (from this post):
```python
for name, param in state_dict.items():
    # Transform the parameter as required.
    transformed_param = param * 0.9
    # Update the parameter.
    state_dict[name].copy_(transformed_param)
```
However, this looks inefficient to me, because it loops over the parameters in Python instead of performing a single vectorized operation. Is there a faster way to do this?
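One possible approach, sketched here under a few assumptions, is to apply the whole update in place with `Tensor.lerp_`. The assumptions are that teacher and student share the same architecture (so `.parameters()` yields matching tensors in the same order) and that only parameters, not buffers such as BatchNorm running statistics, need averaging. `ema_update_` is a hypothetical helper name. The Python loop still runs once per parameter tensor, not once per element, and each `lerp_` call is a single vectorized kernel over its whole tensor:

```python
import copy

import torch
import torch.nn as nn


@torch.no_grad()  # the teacher is not trained by backprop, so skip autograd tracking
def ema_update_(teacher: nn.Module, student: nn.Module, k: float) -> None:
    """In-place update W_t <- (1 - k) * W_t + k * W_s for each parameter pair.

    Assumes identical architectures, so both .parameters() iterators
    yield matching tensors in the same order.
    """
    for w_t, w_s in zip(teacher.parameters(), student.parameters()):
        # lerp_ computes w_t + k * (w_s - w_t), which equals
        # (1 - k) * w_t + k * w_s, over the whole tensor in one call.
        w_t.lerp_(w_s, k)


student = nn.Linear(4, 2)
teacher = copy.deepcopy(student)
with torch.no_grad():
    student.weight.add_(1.0)  # pretend the student took an optimizer step
before = teacher.weight.clone()
ema_update_(teacher, student, k=0.25)
print(torch.allclose(teacher.weight, 0.75 * before + 0.25 * student.weight))  # True
```

Because the number of parameter tensors (typically tens to hundreds) is far smaller than the number of elements, the per-tensor loop is usually cheap compared with the arithmetic itself. PyTorch also has underscore-prefixed functions such as `torch._foreach_mul_` and `torch._foreach_add_` that its built-in optimizers use to batch updates across many tensors. They are not public API, so check them against your PyTorch version before relying on them.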
Thanks for your help!