I have a similar question. I also asked about this in pytorch/fairseq#779, and the response there was that averaging is built in.
How about running some black-box experiments to figure it out?