In the implementation of RMSprop, there are the following lines:
square_avg = state['square_avg']
square_avg.mul_(alpha).addcmul_(1 - alpha, grad, grad)
...
if group['centered']:
    grad_avg = state['grad_avg']
    grad_avg.mul_(alpha).add_(1 - alpha, grad)
    avg = square_avg.addcmul(-1, grad_avg, grad_avg).sqrt().add_(group['eps'])
else:
    avg = square_avg.sqrt().add_(group['eps'])
Won’t the operation avg = square_avg.sqrt().add_(group['eps']) update the variable in state['square_avg'] in-place? Shouldn’t we instead use .add(group['eps'])?
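EDIT: I see, the .sqrt() operator allocates new memory, so the in-place .add_() only modifies that temporary result and state['square_avg'] is left untouched. Never mind.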
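For anyone else who wonders about this, here is a minimal sketch that checks the behavior directly. The tensor values and the eps constant are made up for illustration and are not taken from the optimizer code; the sketch only assumes that out-of-place ops such as sqrt() return a new tensor, while in-place ops such as sqrt_() and add_() return the tensor they were called on.

```python
import torch

# Hypothetical stand-ins for state['square_avg'] and group['eps']
square_avg = torch.tensor([4.0, 9.0, 16.0])
eps = 1e-8

before = square_avg.clone()

# Same pattern as in the optimizer: out-of-place sqrt(), then in-place add_()
avg = square_avg.sqrt().add_(eps)

# sqrt() returned a fresh tensor, so add_() only touched that temporary
same_storage = avg.data_ptr() == square_avg.data_ptr()
unchanged = torch.equal(square_avg, before)

# Contrast: the in-place sqrt_() would overwrite its input
clobbered = before.clone()
alias = clobbered.sqrt_().add_(eps)
alias_same = alias.data_ptr() == clobbered.data_ptr()

print(same_storage, unchanged, alias_same)
```

So the pattern in the optimizer is safe as long as the first op in the chain is out-of-place. It would only be a problem if the chain started with an in-place op such as sqrt_().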