The placeholder to be memory efficient

blackbirdbarber · September 27, 2019, 6:16pm

I have found AdamW by LiyuanLucasLiu.

When I compare this implementation with Adam, there is one thing I wonder about…

Why does the AdamW implementation use p_data_fp32 = p.data.float() and later on p.data.copy_(p_data_fp32)?
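To make the pattern concrete, here is a minimal sketch of what I mean. It is not the actual AdamW code: the helper name step_with_fp32_copy is hypothetical, and a plain SGD-style update stands in for the real optimizer math. It only shows what the two calls do for a half-precision and a single-precision parameter.

```python
import torch

def step_with_fp32_copy(p, grad, lr=0.1):
    """Hypothetical helper: update p through an fp32 working tensor."""
    # .float() converts to float32; if p is already float32 it returns
    # the same tensor (no new memory is allocated).
    p_data_fp32 = p.data.float()
    # Stand-in for the optimizer math, done in fp32.
    p_data_fp32.sub_(lr * grad.float())
    # Write the result back into p, casting to p's own dtype if needed.
    p.data.copy_(p_data_fp32)
    return p_data_fp32

p16 = torch.ones(3, dtype=torch.float16)
p32 = torch.ones(3, dtype=torch.float32)
g = torch.ones(3)

w16 = step_with_fp32_copy(p16, g)
w32 = step_with_fp32_copy(p32, g)

# fp16 parameter: the working tensor is a separate fp32 copy.
print(w16.dtype, w16.data_ptr() == p16.data_ptr())
# fp32 parameter: the working tensor shares storage with the parameter.
print(w32.dtype, w32.data_ptr() == p32.data_ptr())
```

So for float32 parameters the extra tensor seems to be the parameter itself, and a real copy only appears when the parameter is stored in another dtype such as float16.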

Is this the placeholder trick for the optim to be memory efficient?

Would this improve the original Adam implementation, or is it not needed?