How to calculate FLOPs with activation checkpointing under FSDP

How do you calculate the FLOPs incurred by activation checkpointing when training under FSDP, particularly for GPT-style models?

For example, in Megatron-LM with selective activation recomputation, an additional ~4Bs^2h FLOPs per layer is required to recompute the attention matmuls. What is the equivalent accounting in the FSDP case?
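For context, here is a minimal sketch of how I understand the accounting. It assumes the standard per-layer matmul FLOP estimates for a GPT block (QKV/output projections ~8Bsh^2, MLP ~16Bsh^2, attention score/context matmuls ~4Bs^2h), and that FSDP itself does not change compute, only sharding: with `torch.utils.checkpoint` the whole checkpointed forward is rerun during backward, versus Megatron's selective mode that reruns only the attention matmuls. The function names here are hypothetical, not from any library.

```python
def gpt_layer_forward_flops(batch: int, seq: int, hidden: int) -> int:
    """Approximate forward matmul FLOPs for one GPT transformer layer."""
    proj = 8 * batch * seq * hidden**2   # QKV (6Bsh^2) + output projection (2Bsh^2)
    mlp = 16 * batch * seq * hidden**2   # two matmuls through the 4h FFN
    attn = 4 * batch * seq**2 * hidden   # QK^T (2Bs^2h) + scores @ V (2Bs^2h)
    return proj + mlp + attn

def recompute_flops(batch: int, seq: int, hidden: int, mode: str) -> int:
    """Extra FLOPs per layer paid during backward for recomputation."""
    if mode == "selective":
        # Megatron-style: only the attention matmuls are recomputed
        return 4 * batch * seq**2 * hidden
    if mode == "full":
        # torch.utils.checkpoint: the entire layer forward is rerun
        return gpt_layer_forward_flops(batch, seq, hidden)
    return 0
```

Under this model, FSDP with full checkpointing pays one extra full forward per checkpointed layer (roughly a 4/3 multiplier on total training FLOPs, since backward is ~2x forward), whereas selective recomputation pays only the 4Bs^2h attention term.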