I looked into this more carefully: their implementations do not account for several modules, such as nn.Embedding, LayerNorm, and GELU(), so these approaches may not give an accurate FLOP count. I'm referring to the ones Soumith recommended here.
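One quick way to see how much a given counter might be missing is to list the leaf modules in a model whose types it does not handle. The snippet below is only a rough sketch: the helper name `unsupported_leaf_modules`, the toy model, and the `assumed_supported` tuple are all made up for illustration and do not reflect what any particular FLOP-counting library actually covers, so the tuple should be replaced with the types the counter really handles.

```python
from collections import Counter

import torch.nn as nn


def unsupported_leaf_modules(model, supported_types):
    """Count leaf modules whose type is not in `supported_types`.

    `supported_types` is a tuple of module classes that a FLOP counter
    is assumed to handle; it is supplied by the caller.
    """
    missing = Counter()
    for module in model.modules():
        # A leaf module has no child modules of its own.
        is_leaf = next(module.children(), None) is None
        if is_leaf and not isinstance(module, supported_types):
            missing[type(module).__name__] += 1
    return dict(missing)


# Hypothetical toy model containing the module types mentioned above.
toy_model = nn.Sequential(
    nn.Embedding(100, 16),
    nn.LayerNorm(16),
    nn.Linear(16, 16),
    nn.GELU(),
)

# Assumption for illustration only: pretend the counter handles just these.
assumed_supported = (nn.Linear, nn.Conv2d)

print(unsupported_leaf_modules(toy_model, assumed_supported))
```

Any module type that shows up in the result contributes nothing to that counter's total, so the reported number should be read as a lower bound for such a model.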