Many research papers compare the FLOPS of different models. I'd like to understand clearly how this count is calculated by hand, by convention.
For example, suppose I multiply an M×N matrix by an N×1 vector.
Taking the dot product of each matrix row with the vector produces one scalar of the final output, at a cost of N multiplications and N-1 additions.
Repeating this for all M rows gives M×N multiplications and M×(N-1) additions.
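To double-check my counting, here is a minimal Python sketch of a naive matrix-vector product that tallies scalar multiplications and additions as it goes. The function name `matvec_op_counts` and the 3×4 test sizes are just my own illustration, and it assumes the first product in each row initializes the accumulator (so there is no extra "add to zero").

```python
def matvec_op_counts(A, x):
    """Naive matrix-vector product that also counts scalar multiplications and additions."""
    mults = 0
    adds = 0
    y = []
    for row in A:
        # The first product initializes the accumulator: 1 multiplication, 0 additions.
        acc = row[0] * x[0]
        mults += 1
        # Each remaining term costs 1 multiplication and 1 addition.
        for a, b in zip(row[1:], x[1:]):
            acc += a * b
            mults += 1
            adds += 1
        y.append(acc)
    return y, mults, adds


M, N = 3, 4  # hypothetical sizes, for illustration only
A = [[1.0] * N for _ in range(M)]
x = [2.0] * N
y, mults, adds = matvec_op_counts(A, x)
print(mults, adds)  # M*N = 12 multiplications, M*(N-1) = 9 additions
```

If the accumulator started at zero instead, the addition count would be M×N rather than M×(N-1), which is one reason hand counts can differ slightly.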
In this example, do we count the total FLOPS as M×(2N-1) (counting additions and multiplications separately), or as M×N (counting each multiply-add as a single operation)?
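To show how far apart the two candidate totals are, here is a small Python sketch that evaluates both formulas. The helper name `flop_counts` and the 1000×1000 size are hypothetical; it only computes the two numbers from my question and does not assume which one papers actually report.

```python
def flop_counts(M, N):
    """Return (separate_ops, multiply_adds) for an M x N matrix times an N x 1 vector."""
    # Counting every multiplication and every addition as its own operation.
    separate_ops = M * N + M * (N - 1)  # equals M * (2N - 1)
    # Counting each multiply-add pair as a single operation.
    multiply_adds = M * N
    return separate_ops, multiply_adds


print(flop_counts(1000, 1000))  # (1999000, 1000000)
```

For large N the two totals differ by almost exactly a factor of 2, so the choice of convention matters when comparing numbers across papers.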
Thank you very much!