Coming to this rather late, but in case people are interested:
It is possible to measure the floating-point operation count of a model directly, using the CPU's performance monitoring unit (hardware performance counters), as an alternative to approaches that count the FLOPs of each operation analytically. With the python-papi module this is quite easy to do, and the results match the operation counts produced by the thop module: see http://www.bnikolic.co.uk/blog/python/flops/2019/10/01/pytorch-count-flops.html for a comparison.
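A rough sketch of the idea (not the code from the linked post) follows. It assumes python-papi's low-level API (`papi_low`) and the `PAPI_FP_OPS` preset event. It wraps a computation in a PAPI event set and compares the measured count with the analytic count for a matrix product, which is 2 * n * k * m FLOPs for an (n x k) @ (k x m) product. The helper names `measure_fp_ops` and `analytic_matmul_flops`, and the use of a NumPy matmul as the workload, are illustrative assumptions. Whether `PAPI_FP_OPS` is available on your CPU, and whether it counts every SIMD/FMA lane as separate operations, depends on the hardware. `papi_avail` lists the supported presets, and `PAPI_DP_OPS` may be the better choice on some machines.

```python
import os


def analytic_matmul_flops(n: int, k: int, m: int) -> int:
    """Analytic FLOPs for an (n x k) @ (k x m) product: n*k*m multiplies + n*k*m adds."""
    return 2 * n * k * m


def measure_fp_ops(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, PAPI_FP_OPS count).

    Assumes python-papi is installed and the PAPI_FP_OPS preset is supported
    on this CPU (check with `papi_avail`); PAPI raises an error otherwise.
    """
    # Imported here so the rest of the module works without python-papi.
    from pypapi import papi_low as papi
    from pypapi import events

    papi.library_init()
    evs = papi.create_eventset()
    papi.add_event(evs, events.PAPI_FP_OPS)
    papi.start(evs)
    try:
        result = fn(*args, **kwargs)
    finally:
        counts = papi.stop(evs)  # list with one value per added event
        papi.cleanup_eventset(evs)
        papi.destroy_eventset(evs)
    return result, counts[0]


if __name__ == "__main__":
    # Keep BLAS single-threaded: by default the event set only counts
    # events on the thread that started it. Must be set before importing numpy.
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ.setdefault(var, "1")
    try:
        import numpy as np

        n, k, m = 128, 256, 64
        a, b = np.random.rand(n, k), np.random.rand(k, m)
        _, measured = measure_fp_ops(np.matmul, a, b)
    except ImportError as exc:
        print(f"skipping measurement ({exc.name} not installed)")
    else:
        print("analytic FLOPs:", analytic_matmul_flops(n, k, m))
        print("PAPI_FP_OPS:   ", measured)
```

The same wrapper could be applied to a PyTorch forward pass, e.g. `measure_fp_ops(model, x)`. For the same per-thread reason, you would keep PyTorch single-threaded, for instance with `torch.set_num_threads(1)`.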