How can I measure the arithmetic workload of a GPU?

I need to analyze my code and get its arithmetic workload on the GPU, broken down into FP32, FP64, and other operation types. How can I measure this? What software or toolkit can I use? I am looking for something like this picture: [image]