Profiling PyTorch scripts?

I came to the same conclusion. Since .item() has to wait for all previously queued CUDA operations to complete, it is a synchronization point. The time the profiler attributes to it is therefore misleading: it mostly reflects waiting for earlier asynchronous ops to finish, not the cost of .item() itself.
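To see the effect, here is a minimal sketch of how I would time things so the wait lands on the op that caused it. The `timed` helper, the tensor size, and the choice of ops are my own illustration, not something from your script. It only uses `torch.cuda.synchronize` and `time.perf_counter`, and it falls back to CPU (where no sync is needed) if no GPU is available.

```python
import time
import torch

def timed(fn, device):
    """Run fn and return (result, seconds).

    Hypothetical helper: synchronizes before and after the call on CUDA,
    so the measured time covers fn's own kernels and not earlier queued work.
    """
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # drain kernels queued before fn
    start = time.perf_counter()
    out = fn()
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # wait for fn's kernels to finish
    return out, time.perf_counter() - start

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(512, 512, device=device)  # arbitrary example size

# The matmul is timed with an explicit sync, so its cost is charged to it.
y, matmul_s = timed(lambda: x @ x, device)

# By now the queue is empty, so .item() is not blamed for the matmul.
total, item_s = timed(lambda: y.sum().item(), device)

print(f"matmul: {matmul_s:.6f}s, sum + item: {item_s:.6f}s")
```

Without the explicit synchronize calls, the matmul would appear almost free on a GPU and most of its time would show up on the .item() line instead.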
