That’s an interesting observation!
Based on your comment:
“When using the autocast (mixed precision) context manager, we can see that the CPU usage is mainly concentrated in a single core. When setting use_autocast=False, however, the CPU usage is more spread across several cores.”
It sounds like you are using autocast on the CPU and seeing this behavior there. Is that correct?
Also, you say the usage is “mainly” on a single core. Does that mean the other cores are still being used, just less?