Hey,
this is very interesting. In my code I also observe CPU utilization of up to 200% on the server machine, whereas my local machine does the job properly. So I figure it is something related to the OS or the environment?
See my question here: "DataLoader CPU utilization and slow training".
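
In case it helps to compare the two machines, here is a rough sketch that prints a few environment details that could plausibly differ between a server and a local box. The function name `environment_report` and the choice of fields are my own assumptions, not something from the linked question, and nothing here is a confirmed cause of the 200% utilization.

```python
import os
import platform
import sys

# Environment variables that commonly cap thread counts in numeric libraries.
# Which of these matter on a given machine is an assumption to verify.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def environment_report():
    """Collect basic OS, CPU, and threading settings for side-by-side comparison."""
    report = {
        "platform": platform.platform(),
        "python": sys.version.split()[0],
        "cpu_count": os.cpu_count(),
    }
    # os.sched_getaffinity is not available on every OS (e.g. it is missing on macOS and Windows).
    if hasattr(os, "sched_getaffinity"):
        report["usable_cpus"] = len(os.sched_getaffinity(0))
    for name in THREAD_ENV_VARS:
        report[name] = os.environ.get(name, "<unset>")
    # PyTorch is optional here so the script still runs without it.
    try:
        import torch
        report["torch"] = torch.__version__
        report["torch_num_threads"] = torch.get_num_threads()
    except ImportError:
        report["torch"] = "<not installed>"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it on both machines and diffing the output would show whether the visible core count, the thread settings, or the library versions differ.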