CPU bound DataLoader, can't increase throughput with multiprocessing (num_workers > 1)?

Hey,

this is very interesting. In my code I also observe CPU utilization of only up to 200% on the server machine, whereas my local machine does the job properly. So I figure it is something related to the OS or the environment?
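
One thing that might be worth ruling out on the server is whether the process is even allowed to use all cores. Here is a minimal sketch using only the standard library; the helper name `cpu_budget` is made up for illustration, and this is just a guess at a possible cause, not a confirmed diagnosis. Note that `os.sched_getaffinity` is only available on some platforms (e.g. Linux), and it does not reflect cgroup CPU quotas such as those set by containers.

```python
import os


def cpu_budget():
    """Return (logical CPUs reported by the OS, CPUs this process may run on)."""
    total = os.cpu_count() or 1
    # sched_getaffinity is not available on every platform (e.g. not on macOS
    # or Windows), so fall back to the total count when it is missing.
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = total
    return total, usable


if __name__ == "__main__":
    total, usable = cpu_budget()
    print(f"logical CPUs: {total}, usable by this process: {usable}")
    if usable < total:
        print("This process is pinned to a subset of the cores.")
```

If the second number is much smaller than the first on the server (for example 2), the worker processes would be sharing those few cores no matter how high num_workers is set.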

See my question here: "DataLoader CPU utilization and slow training"