Possible deadlock? Training example gets stuck

  1. You should see the best performance with an optimal number of workers. Increasing the number of workers beyond the number of CPU cores might degrade performance, as explained here (see the DataLoader sketch after this list).

  2. torch.set_num_threads should set the number of threads used for intra-op parallelism on the CPU (i.e. for MKL, OpenMP, etc.), if I’m not mistaken (see the second sketch below).

  3. Yes, I think loading from disk with too many processes might reduce performance significantly, e.g. due to thrashing. I don’t know if this could explain your hang, but I would try reducing the number of workers etc. to “common” values.
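
As a minimal sketch of points 1 and 3: cap `num_workers` at the number of available cores (or a small “common” value) instead of oversubscribing the CPU. The `TensorDataset` here is just a placeholder, so swap in your own dataset; the value 4 is an assumption, not a recommendation for your setup.

```python
import os

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset purely for illustration; replace with your own Dataset.
dataset = TensorDataset(
    torch.randn(1000, 3, 224, 224),
    torch.randint(0, 10, (1000,)),
)

# Cap the workers at the number of available cores (or a "common" value like 4)
# rather than spawning more loader processes than the machine can serve.
num_workers = min(4, os.cpu_count() or 1)

loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=num_workers,
    pin_memory=True,
)

for data, target in loader:
    pass  # training step would go here
```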
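And for point 2, a quick sketch of limiting intra-op CPU parallelism with torch.set_num_threads; the value 2 is arbitrary and only meant to show the call.

```python
import torch

# Limit intra-op parallelism (the MKL/OpenMP thread pools) so CPU ops don't
# oversubscribe the cores while DataLoader workers are also running.
torch.set_num_threads(2)
print(torch.get_num_threads())  # should print 2

# CPU ops such as this matmul will now use at most 2 intra-op threads.
x = torch.randn(1024, 1024)
y = x @ x
```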