- You should see the best performance with an optimal number of workers. Increasing the number of workers beyond the number of CPU cores can hurt performance, as explained here.
- `torch.set_num_threads` should set the number of threads used for intra-op parallelism on the CPU, i.e. for MKL, OpenMP, etc., if I’m not mistaken.
- Yes, I think loading from disk with too many processes might reduce performance significantly, e.g. due to thrashing. I don’t know if this could explain your hang, but I would try reducing the number of workers etc. to “common” values.
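
As a rough sketch of what I mean by "common" values, something like the following could be used to keep the worker count at or below the number of usable cores. The helper names `cap_num_workers` and `build_loader`, the `reserve` parameter, and the batch size are made up for this example and are not part of PyTorch; only `torch.set_num_threads` and `DataLoader(..., num_workers=...)` are real API calls. Treat the numbers as a starting point to benchmark, not as a recommendation.

```python
import os


def cap_num_workers(requested, reserve=1):
    """Clamp a requested worker count to the usable cores minus `reserve`.

    Hypothetical helper for illustration; `reserve` leaves some cores
    free for the main training process.
    """
    try:
        # Linux: respects CPU affinity masks (e.g. taskset, some containers).
        available = len(os.sched_getaffinity(0))
    except AttributeError:
        # Other platforms: fall back to the total core count.
        available = os.cpu_count() or 1
    return max(0, min(requested, available - reserve))


def build_loader(dataset, requested_workers=8, intraop_threads=None):
    """Build a DataLoader with a capped worker count (illustrative only)."""
    # Imported here so the helper above stays usable without PyTorch.
    import torch
    from torch.utils.data import DataLoader

    if intraop_threads is not None:
        # Limits intra-op CPU parallelism in the calling process.
        torch.set_num_threads(intraop_threads)

    return DataLoader(
        dataset,
        batch_size=32,  # assumed value for the example
        num_workers=cap_num_workers(requested_workers),
    )
```

Note that `num_workers=0` is valid and loads the data in the main process, which is also a useful baseline when debugging a hang.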