Multithreading in dataloader workers

I’m training on data where the collate() function requires relatively heavy computation (some sequence packing).

Right now I am training with around 40 dataloader workers, but the main thread still stalls while waiting for the next batch.

I noticed that the workers each call torch.set_num_threads(1). Is there a reason for that (apart from limiting the total number of threads)? Is it OK to raise the number of threads each worker can use, e.g. by calling torch.set_num_threads in the worker_init_fn?
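
Concretely, this is roughly what I have in mind (a minimal sketch; ToySequenceDataset, packing_collate and the thread count of 4 are placeholders standing in for my real dataset, my real packing logic and whatever value would make sense):

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToySequenceDataset(Dataset):
    """Stand-in for my real dataset: variable-length 1D tensors."""

    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        length = 16 + (idx % 48)
        return torch.randn(length)


def packing_collate(batch):
    """Stand-in for my real (CPU-heavy) sequence packing; here just concatenation."""
    return torch.cat(batch)


def worker_init_fn(worker_id):
    # This is the part I'm asking about: override the default of
    # 1 intra-op thread per worker process.
    torch.set_num_threads(4)


loader = DataLoader(
    ToySequenceDataset(),
    batch_size=64,
    num_workers=40,
    collate_fn=packing_collate,
    worker_init_fn=worker_init_fn,
    pin_memory=True,
)

for batch in loader:
    pass  # training step would go here
```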