Number of threads when using DataParallel

Hi. When I use nn.DataParallel to train my model, I run into a problem with the number of threads.

Even though I set the number of threads for the process (using torch.set_num_threads), the process still spawns many more threads.
I verified that torch.set_num_threads works as expected when I train on a single GPU.
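For reference, here is a minimal sketch of the setup I mean (the model and input here are just placeholders, not my real training code):

```python
import torch
import torch.nn as nn

# Limit intra-op parallelism for this process
torch.set_num_threads(1)

# Placeholder model; my actual model is larger
model = nn.Linear(128, 10).cuda()

# Wrap for multi-GPU training
model = nn.DataParallel(model)

x = torch.randn(64, 128).cuda()
out = model(x)  # thread count grows well beyond the limit here
```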

Is there a setting I missed? How can I deal with this problem?

Thank you

Are you sure these threads are used by nn.DataParallel, or could e.g. the DataLoader spawn new workers in your setup?
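One quick way to check would be to temporarily load the data in the main process and compare the thread count (the dataset below is just a placeholder for yours):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset; substitute your own
dataset = TensorDataset(torch.randn(1000, 128), torch.randint(0, 10, (1000,)))

# With num_workers=0, data loading happens in the main process,
# so any extra threads you still see are not DataLoader workers.
loader = DataLoader(dataset, batch_size=64, num_workers=0)
```

If the thread count drops with num_workers=0, the extra threads were the DataLoader's worker processes rather than anything created by nn.DataParallel.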