Multiprocessing on CPUs

Hello,

I have a model that I trained on GPUs. Now, I want to use it on my test dataset.
The dataset is large, so I would like to use Python multiprocessing to load each image, split it into patches, and forward the patches through the model on CPUs.
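The patch-splitting part of the per-image step could be sketched as follows. This assumes an image is a 2D list of pixel values and that non-overlapping square patches are wanted; a real pipeline would work on arrays or tensors, and `make_patches` is a hypothetical name, not part of any library.

```python
from typing import List

# Assumed representation for illustration: an image is a list of rows of pixel values.
Image = List[List[int]]


def make_patches(image: Image, patch_size: int) -> List[Image]:
    """Split an image into non-overlapping square patches in row-major order.

    Pixels at the right/bottom edge that do not fill a whole patch are dropped
    (an assumption; padding or overlapping patches are common alternatives).
    """
    if patch_size < 1:
        raise ValueError("patch_size must be at least 1")
    height = len(image)
    width = len(image[0]) if height else 0
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            # Slice the same column range out of each row in this band of rows.
            patch = [row[left:left + patch_size] for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches
```

For a 4×4 image and `patch_size=2`, this returns four 2×2 patches, starting with the top-left one.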

The problem is that the speed changes depending on how many CPUs I use.
My questions are:

  1. Does the number of CPUs affect the model?
  2. I call init() and load the model once. If I then use, say, 48 CPUs, 48 images are processed simultaneously, so how does that one model predict 48 patches at the same time?
  3. What are the differences between torch.multiprocessing and Python's multiprocessing? I currently use the latter; should I switch to the former?
  4. What are the differences between torch.multiprocessing and nn.DataParallel?
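On question 2, a point that may help frame it: with Python multiprocessing, each worker process has its own copy of the model in its own memory, so 48 processes means 48 copies, each running its own forward calls. Within a single process, the usual way to have one forward call cover many patches is batching: in PyTorch, a group of patch tensors is typically combined with `torch.stack` and passed through the model in one call. The sketch below shows only the grouping step in plain Python; `batched` is a hypothetical helper name and the batch size is an assumption to tune.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to batch_size items; the last one may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # Pull the next batch_size items; an empty list means the input is exhausted.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded list would then be stacked into one tensor and forwarded once, instead of forwarding patches one at a time.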

I found this issue, but I am not sure whether it is related.

Thanks,

Update:

When I run a single image with 9 CPUs, it finishes in 30 minutes.
But when I run 9 images with 9 CPUs (one image per CPU), it takes about 21 hours!
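One possible contributor to a slowdown like this, offered as an assumption rather than a diagnosis: PyTorch on CPU runs operations on an intra-op thread pool whose size typically defaults to the number of physical cores, and every worker process gets its own pool. Assuming a 9-core machine, 9 processes could then have up to 9 × 9 = 81 busy threads competing for 9 cores. The arithmetic below sketches a thread budget; the helper names are hypothetical. In PyTorch, the per-process value is set with `torch.set_num_threads(n)` (or the `OMP_NUM_THREADS` environment variable).

```python
def total_threads(num_processes: int, threads_per_process: int) -> int:
    """Number of compute threads that may be busy at once across all workers."""
    return num_processes * threads_per_process


def threads_per_worker(num_cores: int, num_processes: int) -> int:
    """Divide the cores evenly among worker processes, with at least one thread each."""
    if num_cores < 1 or num_processes < 1:
        raise ValueError("num_cores and num_processes must be positive")
    return max(1, num_cores // num_processes)


def oversubscription(num_cores: int, num_processes: int, threads_per_process: int) -> float:
    """Ratio of busy threads to cores; values above 1.0 mean threads contend for cores."""
    return total_threads(num_processes, threads_per_process) / num_cores
```

With 9 cores and 9 processes, `threads_per_worker(9, 9)` is 1, which brings the ratio from 9.0 down to 1.0.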

How can I solve this? I do not know the cause, but would it be faster if I copied the model to each CPU? If so, how should I do it?
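One common way to give each worker its own model copy is a `multiprocessing.Pool` initializer, which runs once per worker process when the pool starts. The sketch below assumes a hypothetical `load_model` that returns a stand-in model (the mean of a patch); in real code it would load the trained network from a checkpoint and switch it to eval mode, and the initializer is also where a per-worker `torch.set_num_threads(...)` call could go.

```python
import multiprocessing as mp
from typing import Callable, List, Optional

Patch = List[float]
# Hypothetical stand-in type for the trained network.
Model = Callable[[Patch], float]

_model: Optional[Model] = None  # set once in each worker process by init_worker()


def load_model() -> Model:
    """Hypothetical loader; a real one would load a checkpoint instead."""
    return lambda patch: sum(patch) / len(patch)


def init_worker() -> None:
    """Runs once per worker process, so each process holds its own model copy."""
    global _model
    # With PyTorch, torch.set_num_threads(...) could be called here to cap
    # the intra-op threads of this worker.
    _model = load_model()


def predict_image(patches: List[Patch]) -> List[float]:
    """Run this worker's model over every patch of one image."""
    if _model is None:
        raise RuntimeError("init_worker() has not run in this process")
    return [_model(p) for p in patches]


def run(images: List[List[Patch]], processes: int) -> List[List[float]]:
    """Distribute images across worker processes, one image per task."""
    with mp.Pool(processes=processes, initializer=init_worker) as pool:
        return pool.map(predict_image, images)
```

`run(images, processes=9)` should be called from inside an `if __name__ == "__main__":` block, since the spawn and forkserver start methods re-import the main module in each worker.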