Questions related to Custom DataLoader

Hello, I would like to ask a question about a custom data loader.

I have finished writing code that loads custom data using PyTorch’s DataLoader.

If I run the command “python --gpu=0”, it works fine.

But if I check with “ps aux | grep python”, I see several PIDs, as shown below.
(Multiple PIDs are allocated even though I executed only one command in the terminal.)

If I create a random dataset and test it, only one PID is allocated.

What’s the problem?

I would appreciate your help.

(Below is the source code used.)

– Dataset class


– Collate function

– Train function


When you create the DataLoader, what value do you use for the number of workers, num_workers?

As shown in the picture above, I set batch_size = 2 and num_workers = 8.

OK, I could not see num_workers. But given that num_workers=8, that explains why there are 9 processes running. One is the main process, which creates 8 subprocesses for loading the data.
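To see this directly, here is a minimal sketch that records which process loads each sample. PidDataset and collect_loader_pids are hypothetical names made up for this illustration and are not part of the code in the question; the only PyTorch pieces used are Dataset and DataLoader.

```python
import os

from torch.utils.data import DataLoader, Dataset


class PidDataset(Dataset):
    """Hypothetical toy dataset: each item is the PID of the process that loaded it."""

    def __init__(self, length=64):
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        # With num_workers > 0 this runs inside a worker subprocess,
        # with num_workers == 0 it runs in the main process.
        return os.getpid()


def collect_loader_pids(num_workers, batch_size=2):
    """Return the set of PIDs that produced samples during one pass over the data."""
    loader = DataLoader(PidDataset(), batch_size=batch_size, num_workers=num_workers)
    pids = set()
    for batch in loader:
        # The default collate function turns a list of ints into a 1-D tensor.
        pids.update(batch.tolist())
    return pids


# The guard matters on platforms that start workers by spawning a new interpreter.
if __name__ == "__main__":
    main_pid = os.getpid()
    print("main process PID:", main_pid)
    print("PIDs that loaded data (num_workers=8):", sorted(collect_loader_pids(8)))
    print("PIDs that loaded data (num_workers=0):", sorted(collect_loader_pids(0)))
```

With num_workers=8 the printed set should contain the worker PIDs and not the main PID; with num_workers=0 it should contain only the main PID, which would match the single process seen with a loader that uses no workers.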

Thank you for your reply.

Well, there is no specific reason for setting num_workers to 8. As I understand it, the more workers I use,
the faster the I/O can be handled.
(I refer to the following article.)

I checked as you told me, and I’ve confirmed that num_workers + 1 PIDs are allocated (the main process plus num_workers subprocesses). Is my usage wrong?

The code looks fine. But whether or not you get any improvement in I/O speed depends on the amount of work each worker has to do. For example, if the workload per sample is small, num_workers=2 and num_workers=8 may give similar speed.
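One way to check this on your own machine is to time a full pass over the data for a few values of num_workers. The sketch below is only an illustration: SlowDataset and time_one_epoch are hypothetical names, and the per-sample cost is simulated with time.sleep rather than real file reading. Note that the measured time also includes starting the worker processes.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class SlowDataset(Dataset):
    """Hypothetical dataset whose per-item cost is simulated with time.sleep."""

    def __init__(self, length=32, delay=0.01):
        self.length = length
        self.delay = delay

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        # Stand-in for disk reads, decoding, or augmentation.
        time.sleep(self.delay)
        return torch.tensor(index)


def time_one_epoch(num_workers, delay, batch_size=2):
    """Return the seconds needed to iterate once over SlowDataset."""
    loader = DataLoader(
        SlowDataset(delay=delay), batch_size=batch_size, num_workers=num_workers
    )
    start = time.perf_counter()
    for _ in loader:
        pass
    return time.perf_counter() - start


# The guard matters on platforms that start workers by spawning a new interpreter.
if __name__ == "__main__":
    for delay in (0.0, 0.01):
        for num_workers in (0, 2, 8):
            elapsed = time_one_epoch(num_workers, delay)
            print(f"delay={delay:.2f}s num_workers={num_workers}: {elapsed:.2f}s")
```

The numbers depend on the machine, so none are shown here. Replacing SlowDataset with your real dataset gives a more meaningful comparison.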

Oh, I see. Thank you for the good information.
However, why are multiple PIDs allocated even though only one command is executed?


Each of the subprocesses created by the DataLoader has its own PID.

Oh, yes. I just thought it was wrong for it to work that way.

Because the DataLoader creates a subprocess for each worker, it is normal that several PIDs are allocated!

Thank you very much for your reply.