In my experiment, I tested different numbers of GPUs and different num_workers values in the DataLoader. Here are the results:
num GPUs | batch size | num_workers | time per batch (data loading + forward/backward) | time per batch (data loading only) |
---|---|---|---|---|
2 | 512 | 128 | 0.8 | 0.3 |
2 | 512 | 32 | 0.8 | 0.3 |
2 | 512 | 16 | 0.8 | 0.3 |
2 | 512 | 8 | 0.8 | 0.3 |
2 | 512 | 2 | 0.8 | 0.3 |
2 | 512 | 1 | 3.7 | 3.2 |
8 | 512 | 32 | 0.3 | 0.08 |
8 | 1024 | 32 | 0.55 | 0.14 |
There are two odd things: 1) using more GPUs makes the DataLoader itself faster, and 2) increasing num_workers beyond 2 doesn't make data loading any faster.
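For anyone who wants to reproduce the two timing columns, here is a minimal sketch of one way to separate the data-loading time from the full step time. This is not necessarily how the numbers above were measured. `time_batches`, `step_fn` and `max_batches` are hypothetical names made up for this example; `loader` can be a `DataLoader` or any other iterable of batches.

```python
import time


def time_batches(loader, step_fn, max_batches=50):
    """Return (avg_data_time, avg_total_time) in seconds per batch.

    loader  : any iterable of batches (e.g. a torch DataLoader)
    step_fn : hypothetical callable that runs forward/backward on one batch
    """
    data_time = 0.0
    total_time = 0.0
    n = 0
    # Creating the iterator is not timed, so worker start-up cost is excluded.
    it = iter(loader)
    while n < max_batches:
        start = time.perf_counter()
        try:
            batch = next(it)  # time spent waiting for the next batch
        except StopIteration:
            break
        fetched = time.perf_counter()
        # CUDA kernels run asynchronously: on GPU, step_fn should end with
        # torch.cuda.synchronize(), otherwise the compute time spills over
        # into the next batch's "data loading" measurement.
        step_fn(batch)
        end = time.perf_counter()
        data_time += fetched - start
        total_time += end - start
        n += 1
    if n == 0:
        raise ValueError("loader produced no batches")
    return data_time / n, total_time / n
```

Skipping the first few batches as warm-up before averaging may also give steadier numbers, since the first batches tend to be slower than steady state.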