I have to run a complicated data-augmentation pipeline inside the dataloader (similar to https://github.com/yjxiong/tsn-pytorch/blob/master/dataset.py ).
trn_loader = DataLoader(train_data, batch_size=128, shuffle=True, pin_memory=True, num_workers=4)
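The expensive part is that all the frame decoding and augmentation happens per sample inside __getitem__, so each DataLoader worker does it on the CPU. A minimal sketch in the spirit of tsn-pytorch's dataset.py (not the actual code; load_frame and the record layout are illustrative placeholders):

import torch
from torch.utils.data import Dataset

class TSNStyleDataset(Dataset):
    def __init__(self, records, num_segments, transform):
        self.records = records        # list of (path, num_frames, label) tuples
        self.num_segments = num_segments
        self.transform = transform    # heavy augmentation, e.g. multi-scale cropping

    def __getitem__(self, index):
        path, num_frames, label = self.records[index]
        # pick frame indices, decode images from disk, then augment --
        # all on the CPU, once per sample, inside the worker process
        indices = torch.linspace(0, num_frames - 1, self.num_segments).long()
        images = [load_frame(path, int(i)) for i in indices]  # hypothetical frame decoder
        return self.transform(images), label

    def __len__(self):
        return len(self.records)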
As the log below shows, the time the CPU spends preparing data is often much longer (about 20x) than the GPU training time per batch. The problem is that CPU usage stays low, between 1% and 5%.
How should I change the DataLoader parameters to keep the CPUs running at high usage and cut the data-loading time?
Epoch: [1][36/170] TotalTime(s) 6.600 DataTime(s) 6.450
Epoch: [1][37/170] TotalTime(s) 0.162 DataTime(s) 0.000
Epoch: [1][38/170] TotalTime(s) 0.912 DataTime(s) 0.823
Epoch: [1][39/170] TotalTime(s) 0.101 DataTime(s) 0.000
Epoch: [1][40/170] TotalTime(s) 6.662 DataTime(s) 6.577
Epoch: [1][41/170] TotalTime(s) 0.106 DataTime(s) 0.000
Epoch: [1][42/170] TotalTime(s) 1.084 DataTime(s) 0.991
Epoch: [1][43/170] TotalTime(s) 0.106 DataTime(s) 0.000
Epoch: [1][44/170] TotalTime(s) 6.280 DataTime(s) 6.197
Epoch: [1][45/170] TotalTime(s) 0.104 DataTime(s) 0.000
Epoch: [1][46/170] TotalTime(s) 0.159 DataTime(s) 0.072
Epoch: [1][47/170] TotalTime(s) 3.550 DataTime(s) 3.454
Epoch: [1][48/170] TotalTime(s) 9.151 DataTime(s) 9.057
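For reference, the times above come from the usual DataTime/TotalTime instrumentation, roughly like this (a sketch of the standard pattern, not my exact loop; the forward/backward step is elided):

import time
import torch

end = time.time()
for i, (inputs, target) in enumerate(trn_loader):
    data_time = time.time() - end   # time spent blocked waiting on the DataLoader
    # ... forward / backward / optimizer step on the GPU ...
    if torch.cuda.is_available():
        torch.cuda.synchronize()    # make pending GPU work visible to the host clock
    total_time = time.time() - end  # data loading + compute for this batch
    print('Epoch: [1][%d/%d] TotalTime(s) %.3f DataTime(s) %.3f'
          % (i, len(trn_loader), total_time, data_time))
    end = time.time()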
GPU: p100
CPU: 8 cores
RAM: 64GB
Data: 9GB
PyTorch: 0.4.1
PS: With num_workers=0, CPU usage is around 300% for the first 50 iterations. After that it slowly oscillates between 0% and 200%.
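To narrow it down, here is a quick way to time the loader alone at different num_workers values (a sketch using only DataLoader arguments that exist in 0.4.1; train_data is the dataset from above):

import time
from torch.utils.data import DataLoader

for workers in (0, 2, 4, 8):
    loader = DataLoader(train_data, batch_size=128, shuffle=True,
                        pin_memory=True, num_workers=workers)
    it = iter(loader)
    next(it)                 # warm-up: let the workers spin up and fill their queues
    start = time.time()
    for _ in range(20):      # time 20 batches of pure data loading, no training
        next(it)
    print('num_workers=%d: %.3f s/batch' % (workers, (time.time() - start) / 20))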