Hi there, I am training a model whose input is videos. I use a large batch size (96), which means many .jpg files have to be loaded for every batch (roughly 12,288 images), together with 32 workers in the DataLoader (a simplified sketch of the loader setup is at the end of this post).
I have noticed an interesting phenomenon: most of the time is spent at the beginning of each epoch (one epoch meaning a complete scan of the dataset), like this:
|Epoch [iter/iters], lr|Time s (avg)|Data s (avg)|Loss (avg)|Prec@1 (avg)|Prec@5 (avg)|
|---|---|---|---|---|---|
|Epoch: [6][1/11], lr: 0.00100|Time 81.513 (16.666)|Data 79.471 (14.021)|Loss 1.1498 (1.0695)|Prec@1 65.625 (72.222)|Prec@5 89.583 (91.257)|
|Epoch: [6][2/11], lr: 0.00100|Time 3.260 (15.635)|Data 1.233 (13.038)|Loss 0.8580 (1.0525)|Prec@1 77.083 (72.613)|Prec@5 90.625 (91.206)|
|Epoch: [6][3/11], lr: 0.00100|Time 2.791 (14.717)|Data 0.000 (12.106)|Loss 0.9509 (1.0449)|Prec@1 69.792 (72.403)|Prec@5 90.625 (91.163)|
|Epoch: [6][4/11], lr: 0.00100|Time 2.658 (13.913)|Data 0.000 (11.299)|Loss 1.3641 (1.0670)|Prec@1 63.542 (71.789)|Prec@5 87.500 (90.909)|
|Epoch: [6][5/11], lr: 0.00100|Time 2.674 (13.211)|Data 0.000 (10.593)|Loss 0.8397 (1.0523)|Prec@1 78.125 (72.200)|Prec@5 92.708 (91.026)|
|Epoch: [6][6/11], lr: 0.00100|Time 2.638 (12.589)|Data 0.000 (9.970)|Loss 0.7121 (1.0316)|Prec@1 82.292 (72.814)|Prec@5 95.833 (91.318)|
|Epoch: [6][7/11], lr: 0.00100|Time 2.647 (12.037)|Data 0.000 (9.416)|Loss 1.0643 (1.0335)|Prec@1 71.875 (72.760)|Prec@5 89.583 (91.219)|
|Epoch: [6][8/11], lr: 0.00100|Time 2.850 (11.553)|Data 0.000 (8.921)|Loss 0.8783 (1.0250)|Prec@1 77.083 (72.994)|Prec@5 92.708 (91.299)|
|Epoch: [6][9/11], lr: 0.00100|Time 2.766 (11.114)|Data 0.000 (8.475)|Loss 1.1610 (1.0320)|Prec@1 68.750 (72.776)|Prec@5 88.542 (91.158)|
|Epoch: [6][10/11], lr: 0.00100|Time 2.663 (10.711)|Data 0.000 (8.071)|Loss 0.9689 (1.0289)|Prec@1 75.000 (72.885)|Prec@5 92.708 (91.233)|
|Epoch: [6][11/11], lr: 0.00100|Time 1.609 (10.298)|Data 0.000 (7.704)|Loss 1.5306 (1.0395)|Prec@1 64.286 (72.705)|Prec@5 83.333 (91.068)|
|Epoch: [7][1/11], lr: 0.00100|Time 67.395 (12.780)|Data 65.243 (10.206)|Loss 1.1724 (1.0455)|Prec@1 66.667 (72.429)|Prec@5 87.500 (90.905)|
|Epoch: [7][2/11], lr: 0.00100|Time 19.035 (13.041)|Data 16.957 (10.487)|Loss 0.9826 (1.0428)|Prec@1 69.792 (72.313)|Prec@5 93.750 (91.029)|
|Epoch: [7][3/11], lr: 0.00100|Time 2.656 (12.625)|Data 0.000 (10.068)|Loss 0.9412 (1.0385)|Prec@1 76.042 (72.469)|Prec@5 93.750 (91.143)|
|Epoch: [7][4/11], lr: 0.00100|Time 2.664 (12.242)|Data 0.000 (9.680)|Loss 1.1080 (1.0413)|Prec@1 77.083 (72.655)|Prec@5 91.667 (91.164)|
|Epoch: [7][5/11], lr: 0.00100|Time 2.791 (11.892)|Data 0.000 (9.322)|Loss 0.9605 (1.0382)|Prec@1 75.000 (72.746)|Prec@5 96.875 (91.385)|
|Epoch: [7][6/11], lr: 0.00100|Time 2.657 (11.562)|Data 0.000 (8.989)|Loss 1.1846 (1.0437)|Prec@1 71.875 (72.713)|Prec@5 85.417 (91.163)|
|Epoch: [7][7/11], lr: 0.00100|Time 2.818 (11.261)|Data 0.000 (8.679)|Loss 0.9796 (1.0414)|Prec@1 69.792 (72.608)|Prec@5 93.750 (91.256)|
|Epoch: [7][8/11], lr: 0.00100|Time 2.671 (10.974)|Data 0.000 (8.390)|Loss 0.9922 (1.0397)|Prec@1 69.792 (72.511)|Prec@5 90.625 (91.234)|
|Epoch: [7][9/11], lr: 0.00100|Time 2.671 (10.707)|Data 0.000 (8.119)|Loss 1.1123 (1.0421)|Prec@1 66.667 (72.315)|Prec@5 93.750 (91.318)|
|Epoch: [7][10/11], lr: 0.00100|Time 2.674 (10.456)|Data 0.000 (7.865)|Loss 1.0186 (1.0413)|Prec@1 73.958 (72.368)|Prec@5 92.708 (91.363)|
|Epoch: [7][11/11], lr: 0.00100|Time 1.597 (10.187)|Data 0.000 (7.627)|Loss 1.2137 (1.0437)|Prec@1 66.667 (72.289)|Prec@5 90.476 (91.351)|
|Epoch: [8][1/11], lr: 0.00100|Time 94.021 (12.653)|Data 91.101 (10.082)|Loss 0.9634 (1.0412)|Prec@1 77.083 (72.437)|Prec@5 92.708 (91.393)|
|Epoch: [8][2/11], lr: 0.00100|Time 10.186 (12.582)|Data 8.128 (10.026)|Loss 0.9580 (1.0387)|Prec@1 78.125 (72.608)|Prec@5 89.583 (91.338)|
|Epoch: [8][3/11], lr: 0.00100|Time 2.751 (12.309)|Data 0.001 (9.748)|Loss 1.0534 (1.0392)|Prec@1 72.917 (72.617)|Prec@5 89.583 (91.287)|
|Epoch: [8][4/11], lr: 0.00100|Time 2.665 (12.049)|Data 0.000 (9.484)|Loss 1.1317 (1.0418)|Prec@1 66.667 (72.448)|Prec@5 92.708 (91.327)|
|Epoch: [8][5/11], lr: 0.00100|Time 2.657 (11.801)|Data 0.000 (9.235)|Loss 1.0215 (1.0412)|Prec@1 70.833 (72.404)|Prec@5 94.792 (91.423)|
|Epoch: [8][6/11], lr: 0.00100|Time 2.814 (11.571)|Data 0.000 (8.998)|Loss 1.2059 (1.0457)|Prec@1 70.833 (72.362)|Prec@5 91.667 (91.429)|
|Epoch: [8][7/11], lr: 0.00100|Time 2.774 (11.351)|Data 0.001 (8.773)|Loss 0.9423 (1.0430)|Prec@1 72.917 (72.376)|Prec@5 90.625 (91.408)|
|Epoch: [8][8/11], lr: 0.00100|Time 2.672 (11.139)|Data 0.000 (8.559)|Loss 0.9794 (1.0413)|Prec@1 66.667 (72.231)|Prec@5 90.625 (91.388)|
|Epoch: [8][9/11], lr: 0.00100|Time 2.666 (10.938)|Data 0.000 (8.355)|Loss 0.9243 (1.0384)|Prec@1 73.958 (72.274)|Prec@5 89.583 (91.344)|
|Epoch: [8][10/11], lr: 0.00100|Time 2.671 (10.745)|Data 0.000 (8.161)|Loss 1.2551 (1.0437)|Prec@1 72.917 (72.289)|Prec@5 90.625 (91.326)|
|Epoch: [8][11/11], lr: 0.00100|Time 1.616 (10.538)|Data 0.000 (7.976)|Loss 0.8046 (1.0412)|Prec@1 66.667 (72.231)|Prec@5 100.000 (91.417)|
As you can see, the training step itself takes marginal time, while most of the time is spent at the beginning of each epoch.
Is this expected behavior?
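
For context, here is a minimal sketch of roughly how my data pipeline is set up. The `VideoFrameDataset` class, the `data/train/*` path, and the transform are simplified placeholders rather than my actual code; only `batch_size=96` and `num_workers=32` match the setup described above:

```python
import glob

import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms


class VideoFrameDataset(Dataset):
    """Simplified stand-in for my video dataset: one sample = one clip,
    loaded as a stack of .jpg frames (~128 frames per clip, assumed equal
    length so the default collate can batch them)."""

    def __init__(self, clip_dirs, transform=None):
        self.clip_dirs = clip_dirs                  # one directory per clip
        self.transform = transform or transforms.ToTensor()

    def __len__(self):
        return len(self.clip_dirs)

    def __getitem__(self, idx):
        frame_paths = sorted(glob.glob(f"{self.clip_dirs[idx]}/*.jpg"))
        frames = [self.transform(Image.open(p).convert("RGB")) for p in frame_paths]
        return torch.stack(frames)                  # (num_frames, C, H, W)


# Placeholder for however the clip list is actually built.
clip_dirs = sorted(glob.glob("data/train/*"))

train_loader = DataLoader(
    VideoFrameDataset(clip_dirs),
    batch_size=96,     # 96 clips per batch -> ~12,288 .jpg files to read
    num_workers=32,    # 32 worker processes doing the loading/decoding
    shuffle=True,
    pin_memory=True,
)

num_epochs = 10
for epoch in range(num_epochs):
    for clips in train_loader:   # the first iteration of each epoch is the slow one
        ...                      # forward/backward pass
```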