Unstable data loading time when training ImageNet

I’m using main.py from pytorch/examples (main branch, on GitHub) to train ImageNet, and I found that the time it takes to load data varies greatly from batch to batch. For example (accuracy is not a concern here, since I’m using a toy model):

Epoch: [36][ 1/2503] Time 3.500 ( 3.500) Data 3.461 ( 3.461) Loss 4.0810e+00 (4.0810e+00) Acc@1 23.05 ( 23.05) Acc@5 42.19 ( 42.19)
Epoch: [36][ 101/2503] Time 2.770 ( 0.771) Data 2.737 ( 0.721) Loss 3.8523e+00 (3.8462e+00) Acc@1 25.00 ( 25.54) Acc@5 47.46 ( 46.81)
Epoch: [36][ 201/2503] Time 2.698 ( 0.754) Data 2.662 ( 0.706) Loss 3.9250e+00 (3.8569e+00) Acc@1 25.00 ( 25.34) Acc@5 45.90 ( 46.66)
Epoch: [36][ 301/2503] Time 1.371 ( 0.749) Data 1.335 ( 0.703) Loss 3.8588e+00 (3.8567e+00) Acc@1 27.15 ( 25.28) Acc@5 46.29 ( 46.59)
Epoch: [36][ 401/2503] Time 0.055 ( 0.747) Data 0.000 ( 0.701) Loss 3.8356e+00 (3.8576e+00) Acc@1 24.41 ( 25.31) Acc@5 48.24 ( 46.53)
Epoch: [36][ 501/2503] Time 0.052 ( 0.746) Data 0.000 ( 0.698) Loss 3.9134e+00 (3.8572e+00) Acc@1 26.37 ( 25.31) Acc@5 45.90 ( 46.54)
Epoch: [36][ 601/2503] Time 0.055 ( 0.746) Data 0.000 ( 0.698) Loss 3.8652e+00 (3.8538e+00) Acc@1 22.85 ( 25.33) Acc@5 41.60 ( 46.60)
Epoch: [36][ 701/2503] Time 0.054 ( 0.746) Data 0.000 ( 0.698) Loss 3.8961e+00 (3.8537e+00) Acc@1 26.95 ( 25.33) Acc@5 45.90 ( 46.56)
Epoch: [36][ 801/2503] Time 0.400 ( 0.745) Data 0.363 ( 0.697) Loss 3.7074e+00 (3.8524e+00) Acc@1 26.17 ( 25.35) Acc@5 49.61 ( 46.59)
Epoch: [36][ 901/2503] Time 0.055 ( 0.745) Data 0.000 ( 0.697) Loss 3.8253e+00 (3.8525e+00) Acc@1 26.95 ( 25.33) Acc@5 48.83 ( 46.58)
Epoch: [36][1001/2503] Time 0.055 ( 0.744) Data 0.000 ( 0.697) Loss 3.9177e+00 (3.8517e+00) Acc@1 22.85 ( 25.33) Acc@5 46.48 ( 46.60)
Epoch: [36][1101/2503] Time 0.050 ( 0.744) Data 0.000 ( 0.696) Loss 3.9132e+00 (3.8531e+00) Acc@1 25.39 ( 25.32) Acc@5 46.09 ( 46.59)
Epoch: [36][1201/2503] Time 0.053 ( 0.744) Data 0.000 ( 0.697) Loss 3.8585e+00 (3.8551e+00) Acc@1 25.59 ( 25.29) Acc@5 49.02 ( 46.57)
Epoch: [36][1301/2503] Time 0.667 ( 0.744) Data 0.635 ( 0.697) Loss 3.7034e+00 (3.8554e+00) Acc@1 26.17 ( 25.29) Acc@5 48.24 ( 46.58)
Epoch: [36][1401/2503] Time 0.938 ( 0.744) Data 0.906 ( 0.698) Loss 4.0079e+00 (3.8570e+00) Acc@1 23.63 ( 25.26) Acc@5 44.53 ( 46.52)
Epoch: [36][1501/2503] Time 1.276 ( 0.745) Data 1.244 ( 0.699) Loss 3.9660e+00 (3.8583e+00) Acc@1 25.39 ( 25.24) Acc@5 43.55 ( 46.51)
Epoch: [36][1601/2503] Time 1.059 ( 0.745) Data 1.026 ( 0.699) Loss 3.8683e+00 (3.8577e+00) Acc@1 25.59 ( 25.26) Acc@5 47.66 ( 46.53)
Epoch: [36][1701/2503] Time 1.766 ( 0.745) Data 1.735 ( 0.699) Loss 3.7440e+00 (3.8578e+00) Acc@1 28.12 ( 25.26) Acc@5 49.41 ( 46.53)
Epoch: [36][1801/2503] Time 2.152 ( 0.745) Data 2.117 ( 0.699) Loss 3.8291e+00 (3.8585e+00) Acc@1 25.39 ( 25.24) Acc@5 47.46 ( 46.53)
Epoch: [36][1901/2503] Time 2.174 ( 0.745) Data 2.136 ( 0.699) Loss 3.6864e+00 (3.8585e+00) Acc@1 26.56 ( 25.24) Acc@5 50.78 ( 46.53)
Epoch: [36][2001/2503] Time 1.734 ( 0.744) Data 1.703 ( 0.699) Loss 3.9377e+00 (3.8581e+00) Acc@1 25.98 ( 25.25) Acc@5 45.51 ( 46.53)
Epoch: [36][2101/2503] Time 1.108 ( 0.744) Data 1.075 ( 0.699) Loss 3.9033e+00 (3.8585e+00) Acc@1 23.83 ( 25.24) Acc@5 46.88 ( 46.53)
Epoch: [36][2201/2503] Time 0.054 ( 0.744) Data 0.000 ( 0.699) Loss 3.9226e+00 (3.8582e+00) Acc@1 26.56 ( 25.23) Acc@5 45.31 ( 46.53)
Epoch: [36][2301/2503] Time 0.143 ( 0.744) Data 0.110 ( 0.699) Loss 3.8114e+00 (3.8589e+00) Acc@1 25.78 ( 25.22) Acc@5 48.05 ( 46.52)
Epoch: [36][2401/2503] Time 1.059 ( 0.744) Data 1.027 ( 0.699) Loss 3.7287e+00 (3.8593e+00) Acc@1 25.78 ( 25.22) Acc@5 48.44 ( 46.52)
Epoch: [36][2501/2503] Time 1.310 ( 0.744) Data 1.280 ( 0.699) Loss 3.7287e+00 (3.8596e+00) Acc@1 26.76 ( 25.23) Acc@5 47.07 ( 46.52)
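For reference, the Time and Data columns in this log come from the AverageMeter pattern used in main.py: Data is the wait for the next batch, Time is the whole iteration, and the numbers in parentheses are running averages. A torch-free sketch of that pattern (the dummy loader and the sleep delays are made up for illustration):

```python
import time

class AverageMeter:
    """Tracks the latest value and a running average, as in main.py."""
    def __init__(self):
        self.val = self.sum = self.count = self.avg = 0.0

    def update(self, val):
        self.val = val
        self.sum += val
        self.count += 1
        self.avg = self.sum / self.count

batch_time, data_time = AverageMeter(), AverageMeter()

def dummy_loader():
    # Stand-in for the DataLoader; the 10 ms delay is hypothetical.
    for i in range(3):
        time.sleep(0.01)
        yield i

end = time.time()
for batch in dummy_loader():
    data_time.update(time.time() - end)   # the "Data" column
    time.sleep(0.005)                     # simulated forward/backward pass
    batch_time.update(time.time() - end)  # the "Time" column
    end = time.time()

print(f"Time {batch_time.val:.3f} ({batch_time.avg:.3f}) "
      f"Data {data_time.val:.3f} ({data_time.avg:.3f})")
```

So a near-zero Data value only means the next batch was already sitting in the prefetch queue when the GPU asked for it, not that loading was free.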

Could someone suggest a reason? The dataset is on an SSD and my GPU is an RTX 2080 Ti. In iotop the disk read throughput is around 80 MB/s. Batch size = 512, num_workers = 4.
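One thing I’d note about interpreting the pattern: with prefetching workers, the measured Data time is bursty by construction. While the workers keep up, the prefetch queue stays full and Data reads ~0; the moment they fall behind (a slow batch, disk contention, cache miss), the main process blocks and Data spikes. A minimal pure-Python simulation of this, assuming a single worker, a small prefetch queue, and made-up load/step times (no torch involved):

```python
import queue
import threading
import time

PREFETCH = 4       # assumed prefetch depth (hypothetical)
NUM_BATCHES = 16
STEP_TIME = 0.02   # simulated GPU step time per batch (hypothetical)

def load_time(i):
    # Most batches load fast; every 8th batch is slow (e.g. disk stall).
    return 0.3 if i % 8 == 7 else 0.002

q = queue.Queue(maxsize=PREFETCH)

def worker():
    for i in range(NUM_BATCHES):
        time.sleep(load_time(i))
        q.put(i)

threading.Thread(target=worker, daemon=True).start()

waits = []
for _ in range(NUM_BATCHES):
    t0 = time.perf_counter()
    q.get()                         # the "Data" wait for the next batch
    waits.append(time.perf_counter() - t0)
    time.sleep(STEP_TIME)           # the "compute" part of the step

print([round(w, 3) for w in waits])
```

The waits come out near zero while the queue has buffered batches, then jump when the slow batch drains it, which matches the runs of `Data 0.000` followed by multi-second spikes in the log above. That would point at the loading pipeline (workers/decoding/disk) only intermittently keeping up with the GPU rather than at a constant bottleneck.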