I use a modest configuration of the `DataLoader`:

```python
from torch.utils.data import DataLoader

batchsize = 64
n_workers = 8
dl = DataLoader(ds, batch_size=batchsize, num_workers=n_workers,
                pin_memory=False, drop_last=True)
```
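To reproduce the "only iterate the data" measurement more precisely, a minimal sketch like the following could time one pass over a loader such as `dl` and also record the throughput for each window of batches. That way, a slowdown that builds up partway through the epoch becomes visible instead of being hidden in one total. The function name `time_data_only_epoch` and its `report_every` parameter are hypothetical; the sketch only assumes the loader is an iterable that yields batches of `batch_size` images.

```python
import time
from typing import Iterable, List, Tuple


def time_data_only_epoch(
    loader: Iterable, batch_size: int, report_every: int = 100
) -> Tuple[float, List[float]]:
    """Iterate over `loader` once without running a model.

    Returns the total wall-clock time in seconds and the throughput
    (images per second) measured over each window of `report_every` batches.
    """
    window_rates: List[float] = []
    start = time.perf_counter()
    window_start = start
    n_batches = 0
    for _batch in loader:  # the batch is discarded: this measures loading only
        n_batches += 1
        if n_batches % report_every == 0:
            now = time.perf_counter()
            elapsed = now - window_start
            rate = report_every * batch_size / elapsed if elapsed > 0 else float("inf")
            window_rates.append(rate)
            window_start = now
    total = time.perf_counter() - start
    return total, window_rates
```

For example, `total_s, rates = time_data_only_epoch(dl, batch_size=batchsize)`. A steady decline in `rates` over the training epoch would point to something that degrades over time, rather than to a uniformly higher cost per image.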
`ds` is a dataset object that uses `cv2` to read the images and `scipy` to do some preprocessing. The images are then converted to tensors with
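To separate the per-image cost inside the dataset (the `cv2` read and `scipy` preprocessing) from any `DataLoader` worker overhead, one option is to time `ds[i]` directly in the main process. The helper `mean_getitem_seconds` below is a hypothetical sketch; it only assumes that `ds` supports `len(ds)` and indexing. Random indices are used here simply to spread the sample over all category folders.

```python
import random
import time


def mean_getitem_seconds(ds, n_samples: int = 200, seed: int = 0) -> float:
    """Average wall-clock time of a single `ds[i]` call, in the main process."""
    n = len(ds)
    if n == 0:
        raise ValueError("dataset is empty")
    rng = random.Random(seed)
    indices = [rng.randrange(n) for _ in range(min(n_samples, n))]
    start = time.perf_counter()
    for i in indices:
        _ = ds[i]  # read and preprocess one sample; the result is discarded
    return (time.perf_counter() - start) / len(indices)
```

Running it once on the validation dataset and once on the training dataset shows whether a single sample is more expensive to load from one set than from the other. Multiplying the result by `len(ds) / n_workers` gives only a rough ideal epoch time, under the assumption that the 8 workers scale perfectly.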
The problem is that with my 600,000 validation images, an epoch of only iterating over the data (reading the images without running them through the model) takes about 7 min. With my 3,800,000 training images, however, an epoch of only iterating over the data takes much longer than the roughly 7 × 6 = 42 min I would expect, given that the training set is about six times larger. I have also noticed that the operating system becomes somewhat sluggish while I iterate over the training data. The training and validation sets are each stored in 30 category folders, and as far as I can tell I am not doing anything else unusual. What could be causing this?
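The expectation above is a linear-scaling estimate. Using the exact size ratio, 3,800,000 / 600,000 ≈ 6.3, the prediction comes out slightly higher than 42 min. The small helper below (its name is hypothetical) makes the arithmetic explicit:

```python
def expected_minutes(measured_minutes: float, measured_images: int, target_images: int) -> float:
    """Predict the data-only epoch time for `target_images`, assuming the
    time grows linearly with the number of images."""
    return measured_minutes * target_images / measured_images


estimate = expected_minutes(7.0, 600_000, 3_800_000)
print(f"{estimate:.1f} min")  # 44.3 min
```

Either way, an epoch that takes much longer than about 44 min means the average cost per training image is higher than the cost per validation image, not only that there are more images to read.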