Hi,
I use a fairly modest configuration for the DataLoader:
batch_size = 64
num_workers = 8
dl = DataLoader(ds,
                batch_size=batch_size,
                num_workers=num_workers,
                pin_memory=False,
                drop_last=True)
Here ds is a dataset object that uses cv2 to read the images and scipy for a few preprocessing steps. The images are then converted to tensors with transformations.to_tensor().
The problem is this: with my 600,000 validation images, an epoch of only iterating the data (just reading the images, without running them through the model) takes about 7 minutes. With my 3,800,000 training images, however, an epoch of only iterating the data takes much longer than the roughly 6 × 7 min that linear scaling would predict. I have also noticed that the operating system becomes a little sluggish while I iterate over the training data.

Both the training and the validation set are stored in 30 category folders each. I don't think I'm doing anything else unusual. What could be the cause of this?
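For reference, if loading scaled linearly with dataset size, the training epoch should take about 3,800,000 / 600,000 ≈ 6.3 times the 7-minute validation epoch:

```python
# Back-of-the-envelope: expected training-epoch time if data loading
# scaled linearly with the number of images.
val_images = 600_000
train_images = 3_800_000
val_epoch_min = 7.0

expected_train_min = val_epoch_min * train_images / val_images
print(round(expected_train_min, 1))  # ≈ 44.3 minutes
```

What I actually observe is noticeably worse than that.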