Dataloader slowing down with each batch

I have an image dataset of 30k images and a previously trained ResNet50 model. My goal is to get the last-layer gradients for each input sample individually, so my batch size is 1.
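Roughly what my loop looks like, sketched with a tiny stand-in model and random data instead of ResNet50 and the real dataset (the names and shapes here are illustrative, not my actual code):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Tiny stand-in for my real setup (ResNet50 on 30k images)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
model.eval()

dataset = TensorDataset(torch.randn(16, 3, 8, 8), torch.randint(0, 10, (16,)))
loader = DataLoader(dataset, batch_size=1)  # batch_size=1 -> one gradient set per sample

criterion = nn.CrossEntropyLoss()
last_layer = model[1]  # the layer whose gradients I want
per_sample_grads = []
for x, y in loader:
    loss = criterion(model(x), y)
    # Backprop only into the last layer's parameters
    grads = torch.autograd.grad(loss, last_layer.parameters())
    # Store plain copies so nothing else is kept alive between iterations
    per_sample_grads.append([g.clone() for g in grads])
```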

Now, here’s my problem. Over the 30k batches, the dataloader slows down considerably. The first 10k batches are loaded and processed (forward pass + last-layer gradients) in around 3 minutes, the next 10k take around 10 minutes, and the remaining 10k take about 20 minutes.

I am resizing and normalizing the images using torchvision transforms.

For reference, I am on a GPU cluster with NVIDIA A100s. The machine has 8 such GPUs and none of them are currently in use. The dataset is served from an NFS server, but the problem persists even when I move the dataset to a local directory. I also resized the images ahead of time so that I don’t have to do it on the fly, but the problem still persists.

Here’s another finding: if I break out of the dataloader loop at 10k iterations and then restart it, the speed is good again (~3 minutes / 10k), so I suspect the dataloader is somehow accumulating memory.
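The restart workaround, sketched with a toy dataset and a hypothetical chunk size (10 stands in for my 10k iterations):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, Subset

dataset = TensorDataset(torch.randn(30, 3))  # toy stand-in for the 30k images
CHUNK = 10  # stands in for the 10k iterations after which I restart

processed = 0
while processed < len(dataset):
    # Fresh DataLoader over the next chunk; dropping the old one each time
    # is what restores the speed for me
    chunk = Subset(dataset, range(processed, min(processed + CHUNK, len(dataset))))
    for (x,) in DataLoader(chunk, batch_size=1):
        processed += 1
```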

I would really appreciate any help on this matter.

Thank you!