GPU utilization is low

DataLoader creates mini-batches of batch_size samples each, e.g.:

torch.utils.data.DataLoader(dataset, batch_size=100, shuffle=True)
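
If data loading itself turns out to be the bottleneck, a couple of DataLoader knobs are worth trying, for example (the worker count here is just a placeholder, tune it for your machine):

torch.utils.data.DataLoader(
    dataset,
    batch_size=100,
    shuffle=True,
    num_workers=4,    # prepare batches in background processes
    pin_memory=True,  # page-locked host memory speeds up CPU-to-GPU copies
)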

Yes, right now I am using batch_size=300

I’d suggest doing some napkin math comparing the size of your data to the memory bandwidth you have between CPU and GPU RAM. Three hidden layers of 512 units is what, on the order of a million operations per sample for the forward pass? Compare that to the memory traffic needed to get the inputs and outputs to/from the GPU, and I wouldn’t be surprised if you’re just not doing enough compute to keep the GPU busy while the next batch of samples gets copied in.
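
To make that concrete, here is the napkin math in code, with assumed numbers (the input size is a guess, not from the original post):

# All numbers below are assumptions - plug in your actual sizes.
input_dim, hidden, n_hidden = 100, 512, 3
batch_size = 300

# Forward-pass multiply-accumulates per sample (ignoring the output layer)
macs_per_sample = input_dim * hidden + (n_hidden - 1) * hidden * hidden  # ~0.6M

# Bytes that cross the CPU-to-GPU link per batch (float32 inputs only)
mb_per_batch = batch_size * input_dim * 4 / 1e6  # ~0.12 MB

print(f"{macs_per_sample / 1e6:.2f}M MACs per sample, {mb_per_batch:.2f} MB copied per batch")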

Mr. Stevens has a good point here – OP, you don’t mention how many GPUs you have / what size they are (1 GB? 10 GB?) – 1% util means different things based on GPU size.

My advice –

  • Check what your swap levels are when the net isn’t running – it’s possible (although a bit weird…) that something else is putting 17.6 GB on your swap
  • Preload the entire dataset as others have suggested – should be relatively easy (see the sketch after this list)
  • 10x (or 100x) the size of your net, and see what happens. Do you get OOM? You don’t mention memory usage, only util – if you’re already bumping up against memory usage limits, then it’s definitely a data loading issue. However, if you’re only using, say, 1% of GPU memory, then your arithmetic intensity may just be too low to ever expect the GPU to hit high util. Holding 99% util can be tricky – especially with small nets.
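
A minimal sketch of the preload idea, assuming the whole dataset fits in GPU memory (the tensor shapes below are placeholders):

import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda")

# Placeholder data - swap in your real tensors
X = torch.randn(60_000, 100)           # (num_samples, input_dim)
y = torch.randint(0, 10, (60_000,))

# Copy everything to the GPU once, so there are no per-batch host-to-GPU copies
X, y = X.to(device), y.to(device)
loader = DataLoader(TensorDataset(X, y), batch_size=300, shuffle=True)

# Handy for the memory-vs-util question above
print(torch.cuda.memory_allocated() / 1e9, "GB allocated on the GPU")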

Maybe a stupid suggestion, but did you move your data and model to the GPU?
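
For reference, that usually looks something like this (model, loader, and loss_fn stand in for whatever you already have):

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)            # the model has to live on the GPU...

for inputs, targets in loader:      # ...and so does every batch
    inputs, targets = inputs.to(device), targets.to(device)
    loss = loss_fn(model(inputs), targets)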