How long should it take to train the mobilenet model on a 1080Ti?

I’m training a mobilenet model available in torchvision using a 1080Ti and CUDA 9. My dataset has roughly 4000 images in total.

With minibatches of 16, it’s taking me ~4 minutes for training and ~1 minute for validation per epoch. Even with minibatches of size 64, I only see a marginal improvement in the time.

Is this an expected amount of time? It feels rather slow to me for a 1080Ti.

What’s your image size? 256x256?

Also, are you using a DataLoader, or doing any data processing inside the training loop?
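If it helps, here is a rough sketch for checking whether the time goes into waiting for data or into the training step itself. A larger batch size barely changing the epoch time is often a sign that data loading is the bottleneck, but that is only a guess until it is measured. The names `profile_epoch`, `slow_loader`, and `sync_fn` are made up for this example and are not part of PyTorch; the helper uses only the standard library and accepts any iterable, so a `DataLoader` can be passed in as `loader`.

```python
import time


def profile_epoch(loader, step_fn, sync_fn=None):
    """Split one pass over `loader` into data-wait time and step time.

    loader  : any iterable of batches (for example a PyTorch DataLoader)
    step_fn : callable that runs one training step on a batch
    sync_fn : optional callable run after each step; on a GPU you would
              likely pass torch.cuda.synchronize so that queued kernels
              are finished before the clock is read (assumption: the
              step runs on CUDA)
    """
    data_s = 0.0
    step_s = 0.0
    batches = 0
    t_prev = time.perf_counter()
    for batch in loader:
        t_got = time.perf_counter()
        data_s += t_got - t_prev      # time spent waiting for the batch
        step_fn(batch)
        if sync_fn is not None:
            sync_fn()
        t_prev = time.perf_counter()
        step_s += t_prev - t_got      # time spent in the step itself
        batches += 1
    return {"batches": batches, "data_s": data_s, "step_s": step_s}


def slow_loader(n_batches, delay_s):
    """Hypothetical stand-in for a loader that is slow to produce batches."""
    for i in range(n_batches):
        time.sleep(delay_s)
        yield i


if __name__ == "__main__":
    # Dummy step that does nothing, so almost all time lands in data_s.
    stats = profile_epoch(slow_loader(5, 0.02), lambda batch: None)
    print(stats)
```

If `data_s` dominates on your real loader, things like raising `num_workers` on the `DataLoader` or resizing the images once ahead of time are worth trying. If `step_s` dominates, the model or the image size is the more likely cause.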