Slow training using LMDB dataset and nn.DataParallel

Hi everyone, I am training my segmentation model on an image set of 8,146 training images. Training was taking a long time, so I converted the data to LMDB first and then trained my model, but I did not see any decrease in training time. I am also using nn.DataParallel. Any idea how I can improve the training speed? Each epoch takes more than 20 minutes :confused: … I have experience training models in TensorFlow using tf.records, and it was quite fast.

Can you check how much your GPUs are utilized using nvidia-smi? If they are not fully utilized, it means the pre-processing (possibly read time from the hard drive) is taking too long to keep the GPUs busy.
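If the GPUs do sit mostly idle, the usual first step is to parallelize the input pipeline. Here is a minimal sketch, with a synthetic dataset standing in for the real LMDB-backed one (the tensor shapes, batch size, and `num_workers` value are assumptions to adjust), that also measures how long the training loop spends waiting on data:

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the real LMDB-backed dataset (hypothetical shapes).
dataset = TensorDataset(torch.randn(256, 3, 32, 32),
                        torch.randint(0, 2, (256,)))

# num_workers > 0 moves loading/pre-processing into background processes;
# pin_memory speeds up host-to-GPU copies when a GPU is present.
loader = DataLoader(dataset, batch_size=32, shuffle=True,
                    num_workers=2, pin_memory=torch.cuda.is_available())

wait_time = 0.0   # total time the loop spends blocked on the loader
n_batches = 0
t0 = time.perf_counter()
for images, labels in loader:
    wait_time += time.perf_counter() - t0
    n_batches += 1
    # ... forward/backward pass would go here ...
    t0 = time.perf_counter()

print(f"batches: {n_batches}, waited on data: {wait_time:.3f}s")
```

If `wait_time` dominates the epoch, the input pipeline is the bottleneck and raising `num_workers` (or moving the data to a faster disk) should help more than any model-side change.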

I don't think nvidia-smi gives a precise picture of GPU utilization. It only shows the utilization at the very moment the command is run.

Hmmm… you can use the watch command:

watch -n 0.1 nvidia-smi

or use the dmon option:

nvidia-smi dmon

Either one will refresh the utilization every fraction of a second or so.