Slow training with nn.DataParallel

I am training an image segmentation model, and the time taken per step seems to keep rising throughout the whole epoch. I am not sure if it has something to do with the way I am feeding images to my model; I am using the DataLoader utility. I have pasted the timings for one epoch below. Is there any way to improve the training speed?

Epoch: 0 [>              ] 1.9% Mean Dice: 0.0227, time: 10.9979s
[Training] Epoch: 0 [>              ] 3.8%  time: 12.0363s
[Training] Epoch: 0 [>              ] 5.7%  time: 13.0289s
[Training] Epoch: 0 [=>             ] 7.5%  time: 14.0112s
[Training] Epoch: 0 [=>             ] 9.4%  time: 15.8508s
[Training] Epoch: 0 [=>             ] 11.3% time: 16.8213s
[Training] Epoch: 0 [=>             ] 13.2%  time: 17.7810s
[Training] Epoch: 0 [==>            ] 15.1%  time: 18.7687s
[Training] Epoch: 0 [==>            ] 17.0%  time: 19.7607s
...

[Training] Epoch: 0 [==============>] 96.2% time: 60.4288s
[Training] Epoch: 0 [==============>] 98.1% time: 61.3712s

Are you only seeing this slowdown when using nn.DataParallel, or also on a single GPU?
In the latter case, are you also seeing increased memory usage?
If so, you might be accidentally storing tensors that are still attached to the computation graph, or somehow extending the computation graph in each iteration.
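For reference, a minimal sketch of that pattern (the model, data, and loss below are just placeholders, not your actual setup): appending the raw loss tensor keeps each iteration's computation graph alive, which grows memory and slows every step, while `.item()` or `.detach()` drops that reference.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for the real segmentation model and dataset
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
loader = DataLoader(TensorDataset(torch.randn(64, 10), torch.randn(64, 1)),
                    batch_size=8)

losses = []
for data, target in loader:
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()

    # losses.append(loss)          # keeps this iteration's whole graph alive
    losses.append(loss.item())     # stores a plain Python float instead
    # losses.append(loss.detach()) # also fine if you need the tensor itself
```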

Thanks, I was saving some tensors in a loop.