Training becomes slower as time goes by

I am training my model (which contains conv layers, fc layers, and BiGRU layers) on two GPUs. At the beginning, every batch takes around 5 s, but as time goes by it takes 20 s per batch, and I don't know why the training time for the same-sized data increases.
I use model = nn.DataParallel(model, device_ids=[0, 1]) to train my model on two GPUs.
I also call torch.cuda.empty_cache() after every batch, but it doesn't help.
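Roughly, my training loop looks like this (a simplified sketch, not my exact code; the small Sequential model, loss, and random data just stand in for my conv/fc/BiGRU model and real loader):

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the conv + fc + BiGRU network
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model = nn.DataParallel(model, device_ids=[0, 1]).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(1000):
    # Random stand-in data with a fixed batch size
    inputs = torch.randn(32, 128).cuda()
    targets = torch.randint(0, 10, (32,)).cuda()

    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()

    torch.cuda.empty_cache()  # called after every batch, as described above
```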

Calling torch.cuda.empty_cache() after every batch would actually slow down your code, as synchronizing calls are executed to free and reallocate the memory.

Are you seeing increasing memory usage on the GPU(s)? If so, you might be accidentally storing the computation graph, which would increase the memory usage and runtime of your training, e.g. via total_loss += loss. If you store the loss or any other tensor that is attached to the computation graph, you should .detach() it first.
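For example, a minimal sketch of that pattern (reusing the placeholder names from the sketch above; loader, model, criterion, and optimizer are assumed to exist):

```python
total_loss = 0.0
for inputs, targets in loader:  # hypothetical DataLoader
    optimizer.zero_grad()
    loss = criterion(model(inputs.cuda()), targets.cuda())
    loss.backward()
    optimizer.step()

    # Bad:  total_loss += loss        -> keeps every iteration's graph alive,
    #                                     so memory and runtime grow over time
    # Good: accumulate a detached value instead
    total_loss += loss.detach()       # or: total_loss += loss.item()
```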