Training slows down batch by batch

The answer comes from this question: "Why does training slow down over time when training continuously? And why does GPU utilization begin to jitter dramatically?"

I call torch.cuda.empty_cache() at the end of every loop iteration.
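A minimal sketch of what that looks like, assuming a simple supervised loop (the model, optimizer, and data here are placeholders, not from the original post). Note that `torch.cuda.empty_cache()` only releases cached blocks back to the driver; it does not free tensors that are still referenced, and calling it every iteration can add overhead.

```python
import torch

# Placeholder model, optimizer, and loss; substitute your own.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

for step in range(100):
    x = torch.randn(32, 10)   # dummy batch
    y = torch.randn(32, 1)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

    # Release unused cached GPU memory at the end of each iteration.
    # This is a no-op on CPU-only runs, and it cannot free tensors
    # that are still referenced (e.g. losses accumulated in a list).
    torch.cuda.empty_cache()
```

A related pitfall worth checking for this kind of gradual slowdown: storing `loss` itself (rather than `loss.item()`) across iterations keeps the whole autograd graph alive, which `empty_cache()` cannot undo.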