Training slows down as epochs progress

I have defined some variables inside the forward pass. The input is 3-dimensional, and each channel is assigned to a different variable.

One channel is assigned directly as self.channel_1 = self.input[:,0,:,:]

The second and third channels are assigned after an initialization step, like below.
self.channel_2 = torch.ones(1,1,256,256).to(self.device)
self.channel_3 = torch.ones(1,1,256,256).to(self.device)

self.channel_2 = self.input[0,1,:,:]
self.channel_3 = self.input[0,2,:,:]

As the epochs progress, training slows down.

(1) What is the difference between the two assignments above? Is it the cause of the slowdown?
(2) Can I use torch.cuda.empty_cache() at the end of each epoch to speed up the next one?

I am using torch 1.4.0.

  1. I'm not sure which difference you are pointing to, but the issue sounds as if you are storing the computation graph in each iteration, which could slow down your code and would also be visible as increasing memory usage (see the sketch after this list).

  2. No, torch.cuda.empty_cache() won't save any memory and will potentially only slow down your code further, since synchronizing cudaMalloc calls would be needed to re-allocate the memory.
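A common way to accidentally store the computation graph is appending the loss tensor itself (or any tensor still attached to the graph) to a Python list for logging; every stored tensor then keeps its entire graph alive. A minimal sketch of the fix, assuming a generic training loop (model, criterion, optimizer, and loader are placeholders, not from your post):

running_losses = []
for data, target in loader:
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()

    # Bad: storing the attached tensor keeps this iteration's graph alive
    # running_losses.append(loss)

    # Good: store a detached Python number instead
    running_losses.append(loss.item())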

The difference is that torch.ones(1,1,256,256).to(self.device) is not created for self.channel_1 before the assignment, while for self.channel_2 and self.channel_3 it is created before the assignment.

Thanks for explaining this.
You could directly pass the device to the tensor creation: torch.ones(size, device=device), which might be faster than creating these values on the CPU and pushing them to the GPU.
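For example (buf is just a placeholder name, the shape is taken from your post):

import torch

device = torch.device('cuda')
# creates the tensor on the CPU first and then copies it to the GPU
buf = torch.ones(1, 1, 256, 256).to(device)
# creates the tensor directly on the GPU and avoids the extra copy
buf = torch.ones(1, 1, 256, 256, device=device)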
Were you able to check the memory usage, and did you see it increasing?
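You can check it e.g. via torch.cuda.memory_allocated() at the end of each epoch (a small sketch; train_one_epoch and num_epochs are placeholders for your own loop):

import torch

for epoch in range(num_epochs):
    train_one_epoch()
    allocated_mb = torch.cuda.memory_allocated() / 1024**2
    print('epoch {}: {:.1f} MB allocated'.format(epoch, allocated_mb))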