I have created a class TrainEpoch to train a single epoch, in which I defined two methods for the forward and the backward pass:
class TrainEpoch():
    ....
    ....
    ....
    def Forward(self, x, y):
        self.pre_logits = self.model(x)
        self.pre_logits = _s(self.pre_logits, self.device)  # send logits to device
        self.curr_loss = self.loss_func(self.pre_logits, y)

    def Backward(self):
        self.curr_loss.backward()
        self.optimizer.step()
        self.optimizer.zero_grad()
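To make the setup concrete, here is a minimal, CPU-runnable sketch of how I drive this class. The constructor arguments, the tiny model, and the `_s` helper shown here are stand-ins I am writing for illustration (the real `_s` and the elided `....` parts are not shown in my code above):

```python
import torch
import torch.nn as nn

def _s(t, device):
    # hypothetical stand-in for the _s helper: move a tensor to a device
    return t.to(device)

class TrainEpoch:
    def __init__(self, model, loss_func, optimizer, device):
        self.model = model
        self.loss_func = loss_func
        self.optimizer = optimizer
        self.device = device

    def Forward(self, x, y):
        self.pre_logits = self.model(x)
        self.pre_logits = _s(self.pre_logits, self.device)  # send logits to device
        self.curr_loss = self.loss_func(self.pre_logits, y)

    def Backward(self):
        self.curr_loss.backward()
        self.optimizer.step()
        self.optimizer.zero_grad()

# toy usage: one forward/backward step on random data
model = nn.Linear(4, 2)
trainer = TrainEpoch(model, nn.CrossEntropyLoss(),
                     torch.optim.SGD(model.parameters(), lr=0.1), "cpu")
x = torch.randn(8, 4)
y = torch.randint(0, 2, (8,))
trainer.Forward(x, y)
trainer.Backward()
```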
This class works fine. However, I then created a subclass TrainEpochAMP from it to use automatic mixed precision (AMP) training, overriding the forward and backward pass methods as follows:
import torch.cuda.amp as AMP

scaler = AMP.GradScaler()

class TrainEpochAMP(TrainEpoch):
    ....
        self.scaler = scaler
    ....
    ....
    @AMP.autocast()
    def Forward(self, x, y):
        super().Forward(x, y)

    def Backward(self):
        self.scaler.scale(self.curr_loss).backward()
        self.scaler.step(self.optimizer)
        self.scaler.update()
        self.optimizer.zero_grad()
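For comparison, the canonical AMP loop from the PyTorch docs wraps only the forward pass and loss computation in autocast, and runs `backward()` outside it. The sketch below adds `enabled=` toggles (my own addition, not part of the official recipe) so it degrades to a no-op and can be tried on a CPU-only machine:

```python
import torch
import torch.nn as nn

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"

model = nn.Linear(4, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_func = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)  # no-op when disabled

x = torch.randn(8, 4, device=device)
y = torch.randint(0, 2, (8,), device=device)

for _ in range(3):
    optimizer.zero_grad()
    # autocast wraps ONLY the forward pass and the loss computation
    with torch.autocast(device_type="cuda", enabled=use_cuda):
        loss = loss_func(model(x), y)
    scaler.scale(loss).backward()  # backward runs OUTSIDE autocast
    scaler.step(optimizer)
    scaler.update()
```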
When I try to use AMP training, the time needed to complete one epoch becomes 22 hours, while it is only 30 minutes without AMP.
After inspecting the code, I found that self.scaler.step(self.optimizer) alone needs ~17 seconds to execute!
What could be the problem?
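One thing I am not sure about: CUDA launches kernels asynchronously, so a per-call wall-clock measurement may just be charging all previously queued GPU work to whichever call happens to synchronize first. To check, I timed each step with explicit `torch.cuda.synchronize()` calls around it; `timed` and the names in the commented usage lines are my own (hypothetical) helpers, not part of the code above:

```python
import time
import torch

def timed(label, fn):
    # synchronize before and after so the measured time reflects only the
    # GPU work attributable to fn, not kernels queued by earlier async calls
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    out = fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    print(f"{label}: {time.perf_counter() - t0:.4f}s")
    return out

# hypothetical usage inside the training step:
# timed("forward",  lambda: epoch.Forward(x, y))
# timed("backward", lambda: epoch.Backward())
result = timed("noop", lambda: 1 + 1)
```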