I am training an object detection model (it draws a bounding box around the target object and predicts its class) using a very common training loop like the one below:
```python
for i, (inputs, labels) in enumerate(data_loader):
    optimizer.zero_grad()
    inputs = inputs.to(device)
    labels = labels.to(device)
    preds = model(inputs)
    loss = criterion(preds, labels)
    loss.backward()
    optimizer.step()
```
The training is very time-consuming, and due to hardware limitations I looked for strategies to speed it up and reduce training time. I tried them, but the accuracy is now worse than it was before implementing them.
(Before the changes, although the predicted class was often incorrect, the bounding box was drawn in approximately the right position. After the changes, both are incorrect.)
These are the strategies I tried to implement: 1) enable `num_workers` in the DataLoader, 2) set `pin_memory=True`, 3) use the cuDNN autotuner, 4) use Automatic Mixed Precision (AMP), 5) use gradient accumulation.
Below is the modified code:
```python
accumulation_steps = 4
torch.backends.cudnn.benchmark = True  # cuDNN autotuner
data_loader = DataLoader(..., num_workers=num_workers, pin_memory=True)

for i, (inputs, labels) in enumerate(data_loader):
    optimizer.zero_grad()
    inputs = inputs.to(device)
    labels = labels.to(device)
    with autocast():
        preds = model(inputs)
        loss = criterion(preds, labels)
    loss = loss / accumulation_steps
    scaler.scale(loss).backward()
    if ((i + 1) % accumulation_steps == 0) or (i + 1 == len(data_loader)):
        scaler.step(optimizer)
        scaler.update()
```
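For reference, here is my understanding of the usual AMP-plus-gradient-accumulation recipe, condensed into a self-contained toy example so it runs anywhere (the `nn.Linear` model, `CrossEntropyLoss`, and random `TensorDataset` are placeholders for comparison only, not my actual detection setup):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"  # AMP scaling only matters on GPU

# Toy stand-ins -- replace with the real detection model / criterion / dataset.
model = nn.Linear(8, 4).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
dataset = TensorDataset(torch.randn(32, 8), torch.randint(0, 4, (32,)))
data_loader = DataLoader(dataset, batch_size=4, num_workers=0,
                         pin_memory=use_amp)

scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
accumulation_steps = 4

optimizer.zero_grad()  # zero once before the loop, not every iteration
for i, (inputs, labels) in enumerate(data_loader):
    inputs = inputs.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    with torch.autocast(device_type=device.type, enabled=use_amp):
        preds = model(inputs)
        loss = criterion(preds, labels) / accumulation_steps
    scaler.scale(loss).backward()  # gradients accumulate across iterations
    if (i + 1) % accumulation_steps == 0 or (i + 1) == len(data_loader):
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()  # reset only after an optimizer step

print("final loss:", loss.item())
```

I am not sure whether my placement of `optimizer.zero_grad()` relative to this recipe matters, which is part of what I am asking.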
These strategies are only supposed to speed up the training process; they should have no impact on the accuracy of the model. But I really have no idea why the performance got so much worse afterwards. Can someone help me out with this?