I was trying to train my network with Apex mixed precision, using DenseNet and ResNet backbones for a segmentation task on Cityscapes. Unfortunately, when I try to synchronize the batch norm using apex.parallel.convert_syncbn_model, the loss scale used by apex.amp.scale_loss ends up being zero after a few iterations because of gradient overflow. This does not happen if I remove the batch normalization.
The snippet of my code is the following:
```python
import apex
from torch import optim
from apex.parallel import DistributedDataParallel as DDP

model.cuda(gpu)
model = apex.parallel.convert_syncbn_model(model)
optimizer = optim.Adam(model.parameters())
model, optimizer = apex.amp.initialize(model, optimizer)
# model = apex.parallel.convert_syncbn_model(model)  # I also tried calling it here instead
net = DDP(model, delay_allreduce=True)
```
```python
loss = Cross_entropy(y_pred, y_gt)
with apex.amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
optimizer.zero_grad()
```
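To confirm it is really the loss scale collapsing, I log the effective scale each iteration. This is a minimal debugging sketch I added, not part of my original code; it relies on scale_loss yielding the loss multiplied by the current scale, so the ratio recovers the scale whenever the loss is nonzero:

```python
loss = Cross_entropy(y_pred, y_gt)
with apex.amp.scale_loss(loss, optimizer) as scaled_loss:
    # scaled_loss is loss * current_scale, so the ratio recovers the
    # scale (valid only while loss != 0); it shrinks toward zero as
    # the gradient overflow messages accumulate.
    if loss.item() != 0:
        print("current loss scale:", (scaled_loss / loss).item())
    scaled_loss.backward()
optimizer.step()
optimizer.zero_grad()
```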
OS: Ubuntu 16.04 and 18.04
PyTorch: tried with 1.4, 1.5 and 1.6
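For reference, this is the PyTorch-native setup I am considering as a fallback, using torch.cuda.amp (available from 1.6) and torch.nn.SyncBatchNorm instead of the Apex equivalents. It is an untested sketch; Cross_entropy, x, y_gt, and gpu stand in for my own code, and I have not verified that it avoids the overflow:

```python
import torch
from torch import optim
from torch.nn.parallel import DistributedDataParallel as DDP

model.cuda(gpu)
# Native replacement for apex.parallel.convert_syncbn_model
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
net = DDP(model, device_ids=[gpu])
optimizer = optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()  # native replacement for amp loss scaling

with torch.cuda.amp.autocast():
    y_pred = net(x)
    loss = Cross_entropy(y_pred, y_gt)
scaler.scale(loss).backward()
scaler.step(optimizer)   # skips the step if gradients overflowed
scaler.update()          # adjusts the loss scale based on overflow history
optimizer.zero_grad()
```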
Has anyone experienced the same?