Hi,

I was trying to train my network with apex mixed precision. I've tried DenseNet and ResNet as backbones for a segmentation task on Cityscapes. Unfortunately, when I synchronize the batch norm using `convert_syncbn_model`, the `scale_loss` ends up being zero after a few iterations because of gradient overflow. This does not happen if I remove the batch normalization.

The relevant snippet of my code is the following:

```
import apex
import torch.optim as optim
from apex.parallel import DistributedDataParallel as DDP

model.cuda(gpu)
model = apex.parallel.convert_syncbn_model(model)
optimizer = optim.Adam(model.parameters())
model, optimizer = apex.amp.initialize(model, optimizer)
# model = apex.parallel.convert_syncbn_model(model)  # I also tried calling it here
net = DDP(model, delay_allreduce=True)
```

…

```
loss = Cross_entropy(y_pred, y_gt)
with apex.amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()  # backward() must run inside the scale_loss context
optimizer.step()
optimizer.zero_grad()
```
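For context, my understanding is that amp's dynamic loss scaler backs off (multiplies the scale by a factor below 1) every time it detects an inf/nan in the gradients, so if overflows keep happening the scale collapses toward zero instead of recovering. A toy sketch of that behavior (the initial scale and backoff factor here are illustrative assumptions, not necessarily apex's actual defaults):

```python
def update_scale(scale, overflow, backoff=0.5):
    """Toy model of dynamic loss scaling: back off on overflow, else keep the scale."""
    return scale * backoff if overflow else scale

scale = 2.0 ** 16  # assumed initial loss scale
for _ in range(30):  # every iteration overflows, as in my runs
    scale = update_scale(scale, overflow=True)

print(scale)  # collapses to ~6e-05 and keeps shrinking
```

So the zero `scale_loss` I see is presumably just a symptom: something in the SyncBN path makes the gradients overflow on every step, and the scaler never gets a chance to grow back.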

System:

OS: Ubuntu 16.04 and 18.04

PyTorch: tried with 1.4, 1.5, and 1.6

apex: 0.1

Has anyone experienced the same?

Thanks