VGGPerceptualLoss in mixed precision mode

When I try to train my model using the nightly automatic mixed precision (AMP) mode, I get this error:

Traceback (most recent call last):
  File "src/", line 101, in <module>
  File "src/", line 48, in init_phase
    loss = content_loss(fakes, images)
  File "E:\program\anaconda\envs\torch_n\lib\site-packages\torch\nn\modules\", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\C_GAN\src\", line 92, in forward
    l = self.base_loss(x, y)
  File "E:\program\anaconda\envs\torch_n\lib\site-packages\torch\nn\modules\", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "E:\program\anaconda\envs\torch_n\lib\site-packages\torch\nn\modules\", line 813, in forward
    return F.smooth_l1_loss(input, target, reduction=self.reduction)
  File "E:\program\anaconda\envs\torch_n\lib\site-packages\torch\nn\", line 2581, in smooth_l1_loss
    ret = _smooth_l1_loss(input, target)
  File "E:\program\anaconda\envs\torch_n\lib\site-packages\torch\nn\", line 2557, in _smooth_l1_loss
    return torch.where(t < 1, 0.5 * t ** 2, t - 0.5)
RuntimeError: expected scalar type float but found struct c10::Half

The remaining code is quite similar to that of

The loss is found here:

How should I change the loss to make it work correctly?

I cannot reproduce this issue using your criterion and this minimal code snippet:

import torch

# VGGPerceptualLoss is the criterion class from the original post
criterion = VGGPerceptualLoss().cuda()

x = torch.randn(1, 3, 24, 24, device='cuda', requires_grad=True)
y = torch.randn(1, 3, 24, 24).cuda()

with torch.cuda.amp.autocast():
    loss = criterion(x, y)


Could you check my code and compare it to yours, which is raising this issue?
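
If the dtype error from the traceback shows up again, one workaround that is sometimes used is to run the element-wise base loss outside autocast, on inputs explicitly cast to float32. The following is only a minimal sketch under that assumption: `Float32Loss` is a hypothetical wrapper name, and `nn.SmoothL1Loss` stands in for the `base_loss` seen in the traceback. It is not taken from the linked criterion.

```python
import torch
import torch.nn as nn


class Float32Loss(nn.Module):
    """Hypothetical wrapper: evaluates a criterion in float32, outside autocast."""

    def __init__(self, base_loss: nn.Module):
        super().__init__()
        self.base_loss = base_loss

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # Disable autocast locally and cast both inputs, so the loss never
        # sees a mix of half and float tensors.
        with torch.cuda.amp.autocast(enabled=False):
            return self.base_loss(x.float(), y.float())


criterion = Float32Loss(nn.SmoothL1Loss())
x = torch.randn(1, 3, 24, 24).half()  # e.g. an activation produced under autocast
y = torch.randn(1, 3, 24, 24)         # float32 target
loss = criterion(x, y)
```

Only the final comparison is wrapped here, so the feature extractor in front of it could still run under autocast.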

Thanks for the reply.
After rerunning the code from the original post, it no longer crashes, for reasons unknown to us,
but when we apply autocast it produces unexpected results.

This is the essence of the training and evaluation code.

scaler = torch.cuda.amp.GradScaler()
for epoch in range(epochs):
    for images in train_photo_dataloader:
        images =


        with torch.cuda.amp.autocast():
            fakes = G(images)
            loss = content_loss(fakes, images)



    with torch.no_grad():
        fakes = G(test_images)
        loss_test = content_loss(fakes, test_images)

        logger.log_images("Test images output", fakes, normalize=False)
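
For comparison, the usual AMP pattern pairs autocast with `GradScaler` calls around the backward pass and the optimizer step. The snippet above elides those lines, so this is only a sketch of the commonly documented pattern, not the code from this thread: the tiny `G`, the `nn.SmoothL1Loss` criterion, the Adam optimizer, and the random batches are hypothetical stand-ins, and AMP is switched off automatically when no GPU is available.

```python
import torch
import torch.nn as nn

use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"

# Hypothetical stand-ins for the generator, loss, and optimizer.
G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Sigmoid()).to(device)
content_loss = nn.SmoothL1Loss()
optimizer = torch.optim.Adam(G.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(images: torch.Tensor) -> float:
    optimizer.zero_grad()
    # Only the forward pass and the loss computation run under autocast.
    with torch.cuda.amp.autocast(enabled=use_amp):
        fakes = G(images)
        loss = content_loss(fakes, images)
    # Scale the loss before backward to reduce float16 gradient underflow.
    scaler.scale(loss).backward()
    # step() unscales the gradients and skips the update if they contain inf/NaN.
    scaler.step(optimizer)
    scaler.update()
    return loss.item()


# Random batches in place of train_photo_dataloader.
losses = [train_step(torch.rand(2, 3, 16, 16, device=device)) for _ in range(3)]
```

If the backward pass is run on an unscaled loss, or `scaler.step` is replaced by a plain `optimizer.step()`, small float16 gradients can underflow to zero, which is one thing worth ruling out when AMP training fails to converge.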

With the logger using make_grid like this:

class ExperimentLogger(object):
    def __init__(self, exp_id: str, init_phase: bool):
        self.global_step = 0
        self.exp_id = exp_id
        log_dir = f"./logs/{exp_id}/{'pre_training' if init_phase else 'main'}"
        self.writer = SummaryWriter(log_dir=log_dir)

    def log_images(self, tag: str, tensor: torch.Tensor, normalize=False):
        tensor = vutils.make_grid(tensor.cpu(), normalize=normalize)
        # TODO Maybe check here what phase we are in and set 'tag' accordingly.
        self.writer.add_image(tag, tensor, self.global_step)

G is a U-Net defined using segmentation_models.pytorch:

smp.Unet(encoder_name="resnet18", encoder_depth=5, encoder_weights="imagenet",
         decoder_channels=(32, 32, 32, 32, 16),
         decoder_use_batchnorm=True, in_channels=3, classes=3, activation="sigmoid")

When using autocast this is our result

compared to not wrapping the two lines of code in with torch.cuda.amp.autocast():

We are simply trying to recreate the following images

Another problem is that the produced images are oversaturated compared to the originals.

Is this issue reproducible, i.e. if you rerun the training script several times, does the amp training always fail to converge, while the “standard” training yields these oversaturated images?

I redid the experiment using a sample of 600 images, where each run was trained for 5 epochs.
Here are a few samples of the output for each case. 5 experiments were done both with and without autocast, and I took 2 random samples from each to display the difference. It seems to be reproducible.

with torch.cuda.amp.autocast()


without amp:

I fixed the oversaturation by changing how I did the inv_norm. The fix that @ZimoNitrome and I made can be seen here. Mixed precision still produces noise and fails to converge.
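
For readers unfamiliar with the term, an inverse normalization typically undoes the per-channel mean/std normalization applied to the inputs before the images are logged. The linked fix is not reproduced here, so the sketch below is only an illustration under stated assumptions: it assumes the standard ImageNet statistics were used for normalization, and the `normalize` helper and the final clamp to [0, 1] are hypothetical choices rather than the actual change.

```python
import torch

# Assumed ImageNet channel statistics, shaped to broadcast over (N, 3, H, W).
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)


def normalize(x: torch.Tensor) -> torch.Tensor:
    """Map images in [0, 1] to zero-mean, unit-variance channels."""
    return (x - MEAN.to(x.device)) / STD.to(x.device)


def inv_norm(x: torch.Tensor) -> torch.Tensor:
    """Undo normalize() and clamp to the valid image range for logging."""
    return (x * STD.to(x.device) + MEAN.to(x.device)).clamp(0.0, 1.0)


images = torch.rand(2, 3, 8, 8)
restored = inv_norm(normalize(images))
```

Applying the inverse with mismatched statistics, or to tensors that were never normalized, shifts and stretches the channel values, which can show up as oversaturated colours.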

I haven’t seen these failures before, and other GAN architectures, such as Pix2PixHD, don’t seem to need any additional changes for mixed-precision training.
CC @mcarilli in case he’s seen some GAN failures before.