Optimizer.step() -- ok; scaler.step(optimizer): No inf checks were recorded for this optimizer

I am getting AssertionError: No inf checks were recorded for this optimizer. (raised in "/torch/cuda/amp/grad_scaler.py", line 291) when mixed precision is used in the unusual example below. Without mixed precision, PyTorch doesn't complain (set USE_HALF_PRECISION = False to compare).

I am using PyTorch 1.6.0 (Python 3.7, CUDA 10.2.89, cuDNN 7.6.5; everything is installed from conda binaries). Here is the MWE:

import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

class Identity_with_weights(nn.Module):
    '''For example, a KNN algorithm which returns the closest entry from a database for x. Weights are needed
    to include the KNN baseline seamlessly in a set of baselines which do have parameters. Otherwise
    you would need to change the code (remove the optimizer, backward pass, etc.) just for KNN, which is not neat.'''
    def __init__(self):
        super(Identity_with_weights, self).__init__()
        self.__hidden__ = torch.nn.Linear(1, 1, bias=False)

    def forward(self, x):
        # we need it to be able to call backward on the loss which uses x (outputs).
        # Nothing will happen in these examples as it propagates to the input which is not used anywhere else
        x.requires_grad = True
        return x


if __name__ == "__main__":
    # config
    USE_HALF_PRECISION = True
    device = torch.device('cuda:0')

    # define model
    model = Identity_with_weights()

    # define training things
    criterion = nn.L1Loss()
    optimizer = torch.optim.Adam(model.parameters())

    # for amp
    scaler = GradScaler()

    # targets are exactly the same as inputs, i.e. for reconstruction
    inputs = torch.rand(8, 1)
    targets = inputs.clone().detach()

    # send to device
    model = model.to(device)
    inputs = inputs.to(device)
    targets = targets.to(device)

    # we don't need it for the sake of this example, but let's have it here anyway.
    optimizer.zero_grad()

    # cast both inputs and targets to f16: if outputs were f16 while targets stayed f32,
    # the criterion would return a nonzero loss
    if USE_HALF_PRECISION:
        targets = targets.half()
        inputs = inputs.half()

    # autocasting ops inside of the context manager
    with autocast(USE_HALF_PRECISION):
        outputs = model(inputs)
        loss = criterion(outputs, targets)

    print(loss)

    # scaling loss if using half precision
    if USE_HALF_PRECISION:
        scaler.scale(loss).backward()
        scaler.step(optimizer) ## ERROR HERE
        scaler.update()
    else:
        loss.backward()
        optimizer.step()

I think I am doing something wrong here. What does it complain about?

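scaler.step(optimizer) first unscales the gradients of the optimizer's parameters and records an inf/NaN check for each device that holds a gradient. In this example none of the model's parameters take part in the graph, so every parameter's .grad is None after backward(), nothing is recorded, and the assertion fails. A plain optimizer.step() just skips parameters without gradients, which is why the non-AMP path runs without complaint.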
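A quick way to confirm this is to list the parameters that have no gradient after backward(). This is only a minimal CPU sketch of the check, not the original script: the class and attribute names (IdentityWithWeights, hidden) are renamed for illustration, and the targets are random rather than a copy of the inputs.

```python
import torch
from torch import nn


class IdentityWithWeights(nn.Module):
    """Same idea as the module in the question: forward never touches the weight."""

    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(1, 1, bias=False)  # illustrative name

    def forward(self, x):
        # Makes the loss differentiable w.r.t. the input only.
        x.requires_grad = True
        return x


model = IdentityWithWeights()
inputs = torch.rand(8, 1)
targets = torch.rand(8, 1)

loss = nn.L1Loss()(model(inputs), targets)
loss.backward()

# Parameters that never entered the graph keep grad == None.
missing = [name for name, p in model.named_parameters() if p.grad is None]
print(missing)
```

If this list contains every parameter handed to the optimizer, the scaler has nothing to check, which matches the assertion message.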

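This specific example was solved by replacing
x.requires_grad = True
with
x = x + 0 * self.__hidden__(x)
The added term is zero, so the output is unchanged, but the weight is now part of the graph and receives a (zero) gradient, which gives the scaler something to check.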
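For reference, here is a minimal sketch of the whole loop with that replacement applied. It makes some assumptions that are not in the original post: the attribute is renamed to hidden, the inputs are left in float32 so autocast does the casting, and AMP is switched off via enabled=False when no GPU is available so the same script also runs on CPU.

```python
import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast


class IdentityWithWeights(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(1, 1, bias=False)  # illustrative name

    def forward(self, x):
        # Zero-valued term: output equals x, but the weight joins the graph.
        return x + 0 * self.hidden(x)


# Assumption: fall back to a disabled scaler/autocast on machines without CUDA.
use_amp = torch.cuda.is_available()
device = torch.device('cuda:0' if use_amp else 'cpu')

model = IdentityWithWeights().to(device)
criterion = nn.L1Loss()
optimizer = torch.optim.Adam(model.parameters())
scaler = GradScaler(enabled=use_amp)

inputs = torch.rand(8, 1, device=device)
targets = inputs.clone().detach()

optimizer.zero_grad()
with autocast(enabled=use_amp):
    outputs = model(inputs)
    loss = criterion(outputs, targets)

scaler.scale(loss).backward()
scaler.step(optimizer)  # the weight now has a gradient, so this no longer asserts
scaler.update()

weight_grad = model.hidden.weight.grad
print(weight_grad)
```

The gradient on the weight is all zeros, so Adam leaves the weight as it was; the dummy parameter only exists to keep the training loop uniform across baselines.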
