Optimizer.step() -- ok; scaler.step(optimizer): No inf checks were recorded for this optimizer

Vladimir_Iashin · August 26, 2020, 12:19pm

I am getting AssertionError: No inf checks were recorded for this optimizer. in "/torch/cuda/amp/grad_scaler.py", line 291 when mixed precision is used in the odd example below. Without mixed precision, PyTorch doesn't complain (set USE_HALF_PRECISION = False to compare).

I am using PyTorch 1.6.0 (Python 3.7, CUDA 10.2.89, cuDNN 7.6.5; everything is installed from conda binaries). Here is the MWE.

import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

class Identity_with_weights(nn.Module):
    '''For example a KNN algorithm which returns a closest entry from a database for x. Weights are needed
    for a seamless inclusion of knn baseline to a set of baseline which do have some parameters. Otherwise
    you would need to change the code (remove optimizer, backward pass etc) just for knn which is not neat.'''
    def __init__(self):
        super(Identity_with_weights, self).__init__()
        self.__hidden__ = torch.nn.Linear(1, 1, bias=False)

    def forward(self, x):
        # we need it to be able to call backward on the loss which uses x (outputs).
        # Nothing will happen in these examples as it propagates to the input which is not used anywhere else
        x.requires_grad = True
        return x


if __name__ == "__main__":
    # config
    USE_HALF_PRECISION = True
    device = torch.device('cuda:0')

    # define model
    model = Identity_with_weights()

    # define training things
    criterion = nn.L1Loss()
    optimizer = torch.optim.Adam(model.parameters())

    # for amp
    scaler = GradScaler()

    # targets are exactly the same as inputs, i.e. for reconstruction
    inputs = torch.rand(8, 1)
    targets = inputs.clone().detach()

    # send to device
    model = model.to(device)
    inputs = inputs.to(device)
    targets = targets.to(device)

    # we don't need it for the sake of this example, but let's have it here anyway.
    optimizer.zero_grad()

    # since outputs are going to be f16 and targets are f32, criterion will output non zero loss
    if USE_HALF_PRECISION:
        targets = targets.half()
        inputs = inputs.half()

    # autocasting ops inside of the context manager
    with autocast(USE_HALF_PRECISION):
        outputs = model(inputs)
        loss = criterion(outputs, targets)

    print(loss)

    # scaling loss if using half precision
    if USE_HALF_PRECISION:
        scaler.scale(loss).backward()
        scaler.step(optimizer) ## ERROR HERE
        scaler.update()
    else:
        loss.backward()
        optimizer.step()

I think I am doing something wrong here. What does it complain about?

Vladimir_Iashin · August 27, 2020, 7:31pm

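The scaler only records inf checks for optimizer parameters that received a gradient during backward(). Here the graph never touches the model's parameters (forward just returns x), so none of them has a .grad, no inf checks are recorded, and scaler.step(optimizer) fails the assertion. A plain optimizer.step() silently skips parameters without gradients, which is why the non-AMP path doesn't complain.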
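A quick way to confirm this diagnosis is to check whether any parameter held by the optimizer has a gradient after backward(). The sketch below runs on CPU and does not use GradScaler at all; the helper name has_any_grad and the small nn.Linear layer are made up for illustration, not part of the original MWE or of PyTorch.

```python
import torch
from torch import nn


def has_any_grad(optimizer):
    """Return True if at least one parameter held by the optimizer has a .grad tensor.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    return any(
        p.grad is not None
        for group in optimizer.param_groups
        for p in group["params"]
    )


layer = nn.Linear(1, 1, bias=False)
optimizer = torch.optim.Adam(layer.parameters())

# Backward through a graph that never touches `layer`, as in the MWE:
# only the input x receives a gradient.
x = torch.rand(8, 1, requires_grad=True)
x.sum().backward()
unused = has_any_grad(optimizer)

# Backward through a graph that does use `layer`:
# its weight now has a .grad tensor.
layer(x).sum().backward()
used = has_any_grad(optimizer)

print(unused, used)
```

Calling a check like this right before scaler.step(optimizer) shows whether the optimizer has any gradients to work with.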

This specific example was solved by replacing
x.requires_grad = True
with
x = x + 0 * self.__hidden__(x)
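For reference, here is a minimal CPU-only sketch of the module with that replacement applied. The attribute is renamed to hidden and the targets are set to zeros purely for illustration. It shows that the output still equals the input, while the layer's weight now gets a gradient tensor (all zeros) instead of None.

```python
import torch
from torch import nn


class IdentityWithWeights(nn.Module):
    """Identity module whose dummy layer takes part in the autograd graph."""

    def __init__(self):
        super().__init__()
        # Renamed from __hidden__ for readability (illustrative choice).
        self.hidden = nn.Linear(1, 1, bias=False)

    def forward(self, x):
        # The output equals x, but the graph now includes self.hidden's weight.
        return x + 0 * self.hidden(x)


model = IdentityWithWeights()
inputs = torch.rand(8, 1)
targets = torch.zeros(8, 1)  # illustrative targets, not from the MWE

outputs = model(inputs)
loss = nn.L1Loss()(outputs, targets)
loss.backward()

# The weight receives a gradient tensor; multiplying by 0 makes it all zeros.
weight_grad = model.hidden.weight.grad
print(weight_grad)
```

The trick relies on the layer's output being finite: 0 times inf or nan would give nan rather than 0, so the output would no longer match x.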

StawEndl · January 13, 2022, 3:53am

I get the same trouble and I do not know how to solve it. I would appreciate it if you could give me some suggestions.
Here is the link: Yolov5-mask : No inf checks were recorded for this optimizer