Issue with automatic mixed precision

Without AMP, the loss decreases as expected:

import torch
from torch.cuda.amp import autocast, GradScaler  # used in the AMP version further down
from torchvision import models

model = models.mobilenet_v2(pretrained=True).cuda()
loss_fnc = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

# One fixed random batch, so the model simply overfits it
X = torch.randn((32, 3, 300, 300), dtype=torch.float32).cuda()
y = torch.randint(0, 1000, (32,), dtype=torch.long).cuda()

model.train()
for j in range(30):
    optimizer.zero_grad()

    # Plain float32 forward and backward pass
    y_hat = model(X)
    loss = loss_fnc(y_hat, y)
    loss.backward()
    optimizer.step()
    print(loss.item())

Output:

8.039933204650879
5.690041542053223
3.4787116050720215
1.607206106185913
0.6231755614280701
0.23825135827064514
0.08544095605611801
0.04335329309105873
0.016259444877505302
0.01174827478826046
0.0069425650872290134
0.004459714516997337
0.003734807949513197
0.0024659112095832825
0.0027059323620051146
........................

But with AMP, the loss becomes NaN after the first iteration:

# Same imports as in the snippet without AMP
model = models.mobilenet_v2(pretrained=True).cuda()
loss_fnc = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

X = torch.randn((32, 3, 300, 300), dtype=torch.float32).cuda()
y = torch.randint(0, 1000, (32,), dtype=torch.long).cuda()

scaler = GradScaler()
model.train()
for j in range(30):
    optimizer.zero_grad()

    # Forward pass and loss run under autocast (mixed precision)
    with autocast():
        y_hat = model(X)
        loss = loss_fnc(y_hat, y)

    # Scale the loss before backward; the scaler then steps and updates its scale
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

    print(loss.item())

Output:

8.393239974975586
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
.......................

Thanks for the code snippet. I cannot reproduce the NaN outputs with the AMP version and get:

7.946749687194824
7.837947368621826
8.126112937927246
5.472558498382568
3.0419869422912598
1.3717389106750488
0.4931238293647766
0.21829186379909515
0.09434405714273453
0.0426550917327404
0.021307891234755516
0.013564658351242542
0.010585908778011799
0.00768495025113225
0.005186968017369509
0.0036425101570785046
0.003632057225331664
0.0027618384920060635
0.0024199967738240957
0.00201830524019897
0.0015914703253656626
0.0011888183653354645
0.0011378041235730052
0.000954880437348038
0.0007914779707789421
0.000882966909557581
0.0007371000247076154
0.0007495403406210244
0.0006300751701928675

Could you post the PyTorch, torchvision, CUDA, and cuDNN versions you are using and how you’ve installed PyTorch?
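For anyone else hitting this, one way to gather the requested versions is a small script like the one below. It is only a sketch: the helper name collect_versions and the dictionary keys are made up for illustration, and it reports placeholder strings rather than failing when a package cannot be imported.

```python
import importlib


def collect_versions():
    """Collect the version details asked for above (hypothetical helper)."""
    info = {}
    modules = {}
    for name in ("torch", "torchvision"):
        try:
            modules[name] = importlib.import_module(name)
            info[name] = str(modules[name].__version__)
        except Exception as exc:  # missing or broken install
            info[name] = f"unavailable ({type(exc).__name__})"

    torch = modules.get("torch")
    if torch is None:
        # Without PyTorch there is nothing further to query.
        info.update({"cuda": "unknown", "cudnn": "unknown", "gpu": "unknown"})
        return info

    # CUDA version the PyTorch binary was built with (None on CPU-only builds).
    info["cuda"] = str(torch.version.cuda)
    # cuDNN version as an integer such as 7605, or None if cuDNN is unavailable.
    info["cudnn"] = str(torch.backends.cudnn.version())
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    else:
        info["gpu"] = "no CUDA device visible"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Note that torch.version.cuda reports the CUDA version bundled with the PyTorch binary, which can differ from a system-wide CUDA toolkit. Running python -m torch.utils.collect_env prints a more complete report, if it is available in your install.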

Thanks a lot @ptrblck for looking into it. Below are the details:

PyTorch: 1.6.0
torchvision: 0.7.0
CUDA: 10.0.130
cuDNN: 7.5
GPU: Titan RTX

I installed PyTorch using: conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

It works fine with CUDA 10.2.

I will upgrade my environment.

Thanks