Autocast Slows Down the Code

Hi,
I have an RTX 3070, and somehow using autocast slows down my code.

torch.version.cuda prints 11.1, torch.backends.cudnn.version() prints 8005, and my PyTorch version is 1.9.0. I’m using Ubuntu 20.04 with kernel 5.11.0-25-generic.

This is the code I’ve been using:

scaler = torch.cuda.amp.GradScaler()

torch.cuda.synchronize()
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()

for epoch in range(10):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data

        optimizer.zero_grad()

        # Run the forward pass and loss computation in mixed precision.
        with torch.cuda.amp.autocast():
            outputs = net(inputs)
            loss = criterion(outputs, labels)

        # Scale the loss to avoid FP16 gradient underflow, then step.
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

        running_loss += loss.item()

end.record()
torch.cuda.synchronize()

# elapsed_time returns milliseconds
print(start.elapsed_time(end))

I don’t know which model you are using, but you could try either the nightly binaries with cuDNN 8.2.2 or building PyTorch from source with the latest cuDNN version to check the performance.
Also, torch.backends.cudnn.benchmark = True could help in case you are working with static input shapes (or low variance in the shapes).
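For example, set the flag once before the timed training loop (a minimal sketch; cuDNN profiles the available kernels on the first iterations, so expect a slower warm-up):

import torch

# Let cuDNN benchmark the available kernels for each new input shape
# and cache the fastest one. This only pays off when input shapes
# stay constant (or nearly constant) across iterations.
torch.backends.cudnn.benchmark = True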

Well, I solved it by increasing the out_channels of the conv2d layers, and it works fast now. Thanks for the help.
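For anyone hitting this later: a plausible explanation is Tensor Core alignment. On Ampere GPUs, FP16 convolutions tend to hit the fast Tensor Core kernels when the channel counts are multiples of 8, so small or odd channel sizes can run slower under autocast than in FP32. A minimal sketch with made-up layer sizes:

import torch.nn as nn

# Hypothetical layer sizes for illustration: out_channels that is a
# multiple of 8 lets cuDNN pick FP16 Tensor Core kernels on Ampere.
slow_conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3)  # 6 channels: may fall back to slower kernels
fast_conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)  # 8 channels: Tensor Core eligible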