I'm trying to use automatic mixed precision (AMP) training to speed up training, but I seem to get the opposite result. PyTorch is 1.6, CUDA is 10.1, the GPUs are a Tesla K80 and a Tesla T4, and cuDNN is 8.0.2.

The baseline code is:

```
import torch

N, D_in, D_out = 64, 1024, 512
x = torch.randn(N, D_in, device="cuda")
y = torch.randn(N, D_out, device="cuda")

model = torch.nn.Linear(D_in, D_out).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

for t in range(500):
    y_pred = model(x)
    loss = torch.nn.functional.mse_loss(y_pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The timing is:

```
real 0m3.402s
user 0m2.926s
sys  0m4.955s
```

The AMP version is:

```
import torch
from torch.cuda.amp import autocast, GradScaler

N, D_in, D_out = 64, 1024, 512
x = torch.randn(N, D_in, device="cuda")
y = torch.randn(N, D_out, device="cuda")

model = torch.nn.Linear(D_in, D_out).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = GradScaler()

for t in range(500):
    # Zero the gradients before backward, not after it,
    # otherwise the optimizer steps on zeroed gradients.
    optimizer.zero_grad()
    # autocast wraps only the forward pass and loss computation;
    # backward runs outside the autocast context.
    with autocast():
        y_pred = model(x)
        loss = torch.nn.functional.mse_loss(y_pred, y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

The timing is:

```
real 0m3.584s
user 0m3.131s
sys  0m4.832s
```
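
One thing I'm not sure about is the measurement itself: `time` clocks the whole process, including interpreter startup and CUDA context creation, which could easily dominate a 500-iteration loop this small. A minimal sketch of timing only the loop body instead (the `time_loop` helper name is mine; it synchronizes around the timed region because CUDA kernel launches are asynchronous):

```python
import time
import torch

def time_loop(fn, iters=500, warmup=10):
    """Time `iters` calls of `fn`, excluding one-off startup costs."""
    # Warm-up iterations so allocator growth and kernel selection
    # do not count toward the measurement.
    for _ in range(warmup):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # drain pending GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # GPU ops are async; wait before stopping the clock
    return time.perf_counter() - start
```

Usage would be something like `time_loop(train_step)` where `train_step` is one iteration of either loop above, so the baseline and AMP versions are compared on the loop alone.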