issue #108627
Describe the bug
I train and run inference with a classifier under autocast. The results differ across GPUs (same .venv, code, and data).
The result on the A100 is much better than on the RTX A6000.
Disabling autocast on the RTX A6000 by setting ctx = nullcontext()
gives a result similar to the A100 with autocast.
I get no torch warnings on either machine.
import torch

# Mixed-precision context used for both training and inference
ctx = torch.amp.autocast(device_type='cuda', dtype=torch.bfloat16)

# Training
with ctx:
    logits, loss = classifier(X, Y)

# Inference
with ctx:
    logits, loss = classifier(X, None)
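
For reference, a minimal sketch of the comparison described above. The names evaluate, loader, device, and the accuracy metric are placeholders I'm using here for illustration, not my actual code; the point is that the same evaluation loop runs once under autocast(bfloat16) and once under nullcontext(), so the precision context is the only difference between the two runs.

from contextlib import nullcontext
import torch

# Hypothetical evaluation helper: `classifier` and `loader` stand in for
# the real model and data; only the precision context changes per run.
@torch.no_grad()
def evaluate(classifier, loader, device, use_autocast):
    ctx = (torch.amp.autocast(device_type='cuda', dtype=torch.bfloat16)
           if use_autocast else nullcontext())
    classifier.eval()
    correct, total = 0, 0
    for X, Y in loader:
        X, Y = X.to(device), Y.to(device)
        with ctx:
            logits, _ = classifier(X, None)
        correct += (logits.argmax(dim=-1) == Y).sum().item()
        total += Y.numel()
    return correct / total

# What I observe: on the RTX A6000, use_autocast=False gives results close
# to the A100 with autocast, while use_autocast=True is noticeably worse.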
P.S. This has cost me one month of my business time.