I train and run inference on a classifier using autocast. The results differ across different GPUs (same .venv, code, and data).
The accuracy on an A100 is much better than on an RTX A6000.
Disabling autocast on the RTX A6000 (i.e. ctx = nullcontext()) gives results similar to the A100 with autocast enabled.
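A minimal sketch of how I switch between the two modes (the `use_amp` flag is illustrative, not part of my original code):

```python
from contextlib import nullcontext

import torch

use_amp = False  # False reproduces the full-precision run on the RTX A6000

# Pick the execution context once, so the training/inference code below
# is identical in both modes.
ctx = (
    torch.amp.autocast(device_type='cuda', dtype=torch.bfloat16)
    if use_amp
    else nullcontext()
)
```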
I get no torch warnings on either machine.
```python
ctx = torch.amp.autocast(device_type='cuda', dtype=torch.bfloat16)

# training
with ctx:
    logits, loss = classifier(X, Y)

# inference
with ctx:
    logits, loss = classifier(X, None)
```
P.S. This has cost me one month of my business time.