I’m wondering if the following is expected behavior:
```python
import torch

torch.set_grad_enabled(False)

model = torch.nn.Linear(5, 1, bias=False).cuda()
model.weight.data[:] = 1
inputs = torch.ones(2, 5).cuda()

with torch.autocast(device_type='cuda'):
    out = model(inputs)
    print(out.sum().cpu())  # prints tensor(10.)

    model.weight.data[:] = 0
    out = model(inputs)
    print(out.sum().cpu())  # prints tensor(10.), but this should be tensor(0.)
```
It seems that changes to weights which occur in the autocast context aren’t reflected in outputs.
I ran into this issue because I had code like this:

```python
with torch.no_grad(), autocast():
    results = []
    for ckpt in ckpts_list:
        model.load_state_dict(ckpt)
        results.append(model(images))
```
which was giving nonsense results – I believe because my model’s BatchNorm weights were being updated from each new checkpoint, while the conv/linear layers kept serving stale casted weights from the autocast cache.
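For what it’s worth, a workaround that appears to fix this (assuming the stale outputs do come from autocast’s weight-cast cache) is passing `cache_enabled=False` to `torch.autocast`, or calling `torch.clear_autocast_cache()` after each `load_state_dict`. Here is a minimal sketch using CPU autocast with `bfloat16` so it runs without a GPU:

```python
import torch

torch.set_grad_enabled(False)

model = torch.nn.Linear(5, 1, bias=False)
model.weight.data[:] = 1
inputs = torch.ones(2, 5)

# cache_enabled=False disables autocast's per-weight cast cache,
# so every forward pass re-casts the *current* weight values.
with torch.autocast(device_type='cpu', dtype=torch.bfloat16, cache_enabled=False):
    out = model(inputs)
    print(out.sum())  # reflects the current weights: 10.0

    model.weight.data[:] = 0
    out = model(inputs)
    print(out.sum())  # now correctly 0.0
```

Alternatively, keeping the cache but calling `torch.clear_autocast_cache()` between checkpoints should also force the casts to be redone.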