I’m wondering if the following is expected behavior:
```python
import torch

torch.set_grad_enabled(False)
model = torch.nn.Linear(5, 1, bias=False).cuda()
model.weight.data[:] = 1
inputs = torch.ones(2, 5).cuda()

with torch.autocast(device_type='cuda'):
    out = model(inputs)
    print(out.sum().cpu())  # prints tensor(10.)
    model.weight.data[:] = 0
    out = model(inputs)
    print(out.sum().cpu())  # prints tensor(10.), but this should be tensor(0.)
```
It seems that changes to weights made inside an autocast context aren’t reflected in outputs computed later in that same context.
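If this is autocast’s weight-cast cache at work, clearing it with `torch.clear_autocast_cache()` after modifying the weights should make the second forward pass see the new values. A minimal sketch of that guess, run on CPU autocast so it works without a GPU (the behavior above was observed on CUDA):

```python
import torch

torch.set_grad_enabled(False)
model = torch.nn.Linear(5, 1, bias=False)
model.weight.data[:] = 1
inputs = torch.ones(2, 5)

with torch.autocast(device_type='cpu'):
    out = model(inputs)           # may cache a low-precision copy of the weight
    model.weight.data[:] = 0
    torch.clear_autocast_cache()  # drop cached casts so the new weight values are used
    out = model(inputs)
    print(out.sum())              # now reflects the zeroed weights
```

Alternatively, `torch.autocast(..., cache_enabled=False)` avoids the cache entirely, at some cost in repeated casts.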
I ran into this issue because I had code like this:
```python
with torch.no_grad(), autocast():
    results = []
    for ckpt in ckpts_list:
        model.load_state_dict(ckpt)
        results.append(model(images))
```
which was giving nonsense results. I believe this was because my model’s BatchNorm weights were picking up each new checkpoint, while the conv/linear layers were still being served stale casted copies from the autocast cache.
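Assuming the stale-cache explanation is right, the sweep can be made to behave by disabling the cast cache for the region (or by calling `torch.clear_autocast_cache()` after each `load_state_dict`). A sketch with toy stand-ins for `ckpts_list`/`images`, on CPU autocast so it runs anywhere:

```python
import torch

torch.set_grad_enabled(False)
model = torch.nn.Linear(4, 2, bias=False)
images = torch.ones(1, 4)

# two toy "checkpoints" standing in for the real ckpts_list
ckpts_list = [
    {'weight': torch.full((2, 4), 1.0)},
    {'weight': torch.full((2, 4), 0.0)},
]

results = []
with torch.no_grad(), torch.autocast(device_type='cpu', cache_enabled=False):
    for ckpt in ckpts_list:
        # load_state_dict copies into the existing parameter tensors,
        # which is exactly what a cached cast would hide
        model.load_state_dict(ckpt)
        # (with the cache enabled, torch.clear_autocast_cache() here is the alternative)
        results.append(model(images))
```

With `cache_enabled=False` every forward pass re-casts the current weights, so each entry of `results` reflects its own checkpoint.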