I’m wondering if the following is expected behavior:
```python
import torch

torch.set_grad_enabled(False)
model = torch.nn.Linear(5, 1, bias=False).cuda()
model.weight.data[:] = 1
inputs = torch.ones(2, 5).cuda()

with torch.autocast(device_type='cuda'):
    out = model(inputs)
    print(out.sum().cpu())  # prints tensor(10.)
    model.weight.data[:] = 0
    out = model(inputs)
    print(out.sum().cpu())  # prints tensor(10.), but this should be tensor(0.)
```
It seems that changes to weights made inside an autocast context aren’t reflected in outputs computed later in that same context.
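If this is autocast’s weight-cast cache at work, clearing it with `torch.clear_autocast_cache()` after modifying the weights should make the second forward pass see the new values. A minimal sketch of that guess, run on CPU autocast so it works without a GPU (the behavior above was observed on CUDA):

```python
import torch

torch.set_grad_enabled(False)
model = torch.nn.Linear(5, 1, bias=False)
model.weight.data[:] = 1
inputs = torch.ones(2, 5)

with torch.autocast(device_type='cpu'):
    out = model(inputs)           # may cache a low-precision copy of the weight
    model.weight.data[:] = 0
    torch.clear_autocast_cache()  # drop cached casts so the new weight values are used
    out = model(inputs)
    print(out.sum())              # now reflects the zeroed weights
```

Alternatively, `torch.autocast(..., cache_enabled=False)` avoids the cache entirely, at some cost in repeated casts.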
I ran into this issue because I had code like this:
```python
with torch.no_grad(), autocast():
    results = []
    for ckpt in ckpts_list:
        model.load_state_dict(ckpt)
        results.append(model(images))
```
which was giving nonsense results. I believe this was because my model’s BatchNorm weights were picking up each new checkpoint, while the conv/linear layers were still being served stale casted copies from the autocast cache.
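Assuming the stale-cache explanation is right, the sweep can be made to behave by disabling the cast cache for the region (or by calling `torch.clear_autocast_cache()` after each `load_state_dict`). A sketch with toy stand-ins for `ckpts_list`/`images`, on CPU autocast so it runs anywhere:

```python
import torch

torch.set_grad_enabled(False)
model = torch.nn.Linear(4, 2, bias=False)
images = torch.ones(1, 4)

# two toy "checkpoints" standing in for the real ckpts_list
ckpts_list = [
    {'weight': torch.full((2, 4), 1.0)},
    {'weight': torch.full((2, 4), 0.0)},
]

results = []
with torch.no_grad(), torch.autocast(device_type='cpu', cache_enabled=False):
    for ckpt in ckpts_list:
        # load_state_dict copies into the existing parameter tensors,
        # which is exactly what a cached cast would hide
        model.load_state_dict(ckpt)
        # (with the cache enabled, torch.clear_autocast_cache() here is the alternative)
        results.append(model(images))
```

With `cache_enabled=False` every forward pass re-casts the current weights, so each entry of `results` reflects its own checkpoint.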