Can `autocast` handle networks with layers having different dtypes?

Hi,

torch version = 2.5.0

I am wondering whether torch.autocast can handle neural networks with layers having different dtypes.

The following code suggests it cannot:

import torch
net = torch.nn.Sequential(
    torch.nn.Linear(2, 10, dtype=torch.float16),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 10, dtype=torch.float32),
    torch.nn.ReLU(),
)
with torch.autocast("cuda"):
    net(torch.as_tensor([[1., 2.]], dtype=torch.float16))

=> it raises RuntimeError: mat1 and mat2 must have the same dtype, but got Half and Float

But at some point recently I had started to believe that autocast was able to handle such cases.
I loaded a pretrained model from Hugging Face and changed just the dtype of one layer. Without autocast, the same RuntimeError was raised, but with autocast the error disappeared.
Here is a minimal reproducible example:

import torch
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="Qwen/Qwen2.5-0.5B",
    torch_dtype=torch.float16,
    device_map="cuda"
)
pipe_model_named_parameters = dict(pipe.model.named_parameters())
for name, param in pipe_model_named_parameters.items():
    if "score" in name:  # convert the last trainable layer to float32 for stability during training
        param.data = param.data.to(dtype=torch.float32)
with torch.autocast("cuda"):  # context manager needed in case not all layers have the same dtype
    print(pipe(["a", "b", "z"]))
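For comparison (as noted above), calling the pipeline outside the autocast context reproduces the original error, since the float32 score head then receives float16 activations:

print(pipe(["a", "b", "z"]))  # raises RuntimeError: mat1 and mat2 must have the same dtype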

Any insight on what autocast allows and does not allow?


Seems to work if you use CUDA tensors with torch.autocast("cuda"):

import torch
net = torch.nn.Sequential(
    torch.nn.Linear(2, 10, dtype=torch.float16),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 10, dtype=torch.float32),
    torch.nn.ReLU(),
).to("cuda")
with torch.autocast("cuda"):
    net(torch.tensor([[1., 2.]], dtype=torch.float16, device="cuda"))
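As a quick check (same setup as above), the output dtype shows that autocast ran the linear ops in float16, while the parameters keep the dtypes they were created with:

with torch.autocast("cuda"):
    out = net(torch.tensor([[1., 2.]], dtype=torch.float16, device="cuda"))
print(out.dtype)            # torch.float16 (autocast's default dtype on CUDA)
print(net[0].weight.dtype)  # torch.float16 - parameter storage is unchanged
print(net[2].weight.dtype)  # torch.float32 - autocast casts per op, not in place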

I'm not sure autocast will have any effect if you cast all layers manually:

Autocast will respect types assigned manually:

Ops called with an explicit dtype=... argument are not eligible, and will produce output that respects the dtype argument.

The dtype argument here for nn.Linear should just be used to initialize the dtype of the weight (see torch/nn/modules/linear.py).
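A small sketch of that distinction, assuming a CUDA device (torch.softmax is one of the ops autocast runs in float32):

import torch

x = torch.randn(1, 4, device="cuda", dtype=torch.float16)
with torch.autocast("cuda"):
    a = torch.softmax(x, dim=-1)                       # eligible: autocast runs it in float32
    b = torch.softmax(x, dim=-1, dtype=torch.float16)  # explicit dtype= -> not eligible, output respects the argument
print(a.dtype, b.dtype)  # torch.float32 torch.float16

By contrast, the dtype= passed to nn.Linear only sets the parameters' storage dtype; the matmul in its forward is still eligible for autocasting.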


Thanks for your reply @soulitzer.
Indeed, it seems the problem was caused by net not being on the CUDA device while "cuda" was specified in torch.autocast.
Both options below work fine:

  1. adding .to("cuda") to net and specifying device="cuda" for the input tensor, as @soulitzer pointed out
  2. replacing torch.autocast("cuda") with torch.autocast("cpu")

In case 1 the returned tensor has dtype torch.float16, while in case 2 it has dtype torch.bfloat16. Not sure why there is such a difference.
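For reference, the difference seems to come from autocast's device-specific default dtype (torch.float16 on CUDA, torch.bfloat16 on CPU, per the torch.autocast documentation). A minimal sketch, assuming a CUDA device that supports bfloat16 and reusing the net on "cuda" from the reply above, showing how to pin the dtype explicitly:

with torch.autocast("cuda", dtype=torch.bfloat16):  # override the float16 default on CUDA
    out = net(torch.tensor([[1., 2.]], dtype=torch.float16, device="cuda"))
print(out.dtype)  # torch.bfloat16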