CPU workloads should support bfloat16 in autocast, as described in the docs. As shown in the CPU example section of `torch.autocast`, "automatic mixed precision training/inference" on CPU with a datatype of `torch.bfloat16` only uses `torch.autocast`.
I don't know what the status of float16 support on CPU is, or whether it's planned.
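For reference, a minimal sketch of CPU autocast with bfloat16 (the model and shapes are arbitrary examples; note that no `GradScaler` is needed on CPU, unlike the CUDA fp16 path):

```python
import torch

# Arbitrary example model and input
model = torch.nn.Linear(8, 4)
x = torch.randn(2, 8)

# Enter an autocast region on CPU; eligible ops (e.g. nn.Linear)
# run in bfloat16 inside this context
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)

print(out.dtype)  # torch.bfloat16 for the linear layer's output
```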