[CPU] Train network using float 16?

I’m considering ways to improve speed and memory footprint of models and data.

Reading SO, this forum and docs, it’s still unclear to me whether it’s possible to use either float16 or bfloat16 and whether I can set this somewhere as a top level parameter.

What I have found is the setting:

dtype = torch.float16
torch.set_default_dtype(dtype)

but this returns errors due to unsupported ops. (I’m using it with a standard neural network.)
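For what it’s worth, the global default does take effect for newly created tensors; the errors come from individual CPU ops that have no float16 kernel. A minimal sketch (the shapes are just placeholders):

```python
import torch

torch.set_default_dtype(torch.float16)

x = torch.randn(4)
print(x.dtype)  # torch.float16 -- new floating-point tensors pick up the default

# A full training loop may still fail with something like
# "RuntimeError: ... not implemented for 'Half'" on the first op
# that lacks a float16 CPU kernel, depending on the model.

torch.set_default_dtype(torch.float32)  # restore the default
```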

Could anyone offer some hints?

Running lscpu | grep dtype apparently shows support for f16 (but not bf16).

CPU workloads should support bfloat16 in autocast as described in the docs:

As shown in the CPU example section of torch.autocast, “automatic mixed precision training/inference” on CPU with datatype of torch.bfloat16 only uses torch.autocast.
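A minimal sketch of what that looks like, assuming a reasonably recent PyTorch (the model and shapes are just placeholders):

```python
import torch

model = torch.nn.Linear(8, 4)  # placeholder model
x = torch.randn(2, 8)

# Inside the autocast region, ops on the cast list (e.g. linear/matmul)
# run in bfloat16, while other ops stay in float32 -- which is why this
# avoids the unsupported-op errors you get from a global float16 default.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # torch.bfloat16
```

The same context manager works for inference; for training you would wrap only the forward pass and loss computation, keeping the backward pass and optimizer step outside it.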

I don’t know what the status of float16 support on CPU is and if it’s planned.