I’m considering ways to improve the speed and memory footprint of my models and data.
After reading SO, this forum, and the docs, it’s still unclear to me whether it’s possible to use either float16 or bfloat16, and whether I can set this somewhere as a top-level parameter.
What I have found is the setting:
```python
dtype = torch.float16
torch.set_default_dtype(dtype)
```
but this raises errors about unsupported ops. (I’m using it with a standard neural network.)
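For reference, here is a minimal, self-contained version of what I’m doing — the small Sequential model is just a stand-in for my actual network:

```python
import torch
import torch.nn as nn

# Make float16 the global default floating-point dtype
torch.set_default_dtype(torch.float16)

# Placeholder for my "standard neural network"
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
x = torch.randn(256, 64)  # created as float16 because of the default dtype

out = model(x)   # the forward/backward pass is where the unsupported-op errors show up for me
loss = out.sum()
loss.backward()
```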
Could anyone offer some hints?
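For example, is the intended top-level switch something like autocast rather than changing the default dtype? This is only a guess at the usage; in particular I’m not sure whether CPU autocast accepts torch.float16 or only torch.bfloat16:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))  # placeholder model
x = torch.randn(256, 64)  # inputs and parameters stay float32

# Only the ops inside the context run in reduced precision.
# Unsure whether dtype=torch.float16 is allowed here on CPU, or only torch.bfloat16.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)
```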
Running lscpu and grepping the CPU flags
apparently shows support for f16 (but not bf16).
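In case it’s relevant, this is roughly how I checked the capability from Python; the flag names are my assumption of what to look for:

```python
# Crude capability check on Linux; flag names are my guess at what matters:
# "f16c" / "avx512_fp16" for float16, "avx512_bf16" for bfloat16.
with open("/proc/cpuinfo") as f:
    cpuinfo = f.read()

for flag in ("f16c", "avx512_fp16", "avx512_bf16"):
    print(flag, flag in cpuinfo)
```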