Hello, when doing AMP with bfloat16 in cuda what are the operations autocasted?
In the documentation I only see bfloat16 for CPU and float16 for GPU.
https://pytorch.org/docs/stable/amp.html
I just want to know which operations run in bfloat16 and which in float32. I have a model which has conv2d layers, ReLU activation functions and a tanh activation function.
Thank you.
Yes, the CUDA backend supports bfloat16 for Ampere or newer GPUs.
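If you want to verify this on your machine, a quick check (assuming a CUDA build of a recent PyTorch release) is:

```python
import torch

# Returns True on Ampere or newer GPUs, i.e. when bfloat16 is supported.
print(torch.cuda.is_bf16_supported())
```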
Hello, thank you for your answer.
Sorry, I think I didn’t formulate my original question correctly.
I know that the CUDA backend supports bfloat16 in, for instance, the A100 GPU.
However, I want to know which operations AMP autocasts to bfloat16 on CUDA.
https://pytorch.org/docs/stable/amp.html#cuda-ops-that-can-autocast-to-float16
Are those the same ones as for float16?
Final question: how was it decided that some operations can be cast to either float16 or bfloat16?
Yes, this should be the case: the lower-precision ops are defined here and should use both dtypes, if I’m not mistaken.
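As a sanity check, you could run your layers under autocast and print the output dtypes. A minimal sketch, assuming a CUDA device with bfloat16 support:

```python
import torch

x = torch.randn(1, 3, 32, 32, device="cuda")
conv = torch.nn.Conv2d(3, 8, 3).cuda()

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y = conv(x)        # conv2d is on the lower-precision list
    r = torch.relu(y)  # relu is not listed, so it keeps its input dtype
    z = torch.tanh(r)  # tanh is on the float32 list
    print(y.dtype, r.dtype, z.dtype)
    # expected: torch.bfloat16 torch.bfloat16 torch.float32
```

This mirrors the behavior the docs describe for float16, just with bfloat16 passed as the autocast dtype.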