Question about bfloat16 operations in AMP and CUDA

Hello, when doing AMP with bfloat16 on CUDA, which operations are autocast?

In the documentation I only see bfloat16 listed for CPU and float16 for GPU.

https://pytorch.org/docs/stable/amp.html

I just want to know which operations run in bfloat16 and which in float32. I have a model with conv2d layers, ReLU activations, and a tanh activation; a sketch of the setup is below.
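For reference, this is roughly my setup (the layer sizes here are made up), with a couple of prints to see which dtypes the ops actually produce under bfloat16 autocast:

```python
import torch
import torch.nn as nn

# Rough stand-in for my model: conv2d layers, ReLU, and a final tanh
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
    nn.Tanh(),
).cuda()

x = torch.randn(1, 3, 32, 32, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    conv_out = model[0](x)   # output of the first conv2d under autocast
    full_out = model(x)      # output after the final tanh
    print(conv_out.dtype)
    print(full_out.dtype)
```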

Thank you.


Yes, the CUDA backend supports bfloat16 for Ampere or newer GPUs.
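You can also check support at runtime, assuming a reasonably recent PyTorch build:

```python
import torch

# bfloat16 on CUDA needs compute capability >= 8.0, i.e. Ampere or newer
print(torch.cuda.get_device_capability())   # e.g. (8, 0) on an A100
print(torch.cuda.is_bf16_supported())       # True on Ampere or newer GPUs
```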


Hello, thank you for your answer.

Sorry, I think I didn’t formulate my original question correctly.

I know that the CUDA backend supports bfloat16 in, for instance, the A100 GPU.

However, I want to know which operations AMP autocasts to bfloat16 on CUDA.

https://pytorch.org/docs/stable/amp.html#cuda-ops-that-can-autocast-to-float16

Are those the same ones as for float16?

Final question: how was it decided which operations can be cast to either float16 or bfloat16?

Yes, this should be the case: the lower-precision ops are defined here, and that list should apply to both dtypes, if I’m not mistaken.
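As a quick sanity check, you could run an op from that lower-precision list under both autocast dtypes and compare the output dtypes (a sketch using matmul as the example op):

```python
import torch

a = torch.randn(8, 8, device="cuda")
b = torch.randn(8, 8, device="cuda")

# Same op, two autocast dtypes: ops on the lower-precision list should
# follow whichever dtype the autocast context was entered with.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    print((a @ b).dtype)   # torch.float16
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    print((a @ b).dtype)   # torch.bfloat16
```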