Question about bfloat16 operations in AMP and CUDA

Hello, when doing AMP with bfloat16 on CUDA, which operations are autocast?

In the documentation I only see bfloat16 listed for CPU and float16 for GPU.

https://pytorch.org/docs/stable/amp.html

I just want to know which operations run in bfloat16 and which in float32. I have a model with conv2d layers, ReLU activations, and a tanh activation; a sketch of the setup is below.
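For reference, this is roughly my setup (the layer sizes here are made up), with a couple of prints to see which dtypes the ops actually produce under bfloat16 autocast:

```python
import torch
import torch.nn as nn

# Rough stand-in for my model: conv2d layers, ReLU, and a final tanh
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
    nn.Tanh(),
).cuda()

x = torch.randn(1, 3, 32, 32, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    conv_out = model[0](x)   # output of the first conv2d under autocast
    full_out = model(x)      # output after the final tanh
    print(conv_out.dtype)
    print(full_out.dtype)
```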

Thank you.


Yes, the CUDA backend supports bfloat16 for Ampere or newer GPUs.
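You can also check support at runtime, assuming a reasonably recent PyTorch build:

```python
import torch

# bfloat16 on CUDA needs compute capability >= 8.0, i.e. Ampere or newer
print(torch.cuda.get_device_capability())   # e.g. (8, 0) on an A100
print(torch.cuda.is_bf16_supported())       # True on Ampere or newer GPUs
```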


Hello, thank you for your answer.

Sorry, I think I didn’t formulate my original question correctly.

I know that the CUDA backend supports bfloat16 in, for instance, the A100 GPU.

However, I want to know which operations AMP autocasts to bfloat16 on CUDA.

https://pytorch.org/docs/stable/amp.html#cuda-ops-that-can-autocast-to-float16

Are those the same ones as for float16?

Final question: how was it decided which operations can be cast to either float16 or bfloat16?

Yes, this should be the case: the lower-precision ops are defined here, and that list should apply to both dtypes, if I’m not mistaken.
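As a quick sanity check, you could run an op from that lower-precision list under both autocast dtypes and compare the output dtypes (a sketch using matmul as the example op):

```python
import torch

a = torch.randn(8, 8, device="cuda")
b = torch.randn(8, 8, device="cuda")

# Same op, two autocast dtypes: ops on the lower-precision list should
# follow whichever dtype the autocast context was entered with.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    print((a @ b).dtype)   # torch.float16
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    print((a @ b).dtype)   # torch.bfloat16
```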