Hi, I am trying to run BERT pretraining with AMP and bfloat16. First of all, if I specify
with torch.cuda.amp.autocast(dtype=torch.bfloat16):
the output tensor comes out as float16, not bfloat16. When I change torch.cuda.amp.autocast to torch.autocast("cuda", dtype=torch.bfloat16), the output tensor does show the bfloat16 dtype.
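For reference, this is roughly how I am checking the dtype (a minimal sketch; the small Linear module just stands in for the actual BERT model from the repository):

```python
import torch

# Stand-in for the BERT model; any module works for the dtype check
model = torch.nn.Linear(8, 8).cuda()
x = torch.randn(4, 8, device="cuda")

# Variant 1: torch.cuda.amp.autocast with an explicit dtype
with torch.cuda.amp.autocast(dtype=torch.bfloat16):
    out = model(x)
print(out.dtype)  # I see torch.float16 here, not torch.bfloat16

# Variant 2: the generic torch.autocast API
with torch.autocast("cuda", dtype=torch.bfloat16):
    out = model(x)
print(out.dtype)  # torch.bfloat16, as expected
```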
However, that does not ultimately work either. The following error message shows up:
RuntimeError: expected scalar type BFloat16 but found Half
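In case it helps, the training step I am running boils down to something like this (a simplified sketch, not the actual run_pretraining code; model, optimizer, and batch are placeholders):

```python
import torch

def train_step(model, optimizer, batch):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast("cuda", dtype=torch.bfloat16):
        loss = model(*batch)  # the RuntimeError above is raised inside the model
    # No GradScaler here: bfloat16 has the same exponent range as float32,
    # so loss scaling should not be required the way it is for float16
    loss.backward()
    optimizer.step()
```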
I was wondering whether PyTorch's native AMP still lacks support for bfloat16 mixed-precision training.
For context, I am running the BERT pretraining scripts from NVIDIA's DeepLearningExamples repository. Any help or pointers towards a solution would be very much appreciated.