Issue with backpropagating log probabilities

For the task I want to solve, I need to backprop through the average log probability. The tensor looks like the following:

tensor(-24.9345, device='cuda:2', grad_fn=<SumBackward0>)

So it seems to have gradient computation enabled. However, when I call backward(), I get the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/zhan1130/anaconda3/envs/nano/lib/python3.8/site-packages/torch/", line 488, in backward
  File "/home/zhan1130/anaconda3/envs/nano/lib/python3.8/site-packages/torch/autograd/", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Unexpected floating ScalarType in at::autocast::prioritize

I'm not sure what it means or how to solve it.

Edit: I am using code modified from nanoGPT. The default dtype is 'bfloat16', but somehow the output logits are float32, and hence the log probability is also float32. Simply casting with log_probability = log_probability.bfloat16() still gives the same error.
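For context, autocast casts dtypes per op rather than for the whole graph, so seeing bfloat16 and float32 tensors mixed in one forward pass is expected, and casting the final tensor afterwards does not change what autocast recorded for the backward pass. A minimal CPU sketch of the per-op casting (the original setup runs on CUDA; shapes here are arbitrary):

```python
import torch

x = torch.randn(4, 8)   # plain float32 inputs
w = torch.randn(8, 8)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    h = x @ w                               # matmul is autocast to bfloat16
    logp = torch.log_softmax(h, dim=-1)     # on CUDA, log_softmax is on
                                            # autocast's float32 list

print(h.dtype, logp.dtype)
```

This is why a model whose weights and matmuls are bfloat16 can still hand you a float32 log probability.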

This looks like an autocast error. Could you post a runnable script that reproduces the issue, or try disabling autocast, e.g., by forcing the null context here: nanoGPT/ at a82b33b525ca9855d705656387698e13eb8e8d4b · karpathy/nanoGPT · GitHub ?
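For anyone else reading along: the pattern in question wraps the forward pass in a context that is either torch.autocast or a no-op. A sketch of switching it off for debugging (variable names here are illustrative, not nanoGPT's exact ones):

```python
from contextlib import nullcontext
import torch

device_type = "cpu"       # "cuda" in the real training setup
dtype = "bfloat16"
disable_autocast = True   # flip this on while debugging

ptdtype = {"float32": torch.float32,
           "bfloat16": torch.bfloat16,
           "float16": torch.float16}[dtype]

# nullcontext() does nothing on enter/exit, so every op
# runs in its inputs' native dtype (float32 by default)
ctx = nullcontext() if disable_autocast else torch.autocast(
    device_type=device_type, dtype=ptdtype)

with ctx:
    pass  # forward pass goes here
```

With the null context, activations stay float32, which costs more memory but removes autocast from the picture entirely.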

I guess it would be very hard, because I don't have enough memory on my machine. After posting, I tried changing all dtypes to float16 and the issue is gone; however, I then ran into an out-of-memory error. I presume bfloat16 is meant to save memory, so if I disable autocast there shouldn't be this issue, but I simply don't have enough memory to run it.