Issue with backpropagating log probabilities

For the task I want to solve, I need to backprop through the average log probability. The tensor looks like the following:

tensor(-24.9345, device='cuda:2', grad_fn=<SumBackward0>)

So it seems to have gradient computation enabled. However, when I call backward(), I get the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/zhan1130/anaconda3/envs/nano/lib/python3.8/site-packages/torch/", line 488, in backward
  File "/home/zhan1130/anaconda3/envs/nano/lib/python3.8/site-packages/torch/autograd/", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Unexpected floating ScalarType in at::autocast::prioritize

I'm not sure what it means or how to solve it.

Edit: I am using code modified from nanoGPT. The default dtype is 'bfloat16', but somehow the output logits are float32, and hence the log probability is also float32. Simply casting with log_probability = log_probability.bfloat16() still gives the same error.
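For context, autocast casts dtypes per op rather than for the whole graph, so seeing bfloat16 and float32 tensors mixed in one forward pass is expected, and casting the final tensor afterwards does not change what autocast recorded for the backward pass. A minimal CPU sketch of the per-op casting (the original setup runs on CUDA; shapes here are arbitrary):

```python
import torch

x = torch.randn(4, 8)   # plain float32 inputs
w = torch.randn(8, 8)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    h = x @ w                               # matmul is autocast to bfloat16
    logp = torch.log_softmax(h, dim=-1)     # on CUDA, log_softmax is on
                                            # autocast's float32 list

print(h.dtype, logp.dtype)
```

This is why a model whose weights and matmuls are bfloat16 can still hand you a float32 log probability.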

This looks like an autocast error. Could you post a runnable script that reproduces the issue, or try disabling autocast, e.g., by forcing the null context here: nanoGPT/ at a82b33b525ca9855d705656387698e13eb8e8d4b · karpathy/nanoGPT · GitHub ?
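For anyone else reading along: the pattern in question wraps the forward pass in a context that is either torch.autocast or a no-op. A sketch of switching it off for debugging (variable names here are illustrative, not nanoGPT's exact ones):

```python
from contextlib import nullcontext
import torch

device_type = "cpu"       # "cuda" in the real training setup
dtype = "bfloat16"
disable_autocast = True   # flip this on while debugging

ptdtype = {"float32": torch.float32,
           "bfloat16": torch.bfloat16,
           "float16": torch.float16}[dtype]

# nullcontext() does nothing on enter/exit, so every op
# runs in its inputs' native dtype (float32 by default)
ctx = nullcontext() if disable_autocast else torch.autocast(
    device_type=device_type, dtype=ptdtype)

with ctx:
    pass  # forward pass goes here
```

With the null context, activations stay float32, which costs more memory but removes autocast from the picture entirely.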

I guess it would be very hard, because I don't have enough memory on my machine. After posting, I tried changing all dtypes to float16 and the issue is gone; however, I then ran into an out-of-memory error. I presume bfloat16 is meant to save memory, so if I disable autocast there shouldn't be this issue, but I simply don't have enough memory to run it.