Hello!
With the release of Llama3, I wanted to fine-tune it with torchtune on an RTX 4090, but I ran into two problems. Perhaps you can help solve them:
When starting training, torchtune gives this error:
RuntimeError: bf16 precision was requested but not available on this hardware. Please use fp32 precision instead.
At the same time, torch.cuda.is_bf16_supported() returns True.
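For reference, this is the check I ran (a minimal snippet, nothing torchtune-specific):

```python
import torch

# Reports whether bf16 is available on the current CUDA device
print(torch.cuda.is_bf16_supported())  # prints True on my RTX 4090
```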
If I specify dtype fp32 instead, another error occurs:
torch._C._nn.cross_entropy_loss
RuntimeError: expected scalar type Long but found Int
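For what it's worth, this second error looks like what PyTorch raises when the targets passed to cross_entropy are int32 instead of int64. Here is a minimal standalone repro outside torchtune (just my guess at the cause, not the actual torchtune code path):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,), dtype=torch.int32)  # int32 class indices

# F.cross_entropy(logits, labels)  # RuntimeError: expected scalar type Long but found Int
loss = F.cross_entropy(logits, labels.long())  # works once targets are cast to int64 (Long)
print(loss)
```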