Fine-tune LLMs using torchtune

With the release of Llama3, I wanted to fine-tune it using torchtune on an RTX 4090, but ran into two problems. Perhaps you can help solve them:

  1. When starting training, torchtune gives an error:
    RuntimeError: bf16 precision was requested but not available on this hardware. Please use fp32 precision instead.
    At the same time, `torch.cuda.is_bf16_supported()` returns True.
  2. If I specify dtype fp32, another error occurs:
    RuntimeError: expected scalar type Long but found Int
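For the second error, a common cause in PyTorch is an integer tensor arriving as int32 where an op expects int64 (`torch.long`), e.g. class-index targets passed to `F.cross_entropy`. A minimal sketch of that failure mode and the usual cast workaround (the tensors here are hypothetical, not torchtune's actual data pipeline):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, 5)

# int32 class targets can trigger "expected scalar type Long but found Int"
targets_int32 = torch.tensor([0, 2, 4], dtype=torch.int32)
try:
    F.cross_entropy(logits, targets_int32)
except RuntimeError as e:
    print(e)

# casting the targets to torch.long (int64) resolves it
loss = F.cross_entropy(logits, targets_int32.long())
print(loss.dtype)  # torch.float32
```

If the error comes from inside torchtune's data loading rather than your own code, the fix would be the same kind of `.long()` cast on whatever index tensor is being built as int32.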

Got the same error. RTX 3090, and torch.cuda.is_bf16_supported() gives True.

Are you using Windows by any chance? There’s a known issue there; otherwise this is unexpected, since the 4090 does indeed support bf16: Runtime Error: BF16 unsupported on supported hardware · Issue #891 · pytorch/torchtune · GitHub
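For context on why the two checks can disagree: a training framework typically gates bf16 on its own availability probe, and if that probe behaves differently from a plain `torch.cuda.is_bf16_supported()` call (as in the Windows issue above), you get this mismatch. A hypothetical sketch of such a gate, not torchtune's actual check:

```python
import torch

def bf16_available() -> bool:
    # hypothetical gate similar to what a trainer might use:
    # require a visible CUDA device, then ask PyTorch whether it supports bf16
    return torch.cuda.is_available() and torch.cuda.is_bf16_supported()

print(bf16_available())
```

The short-circuit on `torch.cuda.is_available()` matters: calling `is_bf16_supported()` without a usable CUDA context is exactly the kind of environment-dependent path where a platform-specific bug can hide.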

I corrected the code a little and got training running on Windows. If this is still relevant to anyone, I will describe my steps.

I got the same error on a 4090.

It is very relevant for me :slight_smile:

@kostya.loparev can you share your solution?