RuntimeError: INTERNAL ASSERT FAILED

I am using torch=2.1.2, torchvision=0.16.2, torchaudio=2.0.2, and cuda=12.1.
When I used DeepSpeed to pretrain LLaVA on 6 RTX 4090 GPUs, I got the error shown in the image:


What can I do to solve it?

Do you encounter the same error in the latest stable or nightly release?
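
Before and after switching releases, it may help to record exactly which package versions are installed so the two runs can be compared. Below is a minimal sketch that uses only the standard library; the helper name `installed_versions` and the default package list are my own assumptions for illustration and are not part of PyTorch or DeepSpeed.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "torchaudio", "deepspeed")):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to None instead of raising.
    The default package list is an assumption based on this thread.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

For a fuller report (CUDA, driver, and GPU details), running `python -m torch.utils.collect_env` and pasting its text output alongside the full stack trace is usually more useful than a screenshot.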