Hi, I am trying to fine-tune Whisper following the blog post here. Fine-tuning works great on a single GPU, but it fails on multi-GPU instances: while executing trainer.train(), the multi-GPU run crashes with Bus error (core dumped).
I am running on a g5.12xlarge multi-GPU instance on AWS (Ubuntu, AMI ID: ami-071323fe2bf59945b). I would appreciate any guidance or suggestions to resolve this issue.
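In case it helps narrow things down: an undersized /dev/shm is one common cause of Bus error (core dumped) when multiple DataLoader workers share tensors, so here is a quick stdlib-only check of the shared memory available on the instance (this check is my own diagnostic sketch, not from the blog post):

```python
import os

def shm_size_gb(path="/dev/shm"):
    # statvfs reports the filesystem's fragment size and total blocks;
    # their product is the total capacity of the shared-memory mount.
    st = os.statvfs(path)
    return st.f_frsize * st.f_blocks / 1024**3

# Report the shared-memory size; multi-worker PyTorch jobs can hit
# "Bus error" when this fills up mid-training.
print(f"/dev/shm size: {shm_size_gb():.1f} GiB")
```

If this reports only a few GiB, reducing dataloader_num_workers (or enlarging /dev/shm) may be worth trying before digging deeper.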