DDP: no support for sparse tensors

I would like to use PyTorch DDP to speed up additional training of OpenAI's Whisper.
However, when wrapping the Whisper model in DDP, I get a "RuntimeError: No support for sparse tensors" error and cannot proceed.
Can you please tell me how to solve this problem?

I believe this error occurs because Whisper is a sparse transformer.
Doesn't PyTorch DDP support this kind of transformer?

I'm not familiar with Whisper; which part of the model uses sparse tensors? DDP all-reduces the gradients across ranks after each iteration, so if part of the model is sparse, the corresponding gradients will also be sparse and cannot be all-reduced.

I am assuming you are using NCCL; NCCL does not support sparse collectives. You can try the Gloo backend, which supports sparse allreduce, but be aware it may not be as performant as NCCL.
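For what it's worth, switching the backend is just a one-word change when initializing the process group. A minimal sketch, assuming the rank/world-size environment variables are set by your launcher (e.g. torchrun):

import torch.distributed as dist

# MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE are assumed to be set by the launcher (e.g. torchrun)
dist.init_process_group(backend="gloo")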

Thanks for your advice.
I also looked for sparse tensors in the Whisper model, but I was not able to find any.
So, I will try the "gloo" backend.

class Whisper(nn.Module):
    def __init__(self, dims: ModelDimensions):
        super().__init__()
        ...
        # all_heads is converted to a sparse tensor before being registered as a buffer
        self.register_buffer("alignment_heads", all_heads.to_sparse(), persistent=False)

The alignment_heads buffer is a sparse tensor.
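You can double-check whether any other parameters or buffers are sparse with a quick loop (a sketch; model is assumed to be the loaded Whisper module):

# List every sparse parameter and buffer in the model
for name, param in model.named_parameters():
    if param.is_sparse:
        print("sparse parameter:", name)
for name, buf in model.named_buffers():
    if buf.is_sparse:
        print("sparse buffer:", name)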

Converting alignment_heads to a dense tensor may solve this problem.

# Replace the sparse alignment_heads buffer with a dense copy before wrapping the model in DDP
alignment_heads_dense = model.get_buffer("alignment_heads").to_dense()
model.register_buffer("alignment_heads", alignment_heads_dense, persistent=False)
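For anyone else hitting this, here is a rough end-to-end sketch of the workaround; it assumes the openai-whisper package, one GPU per process launched with torchrun, and the usual NCCL backend (the model size is just an example):

import os
import torch
import torch.distributed as dist
import whisper
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)

model = whisper.load_model("base", device=f"cuda:{local_rank}")

# Swap the sparse alignment_heads buffer for a dense copy so DDP can broadcast it
dense_heads = model.get_buffer("alignment_heads").to_dense()
model.register_buffer("alignment_heads", dense_heads, persistent=False)

ddp_model = DDP(model, device_ids=[local_rank])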
