DDP: no support for sparse tensors

I would like to use PyTorch DDP to speed up additional training of OpenAI's Whisper.
However, when wrapping the Whisper model in DDP, I get a "RuntimeError: No support for sparse tensors" error and cannot proceed.
Can you please tell me how to solve this problem?

I believe this error occurs because Whisper is a sparse transformer.
Doesn't PyTorch DDP support this kind of transformer?

I'm not familiar with Whisper; which part of the model uses sparse tensors? DDP all-reduces the gradients across ranks after each iteration, so if part of the model is sparse, the corresponding gradients will also be sparse and cannot be all-reduced.

I am assuming you are using NCCL; NCCL does not support sparse collectives. You can try the Gloo backend, which supports sparse allreduce, but be aware it may not be as performant as NCCL.
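For what it's worth, switching the backend is just a one-word change when initializing the process group. A minimal sketch, assuming the rank/world-size environment variables are set by your launcher (e.g. torchrun):

import torch.distributed as dist

# MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE are assumed to be set by the launcher (e.g. torchrun)
dist.init_process_group(backend="gloo")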

Thanks for your advice.
I also looked for sparse tensors in the Whisper model, but I was not able to find any.
So, I will try the "gloo" backend.

class Whisper(nn.Module):
    def __init__(self, dims: ModelDimensions):
        super().__init__()
        ...
        # all_heads is converted to a sparse tensor before being registered as a buffer
        self.register_buffer("alignment_heads", all_heads.to_sparse(), persistent=False)

The alignment_heads buffer is a sparse tensor.
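You can double-check whether any other parameters or buffers are sparse with a quick loop (a sketch; model is assumed to be the loaded Whisper module):

# List every sparse parameter and buffer in the model
for name, param in model.named_parameters():
    if param.is_sparse:
        print("sparse parameter:", name)
for name, buf in model.named_buffers():
    if buf.is_sparse:
        print("sparse buffer:", name)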

Converting alignment_heads to a dense tensor may solve this problem.

# Replace the sparse alignment_heads buffer with a dense copy before wrapping the model in DDP
alignment_heads_dense = model.get_buffer("alignment_heads").to_dense()
model.register_buffer("alignment_heads", alignment_heads_dense, persistent=False)
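For anyone else hitting this, here is a rough end-to-end sketch of the workaround; it assumes the openai-whisper package, one GPU per process launched with torchrun, and the usual NCCL backend (the model size is just an example):

import os
import torch
import torch.distributed as dist
import whisper
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)

model = whisper.load_model("base", device=f"cuda:{local_rank}")

# Swap the sparse alignment_heads buffer for a dense copy so DDP can broadcast it
dense_heads = model.get_buffer("alignment_heads").to_dense()
model.register_buffer("alignment_heads", dense_heads, persistent=False)

ddp_model = DDP(model, device_ids=[local_rank])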
