Sparse tensors and distributed functions (all_reduce, all_gather)

Hello,

Are there any plans to support sparse tensors in distributed mode, for example with all_reduce and all_gather?

It would be really helpful for a multi-GPU Transformer implementation.

Cheers,
Vincent