@mrshenli The second problem was caused by my data pipeline: I was reading the data with a TensorFlow dataset in eager mode first and then converting it to torch tensors. That problem has been solved.
As for the first problem, yes, I found that it is caused by using sparse embeddings and is not related to scatter_add. So it is the same problem as in "DistributedDataParallel Sparse Embeddings".
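
In case it helps anyone who lands here with the same question, this is a minimal sketch of how one could check whether a layer produces sparse gradients. It is not my actual model: the helper name `grad_is_sparse` and the embedding sizes are made up for illustration, and it only uses `nn.Embedding` with its `sparse` flag.

```python
import torch
import torch.nn as nn


def grad_is_sparse(sparse: bool) -> bool:
    """Run one backward pass through a toy embedding and report
    whether the weight gradient is a sparse tensor.

    Hypothetical helper for illustration only; sizes are arbitrary.
    """
    emb = nn.Embedding(num_embeddings=10, embedding_dim=4, sparse=sparse)
    idx = torch.tensor([1, 3, 3])
    # Sum the looked-up rows to get a scalar loss we can backprop through.
    emb(idx).sum().backward()
    return emb.weight.grad.is_sparse


sparse_flag = grad_is_sparse(True)   # embedding created with sparse=True
dense_flag = grad_is_sparse(False)   # embedding created with sparse=False
print("sparse=True  -> grad.is_sparse:", sparse_flag)
print("sparse=False -> grad.is_sparse:", dense_flag)
```

Running the same kind of check on the parameters of a real model after `backward()` (looking at `p.grad.is_sparse` for each parameter) should show which layers are the ones producing sparse gradients.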