Multi-GPU using EmbeddingBag?

Hi, all,
I want to use multiple GPUs to train word embeddings. I am using the EmbeddingBag class on a single GPU, and it works fine.
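
For context, here is a minimal sketch of the kind of single-GPU setup I mean (toy vocabulary size and tensors, just for illustration):

```python
import torch
import torch.nn as nn

# Toy setup: vocabulary of 1000 words, 64-dim embeddings, mean-pooled bags
emb = nn.EmbeddingBag(num_embeddings=1000, embedding_dim=64, mode="mean").cuda()

# Three bags of different lengths, concatenated into one flat index tensor
indices = torch.tensor([4, 12, 7, 99, 3, 501, 8], device="cuda")
# offsets mark where each bag starts: bag 0 = [4, 12, 7], bag 1 = [99, 3], bag 2 = [501, 8]
offsets = torch.tensor([0, 3, 5], device="cuda")

out = emb(indices, offsets)  # shape (3, 64): one pooled vector per bag
```
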
When I searched for multi-GPU methods, I found this tutorial: https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html

It says DataParallel splits the input along the first dimension.
Does this mean we cannot use EmbeddingBag with multiple GPUs? The input to EmbeddingBag is a concatenation of multiple bags together with an offsets tensor,
so splitting both tensors along the first dimension would be incorrect.
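
To make the concern concrete, here is a sketch of what I understand would happen if the two input tensors were naively chunked along dim 0 across two GPUs (toy values from the example above; the exact scatter behavior is my assumption based on the tutorial):

```python
import torch

# Same toy bags as above: 7 concatenated indices forming 3 bags
indices = torch.tensor([4, 12, 7, 99, 3, 501, 8])
offsets = torch.tensor([0, 3, 5])

# DataParallel scatters each input tensor along dim 0, roughly like torch.chunk:
idx_chunks = torch.chunk(indices, 2)  # ([4, 12, 7, 99], [3, 501, 8])
off_chunks = torch.chunk(offsets, 2)  # ([0, 3], [5])

# On the second replica, the offset 5 still points into the ORIGINAL
# concatenated index tensor, not into its local chunk [3, 501, 8],
# so the bag boundaries no longer line up and the pooled results are wrong.
```
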

Are there other ways to use multiple GPUs with EmbeddingBag?