DistributedDataParallel() for Transformer model

Does anyone know how to wrap the transformer model in DistributedDataParallel()?

Transformer model: https://pytorch.org/tutorials/beginner/transformer_tutorial.html