Model Parallelism for HuggingFace Transformers

Hi,

I’m using Microsoft’s DeBERTa model from HuggingFace Transformers in PyTorch.
Can I use PyTorch’s model parallelism with this HuggingFace transformer?
If so, is there any documentation on how to set it up?
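
For context, this is the kind of approach I’ve been looking at — a minimal sketch using the `device_map` argument from the `accelerate` integration, which (as I understand it) splits the model’s layers across the available GPUs. I’m not sure whether DeBERTa supports this, hence the question; `microsoft/deberta-base` is just the checkpoint I happen to be working with:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "microsoft/deberta-base"  # the checkpoint I'm working with

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# device_map="auto" (requires the `accelerate` package) is supposed to
# distribute the model's layers across all visible GPUs -- naive model
# parallelism, if I understand it correctly. I'm unsure whether this
# works for DeBERTa specifically.
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    device_map="auto",
)

# Inputs go to the device holding the embeddings (usually the first GPU).
inputs = tokenizer("Hello world", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits)
```

If this isn’t the right way to do it (e.g. if I should be splitting submodules across devices manually, as in the PyTorch model-parallel tutorial), a pointer to the relevant docs would be much appreciated.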