Need help with Expert Parallelism and Expert Parallelism + Tensor Parallelism

Hi, I have no idea how to implement EP and EP+TP with the current torch API. Are there any examples or tutorials? Has anyone tried it?

If you mean MoE, yes it is possible to implement MoE with TP via DTensor. We are also planning to explore this parallelism combination but it is not ready yet.

Hi! If I'm not mistaken, expert parallelism is implemented in databricks/megablocks on GitHub.

Thanks. I will check it.

Yes, that is what I mean. PyTorch has a great API for TP, but I cannot find any tutorials that help with implementing EP or EP+TP, so I am confused.
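
For anyone landing here with the same question, below is a rough single-process sketch of the data flow that EP usually involves: dispatch tokens to the rank that owns the chosen expert, run the local experts, then send the results back. It is not taken from any library and does not use torch at all. The ranks are simulated with plain lists, the "experts" are toy scaling functions, and every name (`make_expert`, `route`, `ep_forward`, `reference_forward`) is hypothetical. In a real multi-GPU setup the two exchange steps would typically be collective calls such as `torch.distributed.all_to_all_single`, and for EP+TP each expert's weights could additionally be sharded across a TP group, which this sketch does not attempt.

```python
# Single-process simulation of expert-parallel (EP) routing. No torch needed.
# All names below are hypothetical and exist only for illustration.

def make_expert(scale):
    """Stand-in for an expert MLP: scales every value of a token."""
    return lambda token: [scale * x for x in token]

NUM_RANKS = 2
EXPERTS_PER_RANK = 2
# Expert e is assumed to live on rank e // EXPERTS_PER_RANK.
EXPERTS = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]

def route(token):
    """Toy top-1 router: derive an expert index from the first value."""
    return int(abs(token[0])) % len(EXPERTS)

def ep_forward(tokens_per_rank):
    # 1) Dispatch: bucket every token by the rank that owns its expert.
    #    With real ranks this exchange would be an all-to-all.
    inbox = [[] for _ in range(NUM_RANKS)]
    for src, tokens in enumerate(tokens_per_rank):
        for pos, token in enumerate(tokens):
            e = route(token)
            inbox[e // EXPERTS_PER_RANK].append((src, pos, e, token))

    # 2) Local compute: each rank only runs the experts it owns.
    # 3) Combine: results return to the source rank and original position
    #    (a second all-to-all with real ranks).
    out = [[None] * len(tokens) for tokens in tokens_per_rank]
    for rank, items in enumerate(inbox):
        for src, pos, e, token in items:
            assert e // EXPERTS_PER_RANK == rank  # expert is local to this rank
            out[src][pos] = EXPERTS[e](token)
    return out

def reference_forward(tokens_per_rank):
    """Same computation without any parallelism, used as a check."""
    return [[EXPERTS[route(t)](t) for t in tokens] for tokens in tokens_per_rank]

# Two simulated ranks, two tokens each, hidden size 2.
tokens = [[[0.5, 1.0], [3.2, -1.0]], [[1.1, 2.0], [2.7, 0.0]]]
assert ep_forward(tokens) == reference_forward(tokens)
```

The check at the end only confirms that the dispatch and combine steps preserve token order and give the same result as running all experts in one place. Load balancing, capacity limits, and top-k routing with gate weights are left out on purpose.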