Hi, I have no idea how to implement EP and EP+TP with the current torch API. Are there any examples or tutorials? Has anyone tried it?
If you mean MoE, yes, it is possible to implement MoE with TP via DTensor. We are also planning to explore this parallelism combination, but it is not ready yet.
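For context, here is a minimal sketch of what "TP on an expert via DTensor" could look like with the `torch.distributed.tensor.parallel` API. The `ExpertMLP` module, the sizes, and the 4-GPU mesh are illustrative assumptions on my side, not an official recipe:

```python
# Sketch: tensor parallelism inside a single MoE expert via DTensor.
# ExpertMLP, the dimensions, and the mesh shape are assumptions for illustration.
import os
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

class ExpertMLP(nn.Module):
    """One MoE expert: a plain 2-layer feed-forward block."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff)
        self.w2 = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.w2(torch.relu(self.w1(x)))

# Run under torchrun, e.g. torchrun --nproc_per_node=4 this_file.py
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
tp_mesh = init_device_mesh("cuda", (4,))

expert = ExpertMLP(d_model=1024, d_ff=4096).cuda()

# Shard w1 column-wise and w2 row-wise so the hidden activation stays
# sharded across the mesh and only the expert output needs an all-reduce.
expert = parallelize_module(
    expert,
    tp_mesh,
    {"w1": ColwiseParallel(), "w2": RowwiseParallel()},
)

x = torch.randn(8, 1024, device="cuda")
y = expert(x)  # replicated on every rank, shape [8, 1024]
```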
Hi! If I'm not mistaken, expert parallelism is implemented in GitHub - databricks/megablocks
Thanks. I will check it.
Yes, that is what I mean. PyTorch has a great API to support TP, but I cannot find any tutorials that help with implementing EP or EP+TP, so I am confused.
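In case it helps anyone landing here, below is a rough sketch of how EP+TP could be laid out on a 2D DeviceMesh. This is my own assumption-heavy layout, not an official PyTorch recipe: the `ep` mesh dimension owns disjoint subsets of experts, TP is applied inside each locally owned expert on the `tp` sub-mesh, and the token routing across EP groups (the all_to_all dispatch) is deliberately left out.

```python
# Sketch: EP + TP layout on a 2D DeviceMesh (4 GPUs = 2-way EP x 2-way TP).
# The expert definition, sizes, and mesh shape are illustrative assumptions.
import os
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

NUM_EXPERTS = 8
D_MODEL, D_FF = 1024, 4096
EP_SIZE, TP_SIZE = 2, 2

torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
mesh = init_device_mesh("cuda", (EP_SIZE, TP_SIZE), mesh_dim_names=("ep", "tp"))
ep_rank = mesh["ep"].get_local_rank()

# Expert parallelism: each EP group owns a disjoint slice of the experts.
experts_per_group = NUM_EXPERTS // EP_SIZE
local_expert_ids = range(ep_rank * experts_per_group,
                         (ep_rank + 1) * experts_per_group)

local_experts = nn.ModuleDict()
for eid in local_expert_ids:
    expert = nn.Sequential(
        nn.Linear(D_MODEL, D_FF),
        nn.ReLU(),
        nn.Linear(D_FF, D_MODEL),
    ).cuda()
    # Tensor parallelism inside each locally owned expert, on the tp sub-mesh.
    local_experts[str(eid)] = parallelize_module(
        expert,
        mesh["tp"],
        {"0": ColwiseParallel(), "2": RowwiseParallel()},
    )

# The missing piece (and the part I could not find a tutorial for) is the
# token dispatch: gate tokens to experts, exchange them across EP groups
# with torch.distributed.all_to_all_single on mesh["ep"].get_group(),
# run local_experts, then reverse the exchange.
```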