Currently, tensor parallelism covers only a handful of modules (a usage sketch follows below):

- nn.Linear and nn.Embedding via RowwiseParallel and ColwiseParallel
- nn.LayerNorm and nn.Dropout via SequenceParallel
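For reference, this is roughly how those supported styles get wired up with `parallelize_module`. This is a minimal sketch under assumptions not stated in the thread: two GPUs launched with `torchrun`, and a toy model whose `net1`/`net2` names are placeholders.

```python
# Minimal sketch of the supported tensor-parallel styles.
# Assumes: torchrun with 2 ranks, one GPU per rank; module names are placeholders.
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)


class ToyMLP(nn.Module):
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.net1 = nn.Linear(dim, 4 * dim)
        self.net2 = nn.Linear(4 * dim, dim)

    def forward(self, x):
        return self.net2(self.net1(x).relu())


mesh = init_device_mesh("cuda", (2,))  # 1-D device mesh over 2 ranks
model = ToyMLP().cuda()

# net1 is sharded column-wise (output features), net2 row-wise (input features),
# so the intermediate activation stays sharded and only net2's output is reduced.
model = parallelize_module(
    model,
    mesh,
    {"net1": ColwiseParallel(), "net2": RowwiseParallel()},
)

# Each rank feeds the same (replicated) input.
out = model(torch.randn(8, 1024, device="cuda"))
```

nn.LayerNorm / nn.Dropout would be handled by `SequenceParallel()`, which additionally needs the surrounding input/output layouts configured (e.g. sharding on the sequence dimension), so it is omitted here to keep the sketch correct with default layouts.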
I need to apply tensor parallelism to custom modules and to other standard PyTorch modules such as nn.Conv1d, but doing so currently raises NotImplementedError.
Are there any examples or guidelines for implementing ColwiseParallel and RowwiseParallel for other modules such as nn.Conv1d?
Good question. Do you mind sharing the full error log of the NotImplementedError and the code to reproduce it? This could also be filed as a feature request on the PyTorch GitHub issue tracker.
TP/SP are only implemented for matmul-based modules like nn.Linear right now; the sharding behavior for nn.Conv1d is not well defined.
The tensor parallel implementation covers only a tiny fraction of common modules, which makes it currently inapplicable to most models other than Llama. An alternative is the Megatron library, but I would prefer to use PyTorch distributed directly.
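Until conv sharding is defined in DTensor, one way to "use PyTorch distributed directly" is to hand-roll the Megatron-style channel split with plain collectives. The class below is a hypothetical sketch, not a PyTorch API: it keeps a slice of the output channels on each rank and all-gathers the slices in the forward pass. It assumes the default process group is already initialized (e.g. via torchrun).

```python
# Hypothetical sketch: channel-parallel Conv1d built on torch.distributed
# collectives instead of DTensor. Each rank owns out_channels // world_size
# filters; outputs are gathered so every rank sees the full channel set.
import torch
import torch.distributed as dist
import torch.nn as nn


class ChannelParallelConv1d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, **conv_kwargs):
        super().__init__()
        self.world_size = dist.get_world_size()
        assert out_channels % self.world_size == 0
        # Each rank holds only its slice of the output channels.
        self.local_conv = nn.Conv1d(
            in_channels, out_channels // self.world_size, kernel_size, **conv_kwargs
        )

    def forward(self, x):
        # x: (batch, in_channels, length), replicated on every rank.
        local_out = self.local_conv(x)
        gathered = [torch.empty_like(local_out) for _ in range(self.world_size)]
        # NOTE: dist.all_gather does not backpropagate through the gathered
        # copies; for training, an autograd-aware gather (e.g. a custom
        # autograd.Function or torch.distributed.nn.functional.all_gather)
        # would be needed.
        dist.all_gather(gathered, local_out)
        # Reassemble the full set of output channels in rank order.
        return torch.cat(gathered, dim=1)
```

This only illustrates the idea; weight initialization, bias handling, and gradient synchronization would still need the same care Megatron takes for its column-parallel layers.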