Hi, I am applying tensor parallelism to a submodule:
```python
self.ffn = nn.Sequential(
    nn.Linear(dim, ffn_dim),
    nn.GELU(approximate="tanh"),
    nn.Linear(ffn_dim, dim),
)
```
The parallel plan is:
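```python
from torch.distributed.tensor import Replicate
from torch.distributed.tensor.parallel import (
    ColwiseParallel, PrepareModuleInput, RowwiseParallel,
)

tp_plan = {
    "ffn": PrepareModuleInput(
        input_layouts=(Replicate(),),
        desired_input_layouts=(Replicate(),),
    ),
    "ffn.0": ColwiseParallel(),
    "ffn.2": RowwiseParallel(
        output_layouts=Replicate(),
        use_local_output=True,
    ),
}
```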
However, after parallelization the results no longer match numerically. For example, the norm of the output is 73.18 vs. 73.15. Is this expected, or is something wrong? Thanks.
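One way to check whether a gap of this size (about 0.04% relative) could come purely from floating-point reordering is to emulate the plan on a single process. The sketch below is only an illustration under assumptions: it reproduces the math of the plan, not the exact kernels or communication DTensor runs. It shards `ffn.0`'s weight along its output features (what `ColwiseParallel` does), shards `ffn.2`'s weight along its input features (what `RowwiseParallel` does), and sums the per-shard partial outputs in place of the all-reduce. The helper `emulate_tp_ffn`, the sizes, and `world_size=4` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def emulate_tp_ffn(ffn: nn.Sequential, x: torch.Tensor, world_size: int) -> torch.Tensor:
    """Single-process emulation of ColwiseParallel(ffn[0]) + RowwiseParallel(ffn[2]).

    Hypothetical helper: it reproduces the math of the sharded plan, not the
    exact kernels or collectives that DTensor would run.
    """
    fc1, act, fc2 = ffn[0], ffn[1], ffn[2]
    # Colwise: shard fc1 along its output features (dim 0 of nn.Linear.weight).
    w1_shards = fc1.weight.chunk(world_size, dim=0)
    b1_shards = fc1.bias.chunk(world_size, dim=0)
    # Rowwise: shard fc2 along its input features (dim 1 of nn.Linear.weight).
    w2_shards = fc2.weight.chunk(world_size, dim=1)

    partials = []
    for w1, b1, w2 in zip(w1_shards, b1_shards, w2_shards):
        h = act(F.linear(x, w1, b1))      # this "rank"'s slice of the hidden layer
        partials.append(F.linear(h, w2))  # partial output, bias not added yet
    # Stand-in for the all-reduce: sum partial outputs, then add fc2's bias once.
    out = partials[0]
    for p in partials[1:]:
        out = out + p
    return out + fc2.bias


torch.manual_seed(0)
dim, ffn_dim, world_size = 1024, 4096, 4  # hypothetical sizes
ffn = nn.Sequential(
    nn.Linear(dim, ffn_dim),
    nn.GELU(approximate="tanh"),
    nn.Linear(ffn_dim, dim),
)
x = torch.randn(8, dim)

results = {}
with torch.no_grad():
    for dtype in (torch.float32, torch.bfloat16):
        ffn = ffn.to(dtype)  # nn.Module.to casts the parameters in place
        xd = x.to(dtype)
        ref = ffn(xd)                              # unsharded reference
        tp = emulate_tp_ffn(ffn, xd, world_size)   # emulated sharded result
        results[dtype] = (ref, tp)
        print(dtype, ref.float().norm().item(), tp.float().norm().item(),
              (ref.float() - tp.float()).abs().max().item())
```

Because GELU is elementwise, applying it to each hidden slice is mathematically identical to applying it to the full hidden layer, so any difference between `ref` and `tp` here comes from rounding and summation order alone. Comparing the float32 and bfloat16 rows gives a rough sense of how large such differences can be at each precision.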