Really confused: same input but different output

I'm really confused! I have two linear layers: one I initialized myself and the other taken from the model. Their outputs are very different, as you can see in the image below.
With batch size 1 and batch size 2 they give me different results, even though the dtypes are the same!


Check the Reproducibility docs to learn more about determinism and how to make sure deterministic algorithms are selected.
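
As a rough illustration of that advice, here is a minimal sketch that turns on deterministic algorithms and then compares the output for the same row at batch size 1 and batch size 2. The layer sizes, the seed, and the tolerances are made up for the example and are not taken from the original post. Note that deterministic mode makes repeated runs with identical inputs reproducible; my assumption is that the differences across batch sizes are small floating-point rounding differences, which can remain because a different kernel may be selected per batch size, so a tolerance-based comparison is usually more meaningful than exact equality.

```python
import torch

# Fix the RNG seed and request deterministic algorithms.
# On CUDA, some ops additionally need the CUBLAS_WORKSPACE_CONFIG
# environment variable set; this sketch runs on CPU.
torch.manual_seed(0)
torch.use_deterministic_algorithms(True)

# Hypothetical layer and input sizes, chosen only for illustration.
layer = torch.nn.Linear(16, 8)
x = torch.randn(2, 16)

with torch.no_grad():
    out_b1 = layer(x[:1])   # first row, fed with batch size 1
    out_b2 = layer(x)[:1]   # same row, taken from a batch of size 2

# Compare with a tolerance instead of exact equality.
max_abs_diff = (out_b1 - out_b2).abs().max().item()
close = torch.allclose(out_b1, out_b2, rtol=1e-4, atol=1e-5)
print(f"max abs diff: {max_abs_diff:.3e}, allclose: {close}")
```

If `max_abs_diff` is on the order of float32 rounding error, the two results are numerically equivalent. If it is much larger, the two layers probably do not hold the same weights and bias, which would be worth checking directly with `torch.equal` on their parameters.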