I am observing different results when using `nn.Linear` on different GPU configurations with the same input data

Environment

  • PyTorch Version: 1.8.1+cu111
  • GPU: NVIDIA A30 (MIG 12g and MIG 6g instances)
  • Python Version: 3.8.8

Steps to Reproduce

  1. Initialize a model with an `nn.Linear` layer.
  2. Perform a forward pass with the same input data on different GPU configurations.
  3. Observe different output features.
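The steps above can be sketched as follows (a guess at the setup — the layer sizes, batch size, and seed are placeholders, not the original code; run it once per MIG configuration and compare the printed value):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Step 1: a model with a single nn.Linear layer (sizes are illustrative).
model = nn.Linear(128, 64).to(device)

# Step 2: the same input data on each GPU configuration (fixed seed).
x = torch.randn(32, 128, device=device)

with torch.no_grad():
    out = model(x)

# Step 3: print a summary of the output so runs on different
# MIG instances can be compared.
print(out.abs().sum().item())
```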

What is the reason for this?

Hi Wanxin!

You can’t expect exact equality across different architectures (or versions,
GPU vs. CPU, batch sizes, etc.). Your results differ by amounts consistent
with floating-point round-off error, as is to be expected.
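You can see this effect without any GPU at all: summing the same float32 numbers in different orders accumulates different round-off error. This is a toy illustration of the mechanism, not PyTorch's actual kernel logic:

```python
import torch

torch.manual_seed(0)
x = torch.randn(10000, dtype=torch.float32)

# Three mathematically identical sums, computed in different orders.
s_reduce = x.sum()                         # PyTorch's reduction
s_sequential = x.cumsum(0)[-1]             # strictly left-to-right
s_chunked = torch.stack([c.sum() for c in x.chunk(7)]).sum()

# The results typically agree only to within float32 round-off,
# not bit-for-bit.
print(s_reduce.item(), s_sequential.item(), s_chunked.item())

# They are close, but exact equality is not guaranteed.
assert (s_reduce - s_sequential).abs() < 1.0
assert (s_reduce - s_chunked).abs() < 1.0
```

Different GPU configurations can likewise lead the library to pick different reduction orders, producing the small discrepancies you observed.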

Best.

K. Frank


Thank you for your response.
You are correct, but I still have a question: when I use only convolutional layers, this discrepancy does not occur. Is it because the underlying implementations of linear layers and convolutional layers are different?

Hi Wanxin!

Well, yes, strictly speaking, linear and convolutional layers are different.
But I don’t think that’s really the point.

I fully believe that on some sets of architectures, you will get round-off-error
discrepancies with convolutional layers, just as you saw with linear layers.
I would say that it happened to be the case that, for your particular
convolutional layers and your particular GPU configurations, PyTorch chose
the same orderings of floating-point operations, and therefore no
discrepancy arose.

That is, what you saw was happenstance, rather than convolutional layers
having some special behavior.
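One way to see that the two layer types compute the same arithmetic (so any cross-device discrepancy comes from the kernel and operation ordering chosen, not the mathematics) is to express a linear layer as a 1x1 convolution. This is a toy check on the CPU, not a description of PyTorch internals:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lin = nn.Linear(16, 8, bias=True)
conv = nn.Conv2d(16, 8, kernel_size=1, bias=True)

# Copy the linear weights into the 1x1 conv so both layers
# represent the same affine map.
with torch.no_grad():
    conv.weight.copy_(lin.weight.view(8, 16, 1, 1))
    conv.bias.copy_(lin.bias)

x = torch.randn(4, 16)
y_lin = lin(x)                                 # linear kernel
y_conv = conv(x.view(4, 16, 1, 1)).view(4, 8)  # conv kernel, same math

# Same mathematics, but potentially different kernels and
# floating-point orderings, so compare with a tolerance rather
# than expecting bit-for-bit equality.
assert torch.allclose(y_lin, y_conv, atol=1e-5)
```

Which kernel (and hence which ordering of floating-point operations) gets picked can depend on the layer shapes and the hardware, which is why one layer type may happen to match across configurations while another does not.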

Best.

K. Frank