Different results with identical random seed and identical code on different GPUs

I found the point in my code that introduces the first differences in the data (and those differences grow larger through the following layers). It is this very short piece:

self.layers = nn.Sequential(OrderedDict([
    ('lin1',  nn.Linear(in_features=2*n_coords, out_features=hidden_size)),
    ('relu1', nn.ReLU()),
    ('lin3',  nn.Linear(in_features=hidden_size, out_features=self.outsize)),
]))
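
A minimal sketch of how the first divergent layer can be located (this is not the exact code from my model; model and sample_input are placeholders for the full network and a fixed input batch): register a forward hook on every leaf module, save the intermediate outputs on each GPU, and then compare the saved files offline.

import torch

def register_dump_hooks(model, outputs):
    # collect each leaf module's output by name
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            module.register_forward_hook(
                lambda m, inp, out, name=name: outputs.update({name: out.detach().cpu()})
            )

outputs = {}
register_dump_hooks(model, outputs)         # 'model' stands for the full network
_ = model(sample_input)                     # 'sample_input' is a hypothetical fixed batch
torch.save(outputs, "activations_gpu0.pt")  # repeat on the other GPU and diff the files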

There was a tiny difference around the 4th or 5th decimal place in the output of lin3. The weights and inputs are all torch.float32 tensors. What I did is the following:

Replace

def forward(self, x):
    x = self.layers(x)
    return x

with

def forward(self, x):
    x = x.double()
    self.layers = self.layers.double()
    x = self.layers(x)
    return x.float()
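
As a side note on the workaround: reassigning self.layers inside forward() converts the module on every call. A minimal sketch of the same idea with the conversion done once at construction time (CoordNet and its constructor arguments are only placeholders for my actual module):

import torch.nn as nn
from collections import OrderedDict

class CoordNet(nn.Module):
    def __init__(self, n_coords, hidden_size, outsize):
        super().__init__()
        self.outsize = outsize
        self.layers = nn.Sequential(OrderedDict([
            ('lin1',  nn.Linear(in_features=2*n_coords, out_features=hidden_size)),
            ('relu1', nn.ReLU()),
            ('lin3',  nn.Linear(in_features=hidden_size, out_features=self.outsize)),
        ])).double()  # convert the weights to float64 once, at construction time

    def forward(self, x):
        # cast the input up, run the double-precision layers, cast the result back down
        return self.layers(x.double()).float()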

Now I get the exact same result on both GPUs (though this result differs from the previous results I got on either GPU).

However, I don’t understand why this happens. When both the inputs and network weights are float32 tensors, no double precision should be used, right? So why does the result differ? Both GPUs are able to process single precision floats. Is there some part of these layers that internally “upgrades” to double precision if available?

Also, the rest of the model that follows uses only float32 tensors, and I never convert anything to double. The outputs of the following layers are identical; the only part that differs is these few layers that I posted.
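
To rule out any silent promotion, this is a small check I can run (a sketch; model again stands for the full network) that asserts every parameter and buffer is still float32:

import torch

for name, p in model.named_parameters():
    assert p.dtype == torch.float32, f"{name} has dtype {p.dtype}"
for name, b in model.named_buffers():
    assert b.dtype == torch.float32, f"{name} has dtype {b.dtype}"
print("all parameters and buffers are float32")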
