Different outputs with the same input when running inference on different machines

I need to reproduce the same output on different machines, but the outputs differ. There is no dropout layer, and I set torch.backends.cudnn.enabled = False, yet the results are still different. Any idea how to solve this?

Maybe you have different versions of CUDA and/or PyTorch on the machines?
Can you print the versions?

import sys
from subprocess import call

import numpy as np
import torch

print('__Python VERSION:', sys.version)
print('__pyTorch VERSION:', torch.__version__)
print('__CUDA VERSION:')
call(["nvcc", "--version"])  # prints the CUDA toolkit version, if nvcc is on the PATH
print('__CUDNN VERSION:', torch.backends.cudnn.version())
print('__Number CUDA Devices:', torch.cuda.device_count())
print("OS: ", sys.platform)
print("Python: ", sys.version)
print("PyTorch: ", torch.__version__)
print("Numpy: ", np.__version__)

GTX 1060

 ('__Python VERSION:', '2.7.6 (default, Oct 26 2016, 20:30:19) \n[GCC 4.8.4]')
 ('__pyTorch VERSION:', '0.2.0_1')
 __CUDA VERSION
 ('__CUDNN VERSION:', 6021)
 ('__Number CUDA Devices:', 1L) 
 ('OS: ', 'linux2')
 ('Python: ', '2.7.6 (default, Oct 26 2016, 20:30:19) \n[GCC 4.8.4]')
 ('PyTorch: ', '0.2.0_1')
 ('Numpy: ', '1.13.1')

GTX TITAN X

('__Python VERSION:', '2.7.5 (default, Sep 15 2016, 22:37:39) \n[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)]')
('__pyTorch VERSION:', '0.2.0_1')
__CUDA VERSION
('__CUDNN VERSION:', 6021)
('__Number CUDA Devices:', 1L)
('OS: ', 'linux2')
('Python: ', '2.7.5 (default, Sep 15 2016, 22:37:39) \n[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)]')
('PyTorch: ', '0.2.0_1')
('Numpy: ', '1.13.0')

TITAN XP

('__Python VERSION:', '2.7.5 (default, Nov  6 2016, 00:28:07) \n[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)]')
('__pyTorch VERSION:', '0.2.0_1')
__CUDA VERSION
('__CUDNN VERSION:', 6021)
('__Number CUDA Devices:', 1L)
('OS: ', 'linux2')
('Python: ', '2.7.5 (default, Nov  6 2016, 00:28:07) \n[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)]')
('PyTorch: ', '0.2.0_1')
('Numpy: ', '1.13.1')

K80

('__Python VERSION:', '2.7.5 (default, Sep 15 2016, 22:37:39) \n[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)]')
('__pyTorch VERSION:', '0.2.0_1')
__CUDA VERSION
('__CUDNN VERSION:', 6021)
('__Number CUDA Devices:', 1L)
('OS: ', 'linux2')
('Python: ', '2.7.5 (default, Sep 15 2016, 22:37:39) \n[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)]')
('PyTorch: ', '0.2.0_1')
('Numpy: ', '1.13.0')

It seems there's no big difference between the environments, but the outputs differ; e.g. the maximum difference in the softmax output is 0.05, which is not acceptable in my case.
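(I measure the gap roughly like this, where out_a.pth and out_b.pth stand for the softmax output saved on each of the two machines; the file names are just placeholders:)

import torch

# softmax outputs saved with torch.save() on each machine
out_a = torch.load('out_a.pth')
out_b = torch.load('out_b.pth')
print('max abs difference:', (out_a - out_b).abs().max())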

Have you tried setting a random seed?

It seems that if they are all Titan XP, the outputs are the same. I haven't tried setting any random seed yet; which seed should I set? cuda.seed or something else?

Try setting these three seeds:

np.random.seed(41)      # any fixed value works, as long as every machine uses the same one
torch.manual_seed(41)
torch.cuda.manual_seed(41)
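You can also sanity-check determinism on a single machine first: run the same input through the model twice in one process and check that the results match. A rough sketch, where net stands for whatever model you are testing (assumed here to take a single input):

import torch
from torch.autograd import Variable

x = torch.randn(1, 3, 224, 224)            # any fixed input of the right shape
out1 = net(Variable(x.cuda())).data
out2 = net(Variable(x.cuda())).data
print((out1 - out2).abs().max())           # ideally 0.0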

I tried, but the outputs are still not the same…

I'll also try to locate which layer gives the different values.

Can you share the code?


import numpy as np
import torch
from torch.autograd import Variable

import models  # the module that provides the 'alex_22' model used below

torch.backends.cudnn.enabled = False
np.random.seed(41)
torch.manual_seed(41)
torch.cuda.manual_seed(41)

model = models.models['alex_22']()
model.load_model()
net = model.cuda()

def get_input(n, c, h, w):
    return torch.randn(n, c, h, w)

load = True
# load = False
save_pth = 'tensors.pth' #'no_cudnn_tensors.pth'

saves = {}

if not load:
    a = get_input(1, 3, 127, 127)
    b = get_input(1, 3, 255, 255)
    saves['a'] = a
    saves['b'] = b
else:
    saves = torch.load(save_pth)
    a = saves['a']
    b = saves['b']

def Var(x):
    return Variable(x.cuda())

output = net(Var(a), Var(b))[1].data

b1 = net.forward_one_branch(Var(a), net.conv_r1, net.conv_cls1)[1].data
b2 = net.forward_one_branch(Var(b), net.conv_r2, net.conv_cls2)[1].data

f1 = net.features(Var(a)).data
f2 = net.features(Var(b)).data

if not load:
    saves['o'] = output
    saves['b1'] = b1
    saves['b2'] = b2
    saves['f1'] = f1
    saves['f2'] = f2
    torch.save(saves, save_pth)
    print 'saving'
else:
    o2 = saves['o']
    ob1 = saves['b1']
    ob2 = saves['b2']
    of1 = saves['f1']
    of2 = saves['f2']
    print (o2 - output).abs().max()
    print (b1 - ob1).abs().max()
    print (b2 - ob2).abs().max()
    print (f1 - of1).abs().max()
    print (f2 - of2).abs().max()

Using this test code, I ran on Titan X and a 1060; the output is:
0.0687821805477
0.0696254000068
0.0968679785728
0.415367662907
0.437078356743
The difference is relatively huge.

The model takes two inputs x, y: (1, 3, 127, 127) -> (256, 4, 4) and (1, 3, 255, 255) -> (256, 20, 20), and then correlates the two outputs (a rough sketch of that operation follows the AlexNet code below).

features is a modified AlexNet:

import torch.nn as nn
import torch.nn.functional as F

class AlexNet5(nn.Module):
    def __init__(self):
        super(AlexNet5, self).__init__()
        self.conv1 = nn.Conv2d(3, 96, kernel_size=11, stride=2)
        self.conv2 = nn.Conv2d(96, 256, kernel_size=5)
        self.conv3 = nn.Conv2d(256, 384, kernel_size=3)
        self.conv4 = nn.Conv2d(384, 384, kernel_size=3)
        self.bn1 = nn.BatchNorm2d(96)
        self.bn2 = nn.BatchNorm2d(256)
        self.bn3 = nn.BatchNorm2d(384)
        self.bn4 = nn.BatchNorm2d(384)
        self.conv5 = nn.Conv2d(384, 256, kernel_size=3)
        self.bn5 = nn.BatchNorm2d(256)

        self.feature_size = 256

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.bn1(self.conv1(x)), kernel_size=3, stride=2))
        x = F.relu(F.max_pool2d(self.bn2(self.conv2(x)), kernel_size=3, stride=2))
        x = F.relu(self.bn3(self.conv3(x)))
        x = F.relu(self.bn4(self.conv4(x)))
        x = self.bn5(self.conv5(x))
        return x
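For reference, the correlation of the two feature maps is roughly the following kind of operation. This is only a simplified sketch using the AlexNet5 above, not the actual alex_22 model (which also has the conv_r*/conv_cls* branches):

import torch
import torch.nn.functional as F
from torch.autograd import Variable

features = AlexNet5().cuda()
features.eval()  # use running BatchNorm statistics for inference

z = Variable(torch.randn(1, 3, 127, 127).cuda())  # exemplar input
x = Variable(torch.randn(1, 3, 255, 255).cuda())  # search input

fz = features(z)  # small feature map
fx = features(x)  # larger feature map

# cross-correlate by using the exemplar features as a convolution kernel
score = F.conv2d(fx, fz)  # (1, 1, H', W') response map
print(score.size())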

And are you running both on the CPU?
I don't see any GPU-related tensors, for instance:
X_tensor = Variable(torch.from_numpy(a).cuda())

Sorry, I edited my last reply; part of it was missing last time.

I can't seem to find anything strange. If you want, upload a self-contained Jupyter notebook to git with the data and I can run it locally to compare the results.

I don't know why, but right now the K80, Titan X, and Titan XP outputs differ by at most 1e-6, while the 1060 shows a huge difference of up to 0.5. It seems that on the 1060 I installed http://download.pytorch.org/whl/cu80/torch-0.2.0.post3-cp27-cp27m-manylinux1_x86_64.whl, while the others use http://download.pytorch.org/whl/cu80/torch-0.2.0.post3-cp27-cp27mu-manylinux1_x86_64.whl . Could that produce such a huge difference?
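As far as I can tell, the two wheel names differ only in the Python ABI tag: cp27mu is built for a wide-unicode (UCS-4) interpreter, cp27m for a narrow-unicode (UCS-2) one. A quick way to check which build an interpreter is:

import sys
# 1114111 (0x10FFFF) -> wide/UCS-4 build (matches cp27mu wheels)
# 65535   (0xFFFF)   -> narrow/UCS-2 build (matches cp27m wheels)
print(sys.maxunicode)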

I finally located the problem: on the 1060 I had installed PyTorch with conda. After switching to the system Python installation, the output is the same.
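For anyone hitting the same issue, a quick way to confirm which interpreter and which PyTorch installation a script actually picks up:

import sys
import torch

print(sys.executable)     # path of the Python interpreter that is running
print(torch.__file__)     # location of the torch package that was imported
print(torch.__version__)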