Cannot Create Reproducible NN

I have a simple toy NN in PyTorch. I am setting all the seeds I can find in the docs, as well as the numpy random seed.

If I run the code below from top to bottom, the results appear to be reproducible.

BUT, if I run Block #1 only once and then repeatedly run Block #2, the result changes (sometimes dramatically). I am unsure why this happens, since the network is re-initialized and the optimizer is reset each time. Is it something with the inputs and target?

I am using version 0.4.0

BLOCK #1**************************************

from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

import torch
import torch.utils.data as utils_data
from torch.autograd import Variable
from torch import optim, nn
from torch.utils.data import Dataset 
import torch.nn.functional as F
from torch.nn.init import xavier_uniform_, xavier_normal_,uniform_

torch.manual_seed(123)

import random
random.seed(123)


from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

%matplotlib inline 

cuda = True  # set to True to use the GPU

if cuda:
    torch.cuda.manual_seed(123)

#load boston data from scikit
boston = load_boston()
x = boston.data
y = boston.target
y = y.reshape(y.shape[0], 1)

#train and test
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.3, random_state=123, shuffle=False)


#change to tensors
x_train = torch.from_numpy(x_train)
y_train = torch.from_numpy(y_train)

#create dataset and use data loader
training_samples = utils_data.TensorDataset(x_train, y_train)
data_loader_trn = utils_data.DataLoader(training_samples, batch_size=64,drop_last=False)

#change to tensors
x_test = torch.from_numpy(x_test)
y_test = torch.from_numpy(y_test)

#create dataset and use data loader
testing_samples = utils_data.TensorDataset(x_test, y_test)
data_loader_test = utils_data.DataLoader(testing_samples, batch_size=64,drop_last=False)

#simple model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        # all the layers
        self.fc1 = nn.Linear(x.shape[1], 20)
        xavier_uniform_(self.fc1.weight.data)  # this is how you can change the weight init
        self.drop = nn.Dropout(p=0.5)
        self.fc2 = nn.Linear(20, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.drop(x)
        x = self.fc2(x)
        return x

BLOCK #2**************************************

net = Net()

if cuda:
    net.cuda()

# create a stochastic gradient descent optimizer
optimizer = optim.Adam(net.parameters())
# create a loss function (mse)
loss = nn.MSELoss(size_average=False)

# run the main training loop
epochs = 20
hold_loss = []

for epoch in range(epochs):
    cum_loss = 0.
    cum_records_epoch = 0

    for batch_idx, (data, target) in enumerate(data_loader_trn):
        tr_x, tr_y = data.float(), target.float()
        if cuda:
            tr_x, tr_y = tr_x.cuda(), tr_y.cuda() 

        # Reset gradient
        optimizer.zero_grad()

        # Forward pass
        fx = net(tr_x)
        output = loss(fx, tr_y) #loss for this batch

        cum_loss += output.item() #accumulate the loss

        # Backward 
        output.backward()

        # Update parameters based on backprop
        optimizer.step()

        cum_records_epoch +=len(tr_x)
        if batch_idx % 1 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
            epoch, cum_records_epoch, len(data_loader_trn.dataset),
            100. * (batch_idx+1) / len(data_loader_trn), output.item()))
    print('Epoch average loss: {:.6f}'.format(cum_loss/cum_records_epoch))

    hold_loss.append(cum_loss/cum_records_epoch)  

#training loss
plt.plot(np.array(hold_loss))
plt.show()

This is expected behavior.
If you seed the PRNG, the same “random” numbers will be sampled, but you won’t get exactly the same number on every draw.
If you run block 2 a few times, your model will be re-initialized with different weights each time.
Some of them might work, some won’t.
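
For example (a minimal sketch, assuming the Net class from Block #1 is already defined and torch is imported), constructing the model twice in a row draws different weights from the same seeded PRNG:

torch.manual_seed(123)

net_a = Net()   # consumes some numbers from the PRNG stream
net_b = Net()   # continues from where net_a left off, so its weights differ

print(torch.equal(net_a.fc1.weight, net_b.fc1.weight))  # False

# Re-seeding before the next construction reproduces the first initialization
torch.manual_seed(123)
net_c = Net()
print(torch.equal(net_a.fc1.weight, net_c.fc1.weight))  # True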

Or am I misunderstanding your question?

Let me make sure I asked the question clearly - I did notice my code got truncated, so I just added the rest to the end.

If I run that full code, I get an average loss on the last epoch of 387.385345 on my machine. If I rerun it all again, I get the same answer. If I restart the kernel and re-run, I again get the same answer. So it seems to be reproducible, which is what I was hoping to figure out how to do.

However, if I restart the kernel and run the whole code again, I get 387.385345 for the last epoch. BUT if I then run only Block #2 without first re-running Block #1, the result changes. I can't figure out what needs to be re-run in Block #1 to get the same result.

Is this still expected behavior - so we cannot expect to reproduce a network?

As you explained before, you are indeed reproducing the exact network by running the script, which is really good!

However, your second use case is expected. Running only block 2 won’t give you the exact same results as running the whole script together, because you have already sampled some random numbers.
Since the seeded PRNG was already used, it will return new random numbers, which might change the behavior of block 2.

If you really want the same results for each block, you would have to re-seed all number generators in each block. Could you try that?
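
For example, something like this at the very top of Block #2 (a minimal sketch, reusing the seed value and the cuda flag from Block #1):

import random

# Re-seed every PRNG before re-creating the model, so that weight
# initialization and dropout draw the same numbers as on the first run
random.seed(123)
torch.manual_seed(123)
if cuda:
    torch.cuda.manual_seed(123)

net = Net()  # now initialized identically on every run of Block #2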

Of the seeds I was using:

torch.manual_seed(123)

import random
random.seed(123)

cuda=True #set to true uses GPU

if cuda:
    torch.cuda.manual_seed(123)

It appears that only this one is required each time: torch.manual_seed(123).
The cuda one is not (even though I am using a GPU).

If I do this each time, then things are the same! I was under the impression that setting a seed like this would mean that the same random numbers were used every time - held in memory. Is that not the case?

A pseudo-random number generator uses a seed to generate a sequence of numbers whose properties approximate those of a sequence of truly random numbers. Given the same seed, the PRNG will always return the same numbers in the same order.
But successive calls will still return different numbers, which appear to be random.

Have a look at the Wikipedia page for more information.


So, just to complete my education here, is it accurate to say that torch.manual_seed(123) will produce a sequence of random numbers that is always the same, but that each call made by the rest of the code advances the position in that sequence? And running torch.manual_seed(123) again resets the pointer to the first slot?

Finally, is it odd that torch.cuda.manual_seed(123) doesn't seem to be needed even though I am training on a GPU?

Yes, you are right. Have a look at the example code:

torch.manual_seed(2809)
a = torch.empty(10).random_(100)
b = torch.empty(10).random_(100)
print(a)
print(b)
> tensor([ 37.,  81.,  75.,  39.,  77.,  30.,  23.,  28.,  96.,  66.])
> tensor([ 27.,  49.,  75.,  11.,  71.,  48.,   6.,  74.,  41.,  20.])

torch.manual_seed(2809)
a = torch.empty(10).random_(100)
print(a)
> tensor([ 37.,  81.,  75.,  39.,  77.,  30.,  23.,  28.,  96.,  66.])

As you can see, the same 10 “random” numbers were sampled after I reset the seed.
The same goes for randomly sampled values on the GPU:

torch.cuda.manual_seed(2809)
a_cuda = torch.empty(10, device='cuda').random_(100)
b_cuda = torch.empty(10, device='cuda').random_(100)
print(a_cuda)
print(b_cuda)
> tensor([ 85.,  23.,  41.,  10.,  21.,   6.,  84.,  88.,  13.,  17.], device='cuda:0')
> tensor([ 53.,  20.,   8.,  56.,  17.,  34.,  55.,  14.,  68.,  41.], device='cuda:0')

torch.cuda.manual_seed(2809)
a_cuda = torch.empty(10, device='cuda').random_(100)
print(a_cuda)
> tensor([ 85.,  23.,  41.,  10.,  21.,   6.,  84.,  88.,  13.,  17.], device='cuda:0')

If you don’t sample random numbers on the GPU, the manual seed won’t have any effect.
However, I would still recommend setting all the seeds in case you sample on the GPU.
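
For example, a small helper you could call at the top of each block (just a sketch; seed_everything is an illustrative name, not a PyTorch function):

import random

import numpy as np
import torch


def seed_everything(seed=123):
    """Reset the Python, NumPy and PyTorch (CPU and GPU) PRNGs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)


seed_everything(123)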
