Same Input to Same Trained Model Producing Different Results Each Time

skoaster10s · August 25, 2017, 12:13am

So I’m writing a Seq2Seq model using this defined model:

import torch
import torch.nn as nn
from torch import autograd

use_cuda = torch.cuda.is_available()

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, n_layers=1):
        super(RNN, self).__init__()
        self.n_layers = n_layers # Number of times the GRU is applied in forward()
        self.hidden_size = hidden_size # Dimension of a hidden vector

        self.embedding = nn.Embedding(input_size, hidden_size) # To create an embedding for each word
        self.gru = nn.GRU(hidden_size, hidden_size) # The hidden layer

    def forward(self, input, hidden):
        # Input is a LongTensor holding the index of one word in the input sequence
        embedded = self.embedding(input).view(1, 1, -1) # Reshape to a 1 x 1 x hidden_size tensor
        output = embedded
        for i in range(self.n_layers):
            output, hidden = self.gru(output, hidden)
        return output, hidden

    def initHidden(self):
        result = autograd.Variable(torch.zeros(1, 1, self.hidden_size))
        if use_cuda:
            return result.cuda()
        else:
            return result

I’ve trained the model and saved it with pickle. When I then load the pickle file and run the trained model on a set of data, I get different results every time. I don’t really see any source of randomness, unless I’m missing something?

ezyang · August 25, 2017, 1:56am

In general, even if your model has randomness, you can achieve deterministic execution by explicitly setting seeds.

seed = 0
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

But in your particular case, the hidden parameters in GRU are initialized randomly. See http://pytorch.org/docs/master/_modules/torch/nn/modules/rnn.html#GRU
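To illustrate the point about random initialization, here is a minimal sketch (not the original poster's code). The helper make_gru and the hidden size of 8 are made up for the example. It re-seeds the generator right before each construction and compares the resulting GRU parameters.

```python
import torch
import torch.nn as nn


def make_gru(seed, hidden_size=8):
    # Hypothetical helper: seed the global RNG immediately before
    # constructing the layer so its random initialization is repeatable.
    torch.manual_seed(seed)
    return nn.GRU(hidden_size, hidden_size)


gru_a = make_gru(0)
gru_b = make_gru(0)  # same seed as gru_a
gru_c = make_gru(1)  # different seed

# Compare every weight and bias tensor pairwise.
same_seed_equal = all(
    torch.equal(p, q) for p, q in zip(gru_a.parameters(), gru_b.parameters())
)
diff_seed_equal = all(
    torch.equal(p, q) for p, q in zip(gru_a.parameters(), gru_c.parameters())
)

print("same seed, same weights:", same_seed_equal)
print("different seed, same weights:", diff_seed_equal)
```

Note that this only concerns freshly constructed layers. Once a trained model has been saved and loaded, its weights come from the file, not from the random initializer.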

skoaster10s · August 25, 2017, 7:47pm

I mean that after I initialize the model and train it, I save the trained model with pickle. Then I run this model on the same set of points and get different results each time.

ezyang · August 27, 2017, 2:48pm

Ah, I’m sorry, I misread your original message. That is definitely odd; you should get the same results. Did you try setting the random seed explicitly, and did that help?

JaeJin_Cho · January 26, 2018, 12:58pm

Did you find out the reason for it? I also have the same problem.

bottanski · February 5, 2018, 2:37pm

Hey @skoaster10s

Another random component to consider is dropout. If you run the model in train mode, your dropout will still be active. For inference, you should try setting your model to model.eval()
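A small sketch of the difference, assuming a standalone nn.Dropout layer and an all-ones input chosen only for illustration: in train mode two passes over the same input disagree, while in eval mode the layer passes the input through unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)  # illustrative input

# Train mode: a fresh random mask is drawn on every call.
drop.train()
train_a, train_b = drop(x), drop(x)

# Eval mode: dropout is a no-op, so the output equals the input.
drop.eval()
eval_a, eval_b = drop(x), drop(x)

train_outputs_match = torch.equal(train_a, train_b)
eval_outputs_match = torch.equal(eval_a, eval_b) and torch.equal(eval_a, x)

print("train mode repeatable:", train_outputs_match)
print("eval mode repeatable:", eval_outputs_match)
```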

Let everyone know if that was the issue!

ankur · June 28, 2018, 12:53am

Thanks, setting the seed worked.

Sam131112 · August 22, 2019, 6:07am

running model.eval() solved the issue for me

fotinidelig · September 10, 2021, 3:11pm

Dear all,

I have stumbled upon the same problem as stated in the description, although I am calling model.eval() before predicting labels with my model.
Specifically, I have a trained NN model that uses one dropout layer. I’m using this network to optimize another parameter (one that is not part of the model), and every time (or almost every time) I do a forward pass, the prediction changes. I’m adding a snippet of the code where I do my optimization, of the predict() function, and of my forward() function. The thing is that when I remove the dropout layer, the predictions don’t change, which is what should already happen with model.eval(). Also, I can’t use torch.no_grad(), since I need the gradients for my other parameter. Thank you in advance!

w = input.clone().detach().requires_grad_(True).to(device)
params = [{'params': w}]
optimizer = torch.optim.Adam(params, lr=lr)

def opt(net, w):
        adv_sample = TO_MUL*torch.tanh(w)+TO_ADD
        _, logits = net.predict(adv_sample, logits=True)
        loss, fx = self.loss(const, adv_sample, input, logits[0], target)
        loss.backward()
        optimizer.step()

def predict(self, samples, logits=False):
        self.eval()
        logs = self.forward(samples)
        if logits:
            return torch.argmax(logs, 1), logs
        softmax = torch.nn.Softmax(dim=1)
        probs = softmax(logs)
        return torch.argmax(probs, 1), probs

def forward(self, x):
        x = self.conv22(x)
        x = self.mp(x)
        N, C, W, H = x.shape
        x = torch.reshape(x, (N, C*W*H))
        x = self.fc1(x)
        x = F.dropout(x, p = .5)
        x = self.fc2(x)
        logits = self.fc3(x) # logits layer
        return logits

gphilip · September 10, 2021, 3:46pm

Your forward() function does not return anything, whereas you invoke it as logs = self.forward(samples). Could you say how/why this works?

Also, the indentation is not right, but I guess that is unintentional?

fotinidelig · September 10, 2021, 4:33pm

Yep, sorry, I was a bit careless about the indentation. forward() does return the logits, but I didn’t paste all of it for brevity. I will edit it so that it’s clearer!

gphilip · September 10, 2021, 4:44pm

Thank you for fixing the code. What exactly do you observe, that does not match what you expect? Is it that the line logs = self.forward(samples) returns different values for logs for the same value of samples?

Or is it something else?

fotinidelig · September 10, 2021, 4:53pm

Yes, exactly. For example, this is what I am running, and it gives different results:

pred_o, prob_o = net.predict(adv_sample)
pred_i, prob_i = net.predict(adv_sample)
pred_a, prob_a = net.predict(adv_sample)
print("%d"%pred_o)
print("%d"%pred_i)
print("%d"%pred_a)

That is, the predictions differ for the same input, which would mean that dropout is still being applied? The only thing I’m doing with the model “net” is passing it to a class (which uses the opt() function), but that shouldn’t change its parameters or the way it works. Could the fact that I’m calling optimizer.step() on the w parameter affect the model (since it has parameters and gradients)? That sounds a bit unreasonable, though.

gphilip · September 10, 2021, 4:59pm

That may be it. Did you try commenting out optimizer.step() to see if the predictions still differ?

fotinidelig · September 10, 2021, 6:50pm

Hm, it didn’t work, but I think that’s expected, since this optimizer only holds w as a parameter, not the model’s parameters (and I also need the optimizer for w). It’s a very strange situation, but if you know of a case where eval() fails, let me know.

gphilip · September 11, 2021, 12:52am

You could try printing a weight (or a small set of weights) from the network and see if the weight changes between the two runs. If you do find that the weight changes, then you can investigate why it changes. You may have to try many weights before you find one that changes.

Alternatively, you could try printing activations to trace the part of the network that changes between the two runs.
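One way to trace activations is with forward hooks. The following is a rough sketch under assumptions: TinyNet is a made-up stand-in for the real network, and record_activations is a hypothetical helper, not a PyTorch API. It records the output of every leaf module for two passes over the same input and lists the modules whose outputs differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):
    # Made-up stand-in network with one source of hidden randomness.
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 64)
        self.fc2 = nn.Linear(64, 4)

    def forward(self, x):
        x = self.fc1(x)
        x = F.dropout(x, p=0.5)  # functional call, unaffected by eval()
        return self.fc2(x)


def record_activations(model, x):
    """Run one forward pass and return {module name: output} for leaf modules."""
    recorded = {}
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Store a detached copy; returning None leaves the output untouched.
            recorded[name] = output.detach().clone()
        return hook

    for name, module in model.named_modules():
        if not list(module.children()):  # leaf modules only
            handles.append(module.register_forward_hook(make_hook(name)))
    model(x)
    for handle in handles:
        handle.remove()
    return recorded


torch.manual_seed(0)
net = TinyNet().eval()
x = torch.randn(2, 16)

first = record_activations(net, x)
second = record_activations(net, x)
changed = [name for name in first if not torch.equal(first[name], second[name])]
print("modules whose output changed:", changed)
```

Hooks only see module outputs, so a functional call inside forward() does not show up directly. What the comparison does show is the first module whose output changes, which narrows the search to the code between it and the previous module.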

fotinidelig · September 11, 2021, 8:43am

Thanks @gphilip!
So after playing with the activations, I found out that the problem was that I was using F.dropout() instead of nn.Dropout(). F.dropout() has training=True as its default and does not know about the module’s mode, so model.eval() had no effect on it. Very important, and I hope everyone avoids this (highly unpleasant) mistake!
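For anyone who wants to reproduce this, here is a minimal sketch with three made-up toy networks (the class names and layer sizes are assumptions for illustration). It compares a bare F.dropout() call, an nn.Dropout module, and F.dropout() called with training=self.training, all after model.eval().

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FunctionalDropoutNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(32, 32)

    def forward(self, x):
        # training defaults to True, so dropout is always applied here.
        return F.dropout(self.fc(x), p=0.5)


class ModuleDropoutNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(32, 32)
        self.drop = nn.Dropout(p=0.5)  # follows model.train() / model.eval()

    def forward(self, x):
        return self.drop(self.fc(x))


class FlaggedFunctionalNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(32, 32)

    def forward(self, x):
        # Passing the module's mode makes the functional form respect eval().
        return F.dropout(self.fc(x), p=0.5, training=self.training)


def is_deterministic(model, x):
    # Two forward passes in eval mode should agree exactly.
    model.eval()
    return torch.equal(model(x), model(x))


torch.manual_seed(0)
x = torch.randn(4, 32)
functional_ok = is_deterministic(FunctionalDropoutNet(), x)
module_ok = is_deterministic(ModuleDropoutNet(), x)
flagged_ok = is_deterministic(FlaggedFunctionalNet(), x)
print(functional_ok, module_ok, flagged_ok)
```

Gradients still flow through all three variants, so either of the last two should also work when torch.no_grad() is not an option.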

gphilip · September 11, 2021, 11:46am

Thank you for telling us what you learnt!

Isaac_Sim · November 16, 2021, 12:19am

I am stuck with a loaded model that is not working properly, and it seems like yours is the answer. Can you explain a bit more about it?
The model has exactly the same weights before saving and after loading, but it behaves differently.

fotinidelig · November 16, 2021, 7:16am

Hi,
What exactly happens differently? And at test (inference) time or train time?
Could you add a code snippet from where you save and load the model?
In my case I was using some of my model layers incorrectly, namely dropout, which should have no effect at inference (i.e. after calling model.eval()). But because I was using torch.nn.functional.dropout() and not torch.nn.Dropout, model.eval() didn’t “disable” the dropout and I got unexpected output for the same input. model.eval() only works with the latter, the module version.
Based on my bug, I would check the following:

Make sure to use model.train() and model.eval() where needed (you can check here),
Use the torch.nn modules instead of the torch.nn.functional functions for layers that have trainable weights (and for dropout/batchnorm layers)
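A quick way to test both points on a saved model is a round trip like the sketch below. It is only an illustration under assumptions: build_model is a made-up toy architecture, and an in-memory buffer stands in for the file on disk. It saves the state_dict, loads it into a freshly built model, and compares outputs in eval mode.

```python
import io

import torch
import torch.nn as nn


def build_model():
    # Made-up toy architecture; the dropout is a module, so eval() disables it.
    return nn.Sequential(
        nn.Linear(10, 20),
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(20, 3),
    )


torch.manual_seed(0)
original = build_model()

# Save only the weights; the buffer stands in for a file path.
buffer = io.BytesIO()
torch.save(original.state_dict(), buffer)
buffer.seek(0)

# A fresh model starts with different random weights until they are loaded.
restored = build_model()
restored.load_state_dict(torch.load(buffer))

x = torch.randn(5, 10)
original.eval()
restored.eval()
with torch.no_grad():
    outputs_match = torch.allclose(original(x), restored(x))

weights_match = all(
    torch.equal(p, q) for p, q in zip(original.parameters(), restored.parameters())
)
print("weights match:", weights_match, "outputs match:", outputs_match)
```

If the weights match but the outputs do not, the remaining suspects are the model's mode and any functional dropout or batchnorm calls inside forward().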