So I’m writing a Seq2Seq model using this defined model:
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, n_layers=1):
        super(RNN, self).__init__()
        self.n_layers = n_layers  # Number of times the GRU layer is applied
        self.hidden_size = hidden_size  # Dimension of a hidden vector
        self.embedding = nn.Embedding(input_size, hidden_size)  # To create an embedding for each word
        self.gru = nn.GRU(hidden_size, hidden_size)  # The hidden layer

    def forward(self, input, hidden):
        # Input is a LongTensor holding the index of one word in the input sequence
        embedded = self.embedding(input).view(1, 1, -1)  # Reshape to a 1 x 1 x hidden_size tensor
        output = embedded
        for i in range(self.n_layers):
            output, hidden = self.gru(output, hidden)
        return output, hidden

    def initHidden(self):
        result = autograd.Variable(torch.zeros(1, 1, self.hidden_size))
        if use_cuda:
            return result.cuda()
        else:
            return result
I’ve trained the model and saved it with pickle. When I load the pickle file and run the trained model on a set of data, I get different results every time. I don’t see any source of randomness, unless I’m missing something?
To clarify: I initialized the model, trained it, and saved the trained model with pickle. When I then run this model on the same set of points, I get different results each time.
Ah, I’m sorry, I misread your original message. That is definitely odd; you should get the same results. Did you try setting the random seed explicitly, and did that help?
Another random component to consider is dropout. If you run the model in train mode, dropout is still active. For inference, put the model in evaluation mode by calling model.eval().
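A minimal sketch of both suggestions, assuming a hypothetical toy model (the nn.Sequential below is only an illustration, not the poster's RNN): fix the seed with torch.manual_seed, then switch to evaluation mode before inference so that module-style dropout becomes a no-op.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fix the seed so weight initialisation is reproducible

# Hypothetical toy model, used only to illustrate evaluation mode.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.Dropout(p=0.5),  # module-style dropout follows model.train()/model.eval()
    nn.Linear(8, 2),
)
x = torch.ones(1, 4)

model.eval()  # dropout is disabled from here on
with torch.no_grad():
    eval_a = model(x)
    eval_b = model(x)

# Two passes over the same input should now agree exactly.
same_in_eval = torch.equal(eval_a, eval_b)
print("identical outputs in eval mode:", same_in_eval)
```

If the two outputs still differ after model.eval(), the randomness is probably coming from somewhere that does not follow the module's mode.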
I have run into the same problem described above, even though I call model.eval() before predicting labels with my model.
Specifically, I have a trained NN model that uses one dropout layer. I’m using this network to optimize another parameter (one that is not part of the model), and on (almost) every forward pass the prediction changes. Below are snippets of my optimization code, the predict() function, and the forward() function. When I remove the dropout layer the predictions don’t change, which is what model.eval() should give me anyway. Also, I can’t use torch.no_grad(), since I need the gradients for my other parameter. Thank you in advance!
w = input.clone().detach().requires_grad_(True).to(device)
params = [{'params': w}]
optimizer = torch.optim.Adam(params, lr=lr)

def opt(net, w):
    adv_sample = TO_MUL*torch.tanh(w)+TO_ADD
    _, logits = net.predict(adv_sample, logits=True)
    loss, fx = self.loss(const, adv_sample, input, logits[0], target)
    loss.backward()
    optimizer.step()

def predict(samples, logits=False):
    self.eval()
    logs = self.forward(samples)
    if logits:
        return torch.argmax(logs, 1), logs
    softmax = torch.nn.Softmax(dim=1)
    probs = softmax(logs)
    return torch.argmax(probs, 1), probs

def forward(x):
    x = self.conv22(x)
    x = self.mp(x)
    N, C, W, H = (*(x.shape),)
    x = torch.reshape(x, (N, C*W*H))
    x = self.fc1(x)
    x = F.dropout(x, p = .5)
    x = self.fc2(x)
    logits = self.fc3(x) # logits layer
    return logits
Yep sorry I was a bit careless about the indentation. forward() does return the logits but I didn’t paste all of it for brevity. Will edit it so that it’s more clear!
Thank you for fixing the code. What exactly do you observe that does not match what you expect? Is it that the line logs = self.forward(samples) returns different values of logs for the same value of samples?
I.e. the predictions differ for the same input, which suggests that dropout is still being applied? The only thing I do with the model “net” is pass it to a class (which uses the opt() function), but that shouldn’t change its parameters or the way it works. Could calling optimizer.step() on the w parameter affect the model (since it has parameters and gradients)? That sounds a bit unreasonable, though.
Hm, it didn’t work, but I think that’s expected, since this optimizer has different params (namely just w), not the ones from the model (and I also need the optimizer for w). It’s a very strange situation, but if you know of a case where eval() fails, let me know.
You could try printing a weight (or a small set of weights) from the network and see if the weight changes between the two runs. If you do find that the weight changes, then you can investigate why it changes. You may have to try many weights before you find one that changes.
Alternatively, you could try printing activations to trace the part of the network that changes between the two runs.
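Both debugging suggestions can be sketched roughly as follows. This assumes a hypothetical toy network whose dropout is deliberately left in train mode, so that something actually differs between two passes; snapshot_weights and run_and_record are illustrative helper names, not PyTorch functions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy network; dropout stays in train mode on purpose so the
# demo has a source of run-to-run variation to find.
net = nn.Sequential(
    nn.Linear(4, 64),
    nn.Dropout(p=0.5),
    nn.Linear(64, 2),
)
net.train()

def snapshot_weights(model):
    """Copy every parameter and buffer so later changes can be detected."""
    return {name: t.detach().clone() for name, t in model.state_dict().items()}

def run_and_record(model, x):
    """Run one forward pass and record the output of each direct child module."""
    activations = {}
    handles = []
    for name, module in model.named_children():
        def hook(mod, inputs, output, name=name):
            activations[name] = output.detach().clone()
        handles.append(module.register_forward_hook(hook))
    model(x)
    for handle in handles:
        handle.remove()  # leave the model without hooks afterwards
    return activations

x = torch.ones(1, 4)
before = snapshot_weights(net)
first = run_and_record(net, x)
second = run_and_record(net, x)
after = snapshot_weights(net)

changed_weights = [k for k in before if not torch.equal(before[k], after[k])]
changed_layers = [k for k in first if not torch.equal(first[k], second[k])]
print("weights that changed:", changed_weights)
print("layers whose output changed:", changed_layers)
```

The first layer whose output differs between the two passes is where the randomness enters. If no weights changed, the cause is in the forward computation rather than in the parameters.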
Thanks @gphilip!
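So after playing with the activations I found out that the problem was that I was using F.dropout() instead of nn.Dropout(), meaning that model.eval() had no effect on the dropout call. F.dropout() applies dropout whenever its training argument is True, which is the default, so it ignores the module’s mode unless you pass training=self.training. Very important, and I hope everyone avoids this (highly unpleasant) mistake!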
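A small sketch of the difference, using hypothetical toy modules rather than the network from the thread. The first variant calls F.dropout() with its default training=True, the second passes training=self.training, and the third uses the nn.Dropout module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BuggyNet(nn.Module):
    """Functional dropout with default arguments: ignores model.eval()."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 64)

    def forward(self, x):
        return F.dropout(self.fc(x), p=0.5)  # training defaults to True

class FunctionalFixNet(BuggyNet):
    """Functional dropout that is told which mode the module is in."""
    def forward(self, x):
        return F.dropout(self.fc(x), p=0.5, training=self.training)

class ModuleFixNet(nn.Module):
    """Module-style dropout: follows model.train()/model.eval() by itself."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 64)
        self.drop = nn.Dropout(p=0.5)

    def forward(self, x):
        return self.drop(self.fc(x))

def is_deterministic_in_eval(model, x):
    """Return True if two eval-mode passes over x give identical outputs."""
    model.eval()
    return torch.equal(model(x), model(x))

torch.manual_seed(0)
x = torch.ones(1, 4)
buggy_ok = is_deterministic_in_eval(BuggyNet(), x)
functional_ok = is_deterministic_in_eval(FunctionalFixNet(), x)
module_ok = is_deterministic_in_eval(ModuleFixNet(), x)
print(buggy_ok, functional_ok, module_ok)
```

Neither fix needs torch.no_grad(), so gradients with respect to the input remain available for an outer optimization like the one described above.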
I am stuck with a loaded model that does not work properly, and it seems like yours is the answer. Can you explain a bit more about it?
The model has exactly the same weights before saving and after loading, but it behaves differently.
Hi,
What exactly happens differently? And at test (inference) time or train time?
Could you add a code snippet from where you save and load the model?
In my case I was using some of my model layers incorrectly, namely the dropout layers, which should have no effect at inference (i.e. after calling model.eval()). Because I was using torch.nn.functional.dropout() and not torch.nn.Dropout, model.eval() didn’t “disable” the dropout, and I got different outputs for the same input. model.eval() only affects the latter, the module version.
Based on my bug, I would check the following:
Make sure to call model.train() and model.eval() where needed (you can check here),
Use torch.nn modules instead of torch.nn.functional functions for layers that have trainable weights (and for dropout/batchnorm layers)
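As a rough way to test the first point after loading a model, the sketch below saves and restores a state_dict, puts both copies in evaluation mode, and compares their outputs. The make_model helper and the in-memory buffer are assumptions for illustration; a real workflow would use its own architecture and a file path.

```python
import io
import torch
import torch.nn as nn

def make_model():
    # Hypothetical architecture standing in for the real network.
    return nn.Sequential(nn.Linear(4, 16), nn.Dropout(p=0.5), nn.Linear(16, 2))

torch.manual_seed(0)
trained = make_model()  # stands in for a model that has been trained

# Save only the weights; an in-memory buffer replaces a file on disk here.
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

restored = make_model()
restored.load_state_dict(torch.load(buffer))

# Loading weights does not change the mode, so set it explicitly.
trained.eval()
restored.eval()
all_in_eval = all(not m.training for m in restored.modules())

x = torch.ones(1, 4)
with torch.no_grad():
    outputs_match = torch.equal(trained(x), restored(x))

print("every submodule in eval mode:", all_in_eval)
print("outputs match after reload:", outputs_match)
```

This check only covers layers implemented as modules. A functional call such as F.dropout() inside forward() is invisible to it, which is why the second point matters.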