Reading this topic, I see that I face the same problem. I do have a dropout layer, so my issue is most likely related to the topic I posted. How can I solve it, though? The topic highlights a solution, but none of the posts explain how to implement it.
From the linked post:

> Did you disable any randomness via `model.eval()`? … so call `model.eval()` to disable dropout layers.
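As I understand that advice (this is only my reading of it, not something stated in the linked post), `model.eval()` recursively puts every submodule into evaluation mode, which is what should turn `nn.Dropout` into a no-op. A quick sanity check on a toy module:

```python
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 8), nn.Dropout(0.5))

net.eval()
print([m.training for m in net.modules()])  # all False after eval()

net.train()
print([m.training for m in net.modules()])  # all True again after train()
```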
I am doing this, but it’s not working. Here’s the code I have for the network:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RNNModel(nn.Module):
    def __init__(self, vocab_size, embedding_size, hidden_dim, n_layers, drop_rate=0.2):
        super(RNNModel, self).__init__()
        # Model hyper-parameters
        self.hidden_dim = hidden_dim
        self.embedding_size = embedding_size
        self.n_layers = n_layers
        self.vocab_size = vocab_size
        self.drop_rate = drop_rate
        self.char2int = None
        self.int2char = None

        # Layers
        # Encoder as an Embedding layer (currently unused)
        # self.encoder = nn.Embedding(vocab_size, embedding_size)
        # Dropout layer
        self.dropout = nn.Dropout(drop_rate)
        # RNN layer
        self.rnn = nn.LSTM(embedding_size, hidden_dim, n_layers,
                           dropout=drop_rate, batch_first=True)
        # Fully connected output layer
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, state=None):
        # x shape: [batch_size, seq_len, embedding_size]
        # Apply the embedding layer and dropout (currently unused)
        # embed_seq = self.dropout(self.encoder(x))

        rnn_out, state = self.rnn(x, state)
        # rnn_out shape: [batch_size, seq_len, hidden_dim]
        # state: tuple (h, c), each of shape [n_layers, batch_size, hidden_dim]
        rnn_out = self.dropout(rnn_out)

        # Stack up the LSTM outputs; contiguous() is needed before view()
        rnn_out = rnn_out.contiguous().view(-1, self.hidden_dim)
        logits = self.decoder(rnn_out)
        # logits shape: [batch_size * seq_len, vocab_size]
        return logits, state

    def init_state(self, device, batch_size=1):
        """Initialises the LSTM hidden and cell states with zeros."""
        return (torch.zeros(self.n_layers, batch_size, self.hidden_dim).to(device),
                torch.zeros(self.n_layers, batch_size, self.hidden_dim).to(device))

    def predict(self, input):
        # input shape: [batch_size, seq_len, embedding_size]
        logits, hidden = self.forward(input)
        # logits shape: [batch_size * seq_len, vocab_size]
        probs = F.softmax(logits, dim=1)
        # reshape back to [batch_size, seq_len, vocab_size]
        probs = probs.view(input.size(0), input.size(1), probs.size(1))
        return probs, hidden
```
and I’m creating the model with these lines:

```python
model = RNNModel(dict_size, embedding_size, hidden_dim, n_layers, drop_rate=0.0)
model.eval()
```
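For reference, this is roughly the kind of repeatability check I mean when I say it is “not working” (the hyper-parameter values and input sizes below are placeholders I made up for the sketch):

```python
import torch

# placeholder hyper-parameters, just for this check
dict_size, embedding_size, hidden_dim, n_layers = 50, 16, 32, 2

model = RNNModel(dict_size, embedding_size, hidden_dim, n_layers, drop_rate=0.0)
model.eval()

with torch.no_grad():
    state = model.init_state(torch.device("cpu"), batch_size=1)
    x = torch.randn(1, 5, embedding_size)  # [batch_size, seq_len, embedding_size]
    out1, _ = model(x, state)
    out2, _ = model(x, state)
    print(torch.equal(out1, out2))  # should print True once all randomness is disabled
```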
I have copied the model code from some project, but can’t remember which one it was and can’t find it.
Apart from the `dropout` kwarg passed to `nn.LSTM`, this network also uses a separate `nn.Dropout` layer applied to the LSTM output. Could that be the issue? Reading its documentation, I see that this layer randomly zeroes some elements of its input during training. What happens when the module is not in training mode? How do I go about that?
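From my reading of the `nn.Dropout` docs, in evaluation mode the layer should simply pass its input through unchanged, but I am not sure I am interpreting that correctly. A small check of just that layer (my own snippet, not taken from the docs):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(2, 5)

drop.train()
print(drop(x))  # training mode: roughly half the entries zeroed, the rest scaled by 1/(1 - p) = 2

drop.eval()
print(drop(x))  # eval mode: expected to be identical to x, i.e. dropout acting as a no-op
```

If that behaves as expected but my model output still changes between runs in eval mode, then I suppose the randomness must be coming from somewhere other than the dropout layers.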