I am building a simple many-to-many character-level LSTM and I have come across some very peculiar behavior that I am hoping someone can shed light on. Here is the code for the LSTM class:
import torch.nn as nn


class CharLSTM(nn.Module):
    def __init__(self, vocab_size, batch_size, sequence_length, hidden_dim, n_layers, drop_p):
        super(CharLSTM, self).__init__()

        # store the dimensions
        self.hidden_dim = hidden_dim
        self.vocab_size = vocab_size
        self.batch_size = batch_size
        self.sequence_length = sequence_length
        self.dropout = nn.Dropout(p=drop_p)

        # define the lstm
        self.lstm = nn.LSTM(input_size=vocab_size,
                            hidden_size=hidden_dim,
                            num_layers=n_layers,
                            batch_first=True)

        # define the fully connected layer
        self.fc = nn.Linear(in_features=hidden_dim, out_features=vocab_size)

    def forward(self, x):
        # output of the lstm
        x, hidden = self.lstm(x)

        # apply dropout to the lstm output
        x = self.dropout(x)

        # flatten the output from the lstm
        x = x.view(self.sequence_length * self.batch_size, self.hidden_dim)

        # compute the scores (logits) over the vocabulary
        scores = self.fc(x)

        return scores
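For context, this is roughly how the model gets called; the hyperparameters and the random one-hot batch below are just placeholders, not my actual data pipeline:

import torch

# placeholder hyperparameters, purely illustrative
vocab_size, batch_size, sequence_length = 50, 8, 100

model = CharLSTM(vocab_size, batch_size, sequence_length,
                 hidden_dim=256, n_layers=2, drop_p=0.5)

# fake batch of character indices, one-hot encoded to shape (batch, seq, vocab)
idx = torch.randint(0, vocab_size, (batch_size, sequence_length), dtype=torch.long)
x = torch.zeros(batch_size, sequence_length, vocab_size)
x.scatter_(2, idx.unsqueeze(-1), 1.0)

scores = model(x)  # shape: (sequence_length * batch_size, vocab_size)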
Here is the thing: if I remove the x = self.dropout(x) call on the LSTM's output, PyTorch raises an exception:
RuntimeError: invalid argument 2: view size is not compatible with input tensor’s size and stride (at least one dimension spans across two contiguous subspaces). Call .contiguous() before .view(). at c:\programdata\miniconda3\conda-bld\pytorch-cpu_1524541161962\work\aten\src\th\generic/THTensor.cpp:280
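To rule out my data pipeline, here is about the smallest standalone snippet I could reduce it to (the dimensions are made-up toy numbers); it shows the same pattern:

import torch
import torch.nn as nn

# toy dimensions, purely illustrative
vocab_size, batch_size, sequence_length, hidden_dim = 5, 2, 3, 4

lstm = nn.LSTM(input_size=vocab_size, hidden_size=hidden_dim,
               num_layers=2, batch_first=True)
dropout = nn.Dropout(p=0.5)

x = torch.randn(batch_size, sequence_length, vocab_size)
out, hidden = lstm(x)

print(out.is_contiguous())           # False here: the batch_first output is not laid out contiguously
print(dropout(out).is_contiguous())  # True here: dropout returns a freshly allocated tensor

# this works after dropout...
dropout(out).view(sequence_length * batch_size, hidden_dim)
# ...but this line raises the RuntimeError quoted above
out.view(sequence_length * batch_size, hidden_dim)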
With the dropout in place, everything works just fine. If someone could explain why the dropout call makes a difference here, I would highly appreciate it.