Hi!
I am trying to train an LSTM-based sentence classifier, and have been blocked by two thus far insurmountable and seemingly unrelated problems.
Problem 1:
The training loss initially decreases, but then gets stuck around the same value of 0.9, going slightly up, going slightly down, but not changing significantly.
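For reference, a model that assigns equal probability to each of the three classes would score ln 3 under cross-entropy, so a loss of 0.9 is only a little better than guessing. A quick check in plain Python (no PyTorch needed; `chance_loss` is just a name I picked for illustration):

```python
import math

n_classes = 3  # the three sentence categories

# Cross-entropy of a uniform prediction: -log(1 / n_classes) = ln(n_classes)
chance_loss = -math.log(1.0 / n_classes)
print(round(chance_loss, 4))  # 1.0986
```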
Problem 2:
Almost every time I run the training routine, the script crashes during training. If I try training in a Jupyter Notebook (my preferred option), the notebook kernel crashes. If I try running the training in a standalone script, the script dies with a segfault (not even an exception I could try to debug). I have tried every solution I could find online, but nothing works.
Because of these two problems, my goal of moving a model from Keras to PyTorch has completely stalled.
I am not at liberty to share the data set, but each input is just a zero-padded sequence of characters mapped to their indexes in the character vocabulary. Each output target is one of three categories the sentence could belong to (as a 2d array of [{0, 1, 2}] to work with CrossEntropyLoss).
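To give an idea of the input format, here is a minimal sketch with made-up sentences and labels (not my real data). The `encode` helper, the `<pad>` entry, and padding on the right are assumptions for illustration only:

```python
# Toy stand-in for the real data set; sentences and labels are invented.
sentences = ["hello there", "bye", "what now?"]
labels = [0, 2, 1]  # one of three categories per sentence

# Index 0 is reserved for padding, so real characters start at 1.
chars = sorted(set("".join(sentences)))
vocab = {"<pad>": 0}
for ch in chars:
    vocab[ch] = len(vocab)

max_len = max(len(s) for s in sentences)


def encode(sentence, vocab, max_len):
    """Map each character to its vocabulary index and right-pad with zeros."""
    ids = [vocab[ch] for ch in sentence]
    return ids + [0] * (max_len - len(ids))


x_train = [encode(s, vocab, max_len) for s in sentences]
y_train = labels
```

With this layout every row of `x_train` has the same length, and `len(vocab)` already counts the padding index.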
Here is some code which I hope is sufficient to convey what I am trying to do:
Model:
class CharLSTMClassifier(nn.Module):

    def __init__(self, *, vocab_size, n_classes,
                 embedding_size=16, lstm_units=256,
                 n_recurrent_layers=2, recurrent_dropout=0.3):
        super().__init__()
        self.lstm_units = lstm_units
        self.n_recurrent_layers = n_recurrent_layers
        self.n_classes = n_classes
        self.embeddings = nn.Embedding(vocab_size, embedding_size,
                                       padding_idx=0)
        self.lstm = nn.LSTM(embedding_size, self.lstm_units,
                            n_recurrent_layers, batch_first=False,
                            bidirectional=True, dropout=recurrent_dropout)
        self.lstm2class = nn.Linear(self.lstm_units * 2, n_classes)
        self.hidden_state = None

    def forward(self, x, **kwargs):
        batch_size = x.size()[0]
        seq_len = x.size()[1]
        if self.hidden_state is None:
            self.hidden_state = self.init_hidden_state(batch_size,
                                                       use_cuda=x.is_cuda)
        embedded = self.embeddings(x).view((seq_len, batch_size, -1))
        lstm_out, new_hidden = self.lstm(embedded, self.hidden_state)
        self.hidden_state = new_hidden
        output_space = self.lstm2class(lstm_out[-1])
        return output_space
Training routine:
model = CharLSTMClassifier(vocab_size=len(vocab),
                           n_classes=n_classes)
model.cuda()

batch_size = 64
epochs = 15
batches_per_epoch = round((len(x_train) * 0.7) // batch_size)

dataset = TensorDataset(torch.LongTensor(x_train),
                        torch.LongTensor(y_train))
data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())
print_loss_every = 100

for epoch in range(epochs):
    print(f'Epoch {epoch+1}')
    losses = deque(maxlen=print_loss_every)
    for n, (x_batch, y_batch) in enumerate(tqdm(data_loader)):
        optimizer.zero_grad()
        x_batch = autograd.Variable(x_batch, requires_grad=False).cuda()
        y_batch = autograd.Variable(y_batch, requires_grad=False).cuda()
        scores = model(x_batch)
        loss = loss_function(scores, y_batch)
        loss.backward()
        optimizer.step()
        # print(loss)
        losses.append(loss.data)
        model.hidden_state = None
        if n % print_loss_every == 0:
            print('Mean loss', np.mean(losses))
    print('Epoch loss', np.mean(losses))
I am running PyTorch 0.3 with CUDA 8.0 and cuDNN 7 on an Ubuntu 16.04 machine inside an Anaconda environment.
I would greatly appreciate any help!