Loss not decreasing + random crashes during training

Hi!

I am trying to train an LSTM-based sentence classifier, but I have been blocked by two seemingly unrelated problems that I have so far been unable to solve.

Problem 1:

The training loss initially decreases, but then gets stuck around 0.9, drifting slightly up and down without ever improving significantly.

Problem 2:

Almost every time I run the training routine, the script crashes partway through training. In a Jupyter notebook (my preferred option), the kernel dies; as a standalone script, it dies with a segfault (not even a Python exception I could try to debug). I have tried every solution I could find online, but nothing has worked.

Because of these two problems, my effort to port a model from Keras to PyTorch has completely stalled.

I am not at liberty to share the data set, but the input is just zero-padded sequences of characters mapped to their indices in the character vocabulary. The target for each sentence is one of three classes, stored as class indices in {0, 1, 2} so that it works with CrossEntropyLoss.
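For reference, here is a small synthetic stand-in with the same shapes (the characters, lengths, and sample count are made up, not my real data):

import numpy as np

# character vocabulary; index 0 is reserved for padding
vocab = {'<pad>': 0}
vocab.update({c: i + 1 for i, c in enumerate('abcdefghij')})

max_len = 40
n_classes = 3
n_samples = 1000

# zero-padded rows of character indices, one row per sentence
x_train = np.zeros((n_samples, max_len), dtype='int64')
for row in x_train:
    length = np.random.randint(5, max_len)
    row[:length] = np.random.randint(1, len(vocab), size=length)

# one class index in {0, 1, 2} per sentence
y_train = np.random.randint(0, n_classes, size=n_samples)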

Here is some code which I hope is sufficient to convey what I am trying to do:

Imports:

from collections import deque

import numpy as np
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from tqdm import tqdm

Model:

class CharLSTMClassifier(nn.Module):
    def __init__(self, *, vocab_size, n_classes, 
                 embedding_size=16, lstm_units=256,
                 n_recurrent_layers=2, recurrent_dropout=0.3):
        super().__init__()
        self.lstm_units = lstm_units
        self.n_recurrent_layers = n_recurrent_layers
        self.n_classes = n_classes
        
        # index 0 is the padding index, so padded positions embed to zeros
        self.embeddings = nn.Embedding(vocab_size, embedding_size,
                                       padding_idx=0)
        self.lstm = nn.LSTM(embedding_size, self.lstm_units,
                            n_recurrent_layers, batch_first=False,
                            bidirectional=True, dropout=recurrent_dropout)
        # * 2 because the bidirectional LSTM concatenates both directions
        self.lstm2class = nn.Linear(self.lstm_units * 2, n_classes)
        self.hidden_state = None

    def forward(self, x, **kwargs):
        batch_size = x.size()[0]
        seq_len = x.size()[1]
        if self.hidden_state is None:
            self.hidden_state = self.init_hidden_state(batch_size,
                                                       use_cuda=x.is_cuda)
        
        embedded = self.embeddings(x).view((seq_len, batch_size, -1))
        lstm_out, new_hidden = self.lstm(embedded, self.hidden_state)
        self.hidden_state = new_hidden
        output_space = self.lstm2class(lstm_out[-1])
        return output_space
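
For completeness, init_hidden_state is roughly the standard zero initialization (one (h0, c0) pair sized for n_layers × n_directions):

    def init_hidden_state(self, batch_size, use_cuda=False):
        # first dimension is n_recurrent_layers * n_directions, and
        # n_directions is 2 because the LSTM is bidirectional
        shape = (self.n_recurrent_layers * 2, batch_size, self.lstm_units)
        h0 = autograd.Variable(torch.zeros(*shape))
        c0 = autograd.Variable(torch.zeros(*shape))
        if use_cuda:
            h0, c0 = h0.cuda(), c0.cuda()
        return h0, c0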

Training routine:

model = CharLSTMClassifier(vocab_size=len(vocab), 
                           n_classes=n_classes)
model.cuda()

batch_size = 64
epochs = 15

batches_per_epoch = round((len(x_train) * 0.7) // batch_size)  # not used below

dataset = TensorDataset(torch.LongTensor(x_train),
                        torch.LongTensor(y_train))
data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())

print_loss_every = 100

for epoch in range(epochs):
    print(f'Epoch {epoch+1}')
    
    losses = deque(maxlen=print_loss_every)
    
    for n, (x_batch, y_batch) in enumerate(tqdm(data_loader)):
        optimizer.zero_grad()
        x_batch = autograd.Variable(x_batch, requires_grad=False).cuda()
        y_batch = autograd.Variable(y_batch, requires_grad=False).cuda()
        scores = model(x_batch)

        loss = loss_function(scores, y_batch)
        loss.backward()
        optimizer.step()
        losses.append(loss.data[0])  # pull out the Python float so np.mean works
        model.hidden_state = None  # drop the hidden state so the next batch re-initializes it
        
        if n % print_loss_every == 0:
            print('Mean loss', np.mean(losses))
    
    print('Epoch loss', np.mean(losses))

I am running PyTorch 0.3 with CUDA 8.0 and cuDNN 7 on an Ubuntu 16.04 machine, inside an Anaconda environment.
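These versions can be double-checked from inside the environment with:

import torch

print(torch.__version__)                # PyTorch version
print(torch.cuda.is_available())        # True if the CUDA build is active
print(torch.backends.cudnn.version())   # cuDNN version number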

I would greatly appreciate any help!