Backpropagation error with Batched LSTM

Dear all.

I have a problem training an RNN (LSTM) network with batching. I can't figure out why the network doesn't backpropagate the error: the loss doesn't go down. I hope someone can give me some advice on how to fix this. I'm using Python 3 with PyTorch 0.3.1.

Here is the relevant code from my model class:

    def init_hidden(self, batch):
        # Fresh random (h0, c0) for each batch, shaped (layers * directions, batch, hidden per direction)
        return (autograd.Variable(torch.randn(NUM_LAYERS*NUM_DIRS, batch, self.hidden_dim // NUM_DIRS)),
                autograd.Variable(torch.randn(NUM_LAYERS*NUM_DIRS, batch, self.hidden_dim // NUM_DIRS)))

    def forward(self, sentence, lengths):
        # sentence is (seq_len, batch), so size(-1) is the batch size
        self.hidden = self.init_hidden(sentence.size(-1))
        embeds = self.word_embeddings(sentence)
        embeds = self.dropout(embeds)
        packed_input = pack_padded_sequence(embeds, lengths)
        packed_output, (ht, ct) = self.lstm(packed_input, self.hidden)
        lstm_out, _ = pad_packed_sequence(packed_output)  # (seq_len, batch, hidden_dim)
        output = self.hidden2tag(lstm_out)                # (seq_len, batch, tagset_size)
        output = self.softmax(output)
        return output

And in the training loop:

    print('Train with', len(data), 'examples.')
    for epoch in range(EPOCHS):
        print(f'Starting epoch {epoch}.')
        loss_sum = 0
        y_true = list()
        y_pred = list()
        for batch, lengths, targets, lengths2 in tqdm(dataset):
            model.zero_grad()
            batch, targets, lengths = sort_batch(batch, targets, lengths)
            pred = model(autograd.Variable(batch), lengths.cpu().numpy())
            loss = loss_function(pred.view(-1, pred.size()[2]), autograd.Variable(targets).view(-1, 1).squeeze(1))
            loss.backward()
            optimizer.step()
            loss_sum += loss.data[0]
            print(loss.data[0])
            pred_idx = torch.max(pred, 1)[1]
            y_true += list(targets.int())
            y_pred += list(pred_idx.data.int())

        loss_total = loss_sum / len(dataset)
        print('>>> Loss:', loss_total)

Thank you for your time!

What do you mean by this? What error are you running into?

The loss doesn’t shrink. It seems as if backpropagation is never applied to the model.

Have you checked what the gradients of your parameters are?
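For example, something along these lines, placed right after loss.backward() in your training loop, prints a summary of each parameter's gradient (a rough sketch using the 0.3-style API; model here is the model instance from your loop):

    for name, param in model.named_parameters():
        if param.grad is None:
            print(name, 'has no gradient')
        else:
            print(name, 'grad norm:', param.grad.data.norm())

If the norms are all exactly zero, the loss is probably not connected to those parameters in the graph; if they are non-zero but tiny, the update size (learning rate, loss scaling) is the more likely suspect.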

Here are the gradients and the data of the net:

data is

 0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
-0.3084 -1.3064 -1.2891  ...  -1.4110  0.6938  0.5450
 0.1591 -0.6072 -0.1963  ...  -0.3436  1.1036  1.7518
          ...             ⋱             ...          
 1.8543  1.0727 -0.8892  ...  -0.6763  0.8102 -0.3621
 1.2991  0.8048  0.5159  ...  -1.9514 -0.0883 -0.9512
 0.3984  0.5590 -1.4316  ...  -0.9093  0.1330 -2.4621
[torch.FloatTensor of size 13836x100]

grad is
Variable containing:
1.00000e-05 *
 0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
-0.4833  0.7489 -1.0286  ...   0.6889  0.1027  0.1220
 0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
          ...             ⋱             ...          
 0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
[torch.FloatTensor of size 13836x100]