Slow LSTM training


To learn PyTorch, I’m converting my latest project from DyNet.

The model is based on a stacked BiLSTM that produces a representation for each time step. Each representation is fed into a one-hidden-layer MLP that produces softmax probabilities (the same MLP for every time step). I then compute the negative log-likelihood loss for each time step and sum all the losses with Python’s built-in sum function.

The relevant model code:

self.birnn = nn.LSTM(rnn_input_dim, rnn_output_dim, num_layers=2, bidirectional=True)
self.mlp_linear1 = nn.Linear(2*rnn_output_dim, mlp_hid_dim)
self.mlp_linear2 = nn.Linear(mlp_hid_dim, mlp_out_dim)

def predict_probs(self, seq):
    ''' Propagate through the network and output the log-probabilities
        of the classes of each element '''
    # Feed the input sequence into the BiRNN and get the representation of
    # its elements
    rnn_outputs = self.birnn_layer(seq)

    # Feed all the BiRNN outputs (y1..yn) into the MLP and get
    # a list of log softmaxes
    return [self.do_mlp(y) for y in rnn_outputs]

def birnn_layer(self, seq):
    ''' Feed the input sequence into the BiRNN and return the representation of
        all the elements in the sequence (y1,..,yn) '''

    hidden = (Variable(torch.randn(4, 1, self.rnn_output_dim)), 
              Variable(torch.randn(4, 1, self.rnn_output_dim)))

    # Feed the sequence of vectors into the BiRNN
    out, hidden = self.birnn(seq, hidden)

    return out

def do_mlp(self, x):
    ''' Propagate the given vector through a one hidden layer MLP '''
    h = self.relu(self.mlp_linear1(x))
    y = self.mlp_linear2(h)

    return self.log_softmax(y)
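
For comparison with the per-time-step list comprehension in predict_probs, here is a minimal sketch of running the same MLP on all BiLSTM outputs in one call. It relies on nn.Linear acting on the last dimension of its input. The sizes and the random stand-in for the BiLSTM output are assumptions made for illustration, not values from the actual model, and whether this changes the timing is not something the sketch demonstrates.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
seq_len, rnn_output_dim, mlp_hid_dim, mlp_out_dim = 7, 16, 32, 5

mlp_linear1 = nn.Linear(2 * rnn_output_dim, mlp_hid_dim)
mlp_linear2 = nn.Linear(mlp_hid_dim, mlp_out_dim)
relu = nn.ReLU()
log_softmax = nn.LogSoftmax(dim=-1)


def do_mlp(x):
    """One-hidden-layer MLP followed by log-softmax over the class dimension."""
    return log_softmax(mlp_linear2(relu(mlp_linear1(x))))


# Random stand-in for the BiLSTM output: (seq_len, batch=1, 2 * rnn_output_dim).
rnn_outputs = torch.randn(seq_len, 1, 2 * rnn_output_dim)

# Per-time-step version, mirroring the list comprehension in predict_probs.
per_step = torch.stack([do_mlp(y) for y in rnn_outputs])

# Single call on the whole tensor: the linear layers act on the last dimension.
batched = do_mlp(rnn_outputs)

print(per_step.shape, batched.shape)
print(torch.allclose(per_step, batched, atol=1e-5))
```

Both variants compute the same log-probabilities; the second one just replaces seq_len small MLP calls with one larger call.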

The relevant training code:

            # get the losses of the frame-based classifications
            losses = []
            for i,probs in enumerate(probs_list):
                # get the current gold label
                label = self.get_label(i, left_index, right_index)
                pytorch_label = Variable(torch.LongTensor([label]))

                # get negative log loss
                losses.append(loss_function(probs, pytorch_label))

                # calculate precision
                vals =
                if np.argmax(vals) == label:
                    train_success += 1

                num_of_train_frames += 1

            # sum all losses
            total_loss = sum(losses)

            train_closs += float(

            # Back propagation

I tested the code and it seems fine: it produces the same results as the DyNet model. However, during training both the forward and the backward steps are about 50 times slower than the same steps in the DyNet model. I’m using the CPU (the same machine for both), since the model is relatively small and takes only a few minutes to train with DyNet.
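
A minimal sketch of how the forward and backward steps of the BiLSTM could be timed separately on CPU. The dimensions, the sequence length, and the placeholder loss (a plain sum of the outputs) are assumptions for illustration only; the real model's sizes and loss are not reproduced here.

```python
import time

import torch
import torch.nn as nn

# Hypothetical sizes; substitute the real model's dimensions.
seq_len, rnn_input_dim, rnn_output_dim = 50, 32, 64

birnn = nn.LSTM(rnn_input_dim, rnn_output_dim, num_layers=2, bidirectional=True)
seq = torch.randn(seq_len, 1, rnn_input_dim)  # (seq_len, batch=1, features)


def time_steps(n_repeats=5):
    """Return the average forward and backward wall-clock times in seconds."""
    fwd_total, bwd_total = 0.0, 0.0
    for _ in range(n_repeats):
        birnn.zero_grad()

        start = time.perf_counter()
        out, _ = birnn(seq)
        loss = out.sum()  # placeholder loss, only so there is something to backpropagate
        fwd_total += time.perf_counter() - start

        start = time.perf_counter()
        loss.backward()
        bwd_total += time.perf_counter() - start
    return fwd_total / n_repeats, bwd_total / n_repeats


fwd, bwd = time_steps()
print(f"forward: {fwd * 1e3:.2f} ms, backward: {bwd * 1e3:.2f} ms")
print(f"CPU threads used by PyTorch: {torch.get_num_threads()}")
```

Averaging over a few repeats smooths out one-off costs such as the first call's warm-up, and printing the thread count helps when comparing CPU runs across frameworks.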

Am I doing something wrong?
Maybe PyTorch’s RNN/LSTM is optimized only for GPU usage?

I would appreciate any help.

I don’t know if you are working on a toy-size problem, but you may be hitting the following issue: