Hi,

To learn PyTorch, I’m trying to port my latest project from DyNet to PyTorch.

The model is based on a stacked BiLSTM that produces a representation for each time step. Each representation is fed into a one-hidden-layer MLP (the same MLP for every time step) that outputs log-softmax probabilities. I then compute the negative log-likelihood loss for each time step and add up all the losses with Python's built-in sum function.

The relevant model code:

```
# Layers (defined in __init__)
self.birnn = nn.LSTM(rnn_input_dim, rnn_output_dim, num_layers=2, bidirectional=True)
self.mlp_linear1 = nn.Linear(2*rnn_output_dim, mlp_hid_dim)
self.mlp_linear2 = nn.Linear(mlp_hid_dim, mlp_out_dim)

def predict_probs(self, seq):
    '''
    Propagate through the network and output the probabilities
    of the classes of each element
    '''
    # Feed the input sequence into the BiRNN and get the representation of
    # its elements
    rnn_outputs = self.birnn_layer(seq)
    # Feed all the BiRNN outputs (y1..yn) into the MLP and get
    # a list of log softmaxes
    return [self.do_mlp(y) for y in rnn_outputs]

def birnn_layer(self, seq):
    ''' Feed the input sequence into the BiRNN and return the representation of
    all the elements in the sequence (y1,..,yn) '''
    # Initial (h0, c0): first dim is num_layers * num_directions = 4, batch size is 1
    hidden = (Variable(torch.randn(4, 1, self.rnn_output_dim)),
              Variable(torch.randn(4, 1, self.rnn_output_dim)))
    # Feed the sequence of vectors into the BiRNN
    out, hidden = self.birnn(seq, hidden)
    return out

def do_mlp(self, x):
    ''' Propagate the given vector through a one hidden layer MLP '''
    h = self.relu(self.mlp_linear1(x))
    y = self.mlp_linear2(h)
    return self.log_softmax(y)
```
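
Since the same MLP is applied at every time step, one thing that might be worth comparing against is a batched variant. Below is a minimal sketch, not taken from the project, that runs the MLP once over all time steps and computes a single summed loss instead of looping in Python. It assumes a recent PyTorch release (where `Variable` is no longer needed). The sizes, the random stand-in for the BiLSTM output, and the names `mlp`, `loop_loss` and `batched_loss` are hypothetical. It only checks that the two formulations give the same loss and makes no claim about speed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
seq_len, rnn_output_dim, mlp_hid_dim, mlp_out_dim = 6, 8, 16, 3

# One-hidden-layer MLP ending in log-softmax, mirroring do_mlp.
mlp = nn.Sequential(
    nn.Linear(2 * rnn_output_dim, mlp_hid_dim),
    nn.ReLU(),
    nn.Linear(mlp_hid_dim, mlp_out_dim),
    nn.LogSoftmax(dim=-1),
)

# Random stand-in for the BiLSTM output: (seq_len, batch=1, 2 * rnn_output_dim).
rnn_outputs = torch.randn(seq_len, 1, 2 * rnn_output_dim)
labels = torch.randint(0, mlp_out_dim, (seq_len,))

# Per-step version: one MLP call and one loss per time step, then a Python sum.
step_loss_fn = nn.NLLLoss()
loop_loss = sum(
    step_loss_fn(mlp(y), labels[i:i + 1]) for i, y in enumerate(rnn_outputs)
)

# Batched version: one MLP call on all time steps and one summed loss.
log_probs = mlp(rnn_outputs.squeeze(1))  # (seq_len, mlp_out_dim)
batched_loss = nn.NLLLoss(reduction="sum")(log_probs, labels)

# Number of correctly classified frames, computed without leaving PyTorch.
correct = (log_probs.argmax(dim=1) == labels).sum().item()

print(loop_loss.item(), batched_loss.item(), correct)
```

The two printed losses should agree up to floating-point rounding.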

The relevant training code:

```
# get the losses of the frame-based classifications
losses = []
for i, probs in enumerate(probs_list):
    # get the current gold label
    label = self.get_label(i, left_index, right_index)
    pytorch_label = Variable(torch.LongTensor([label]))
    # get negative log loss
    losses.append(loss_function(probs, pytorch_label))
    # calculate precision
    vals = probs.data.numpy()
    if np.argmax(vals) == label:
        train_success += 1
    num_of_train_frames += 1
# sum all losses
total_loss = sum(losses)
train_closs += float(total_loss.data.numpy())
# Back propagation
total_loss.backward()
optimizer.step()
```

I tested the code and it seems fine: it produces the same results as the DyNet model. However, during training both the forward and the backward steps are roughly 50 times slower than the same steps in the DyNet model. I’m running on the CPU (the same machine for both), since the model is relatively small and takes only a few minutes to train with DyNet.
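
For reference, here is a minimal sketch of how the two steps can be timed separately using only the standard library. `time_call`, `fake_forward` and `fake_backward` are hypothetical stand-ins, not part of the project. In the real script they would wrap the call to `predict_probs` and the call to `total_loss.backward()`. Since `backward()` frees the graph by default, each backward timing would need a fresh forward pass.

```python
import time


def time_call(fn, repeats=5):
    """Return the mean wall-clock seconds of fn() over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


# Hypothetical stand-ins for the real work; replace with the model calls.
def fake_forward():
    return sum(i * i for i in range(10_000))


def fake_backward():
    return sum(i * 3 for i in range(20_000))


timings = {
    "forward": time_call(fake_forward),
    "backward": time_call(fake_backward),
}

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```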

Am I doing something wrong?

Is PyTorch's RNN/LSTM perhaps optimized only for GPU usage?

I’d appreciate any help.

Thanks!