To learn PyTorch, I'm converting my latest project from DyNet to PyTorch.
The model is based on a stacked BiLSTM that produces a representation for each time step. Each representation is fed into a one-hidden-layer MLP that outputs softmax probabilities (the same MLP for every time step). I then compute the negative log likelihood loss at each time step and sum all the losses with Python's built-in `sum` function.
The relevant model code:
```python
self.birnn = nn.LSTM(rnn_input_dim, rnn_output_dim, num_layers=2,
                     bidirectional=True)
self.mlp_linear1 = nn.Linear(2 * rnn_output_dim, mlp_hid_dim)
self.mlp_linear2 = nn.Linear(mlp_hid_dim, mlp_out_dim)

def predict_probs(self, seq):
    '''
    Propagate through the network and output the probabilities of the
    classes of each element
    '''
    # Feed the input sequence into the BiRNN and get the representation
    # of its elements
    rnn_outputs = self.birnn_layer(seq)
    # Feed all the BiRNN outputs (y1..yn) into the MLP and get
    # a list of log softmaxes
    return [self.do_mlp(y) for y in rnn_outputs]

def birnn_layer(self, seq):
    '''
    Feed the input sequence into the BiRNN and return the representation
    of all the elements in the sequence (y1,..,yn)
    '''
    hidden = (Variable(torch.randn(4, 1, self.rnn_output_dim)),
              Variable(torch.randn(4, 1, self.rnn_output_dim)))
    # Feed the sequence of vectors into the BiRNN
    out, hidden = self.birnn(seq, hidden)
    return out

def do_mlp(self, x):
    '''
    Propagate the given vector through a one-hidden-layer MLP
    '''
    h = self.relu(self.mlp_linear1(x))
    y = self.mlp_linear2(h)
    return self.log_softmax(y)
```
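For context, one thing I tried while profiling: since `nn.Linear` applies over the last dimension and broadcasts over any leading dimensions, the per-timestep Python loop in `predict_probs` can in principle be replaced by a single MLP call over the whole BiLSTM output tensor. A minimal standalone sketch (the dimensions here are made-up placeholders, not my real hyperparameters):

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only
seq_len, rnn_output_dim, mlp_hid_dim, mlp_out_dim = 10, 8, 16, 5

linear1 = nn.Linear(2 * rnn_output_dim, mlp_hid_dim)
linear2 = nn.Linear(mlp_hid_dim, mlp_out_dim)

# rnn_outputs: (seq_len, batch=1, 2*rnn_output_dim), as returned by the BiLSTM
rnn_outputs = torch.randn(seq_len, 1, 2 * rnn_output_dim)

# nn.Linear operates on the last dimension, so the whole sequence
# goes through the MLP in one call instead of a Python loop
h = torch.relu(linear1(rnn_outputs))
log_probs = torch.log_softmax(linear2(h), dim=-1)
print(log_probs.shape)  # torch.Size([10, 1, 5])
```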
The training relevant code:
```python
# get the losses of the frame-based classifications
losses = []
for i, probs in enumerate(probs_list):
    # get the current gold label
    label = self.get_label(i, left_index, right_index)
    pytorch_label = Variable(torch.LongTensor([label]))
    # get negative log likelihood loss
    losses.append(loss_function(probs, pytorch_label))
    # calculate precision
    vals = probs.data.numpy()
    if np.argmax(vals) == label:
        train_success += 1
    num_of_train_frames += 1

# sum all losses
total_loss = sum(losses)
train_closs += float(total_loss.data.numpy())

# Back propagation
total_loss.backward()
optimizer.step()
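As a related experiment, I also checked that summing per-timestep losses in Python is equivalent to one batched `nn.NLLLoss(reduction='sum')` call over all the timesteps at once, which avoids building a list of scalar loss nodes. A self-contained sketch with made-up sizes (not my actual data):

```python
import torch
import torch.nn as nn

seq_len, num_classes = 10, 5  # hypothetical sizes

# log-probabilities for the whole sequence, shape (seq_len, num_classes)
log_probs = torch.log_softmax(torch.randn(seq_len, num_classes), dim=-1)
labels = torch.randint(0, num_classes, (seq_len,))

# NLLLoss accepts a batch, so one call replaces the per-timestep loop;
# reduction='sum' matches summing the individual losses
loss_fn = nn.NLLLoss(reduction='sum')
total_loss = loss_fn(log_probs, labels)

# same value as accumulating one loss per element
per_elem = sum(loss_fn(log_probs[i:i + 1], labels[i:i + 1])
               for i in range(seq_len))
print(torch.allclose(total_loss, per_elem))  # True
```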
I tested the code and it seems fine - it produces the same results as the DyNet model. However, during training both the forward and the backward passes are roughly 50 times slower than the corresponding steps in the DyNet model. I'm running on the CPU (the same machine for both frameworks) since the model is relatively small and takes only a few minutes to train with DyNet.
Am I doing something wrong?
Maybe PyTorch's RNN/LSTM is optimized only for GPU usage?
I'd be happy for any help.