I built three versions of the same network architecture (at least I believe they match) and trained them on the same dataset (I swear) using the same solver, Adam with default hyperparameters.
The Caffe and Keras versions work really well, but the PyTorch version just doesn't work.
The loss goes down, but rises at the end of every epoch.
This is strange. What’s wrong with my code?
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

time_step = 150
batch_size = 256
input_dim = 6
lstm_size1 = 100
lstm_size2 = 512
fc1_size = 512

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.lstm1 = nn.LSTM(input_dim, lstm_size1, dropout=0.3)
        self.lstm2 = nn.LSTM(lstm_size1, lstm_size2, dropout=0.3)
        self.fc1 = nn.Linear(lstm_size2, fc1_size)
        self.fc2 = nn.Linear(fc1_size, 3755)

    def forward(self, x, num_strokes, batch_size):
        # pack the padded batch so the LSTMs skip the zero rows
        x = torch.nn.utils.rnn.pack_padded_sequence(x, num_strokes)
        hidden_cell_1 = (Variable(torch.zeros(1, batch_size, lstm_size1).cuda()),
                         Variable(torch.zeros(1, batch_size, lstm_size1).cuda()))
        allout_1, _ = self.lstm1(x, hidden_cell_1)
        hidden_cell_2 = (Variable(torch.zeros(1, batch_size, lstm_size2).cuda()),
                         Variable(torch.zeros(1, batch_size, lstm_size2).cuda()))
        _, last_hidden_cell_out = self.lstm2(allout_1, hidden_cell_2)
        # re-wrap in a fresh Variable (note: this detaches the hidden state
        # from the graph, so gradients will not flow back into the LSTMs)
        last_hidden_cell_out = Variable(last_hidden_cell_out.data)
        x = last_hidden_cell_out.view(batch_size, lstm_size2)
        x = self.fc1(x)
        x = F.relu(x)
        x = F.dropout(x, p=0.3)
        x = self.fc2(x)
        # out = F.log_softmax(x)
        return x
```
The main training part:
```python
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# one sample is of size [150, 6], so data is of size [256, 150, 6];
# I need to transpose data to get T * B * L
data = torch.transpose(data, 0, 1)
data, target = Variable(data), Variable(target)
optimizer.zero_grad()
out = model(data, num_strokes, batch_size)
loss = F.cross_entropy(out, target)
loss.backward()
optimizer.step()
```
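Just to illustrate the transpose step above, here is a minimal shape check (made-up tensor, not my real data pipeline): a batch-first tensor of size [256, 150, 6] becomes time-first [150, 256, 6], which is what `pack_padded_sequence` and `nn.LSTM` expect by default.

```python
import torch

batch_size, time_step, input_dim = 256, 150, 6
data = torch.randn(batch_size, time_step, input_dim)  # [B, T, L]
data = torch.transpose(data, 0, 1)                    # -> [T, B, L]
print(data.shape)  # torch.Size([150, 256, 6])
```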
The key point here is that my data is padded to length 150: one sample may only have a length of 35 and be of size [35, 6], which I pad to [150, 6], so only the first 35 rows are non-zero and the remaining 115 rows are all zeros. So I use:
```python
x = torch.nn.utils.rnn.pack_padded_sequence(x, num_strokes)
```
where x is sorted by sequence length, and num_strokes holds those lengths in descending order.
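Here is a minimal, self-contained sketch of how I understand the packing is supposed to work (toy shapes and a small hidden size, not the real model): sequences sorted longest-first, lengths passed alongside the padded time-major tensor, and the LSTM only processing the real rows.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

time_step, batch_size, input_dim = 150, 4, 6
lengths = torch.tensor([150, 90, 35, 10])      # sorted in descending order
x = torch.zeros(time_step, batch_size, input_dim)
for i, n in enumerate(lengths):
    x[:n, i] = torch.randn(n, input_dim)       # only the first n rows are real

packed = pack_padded_sequence(x, lengths)      # input is time-major by default
lstm = nn.LSTM(input_dim, 8)
out_packed, (h_n, c_n) = lstm(packed)
out, out_lengths = pad_packed_sequence(out_packed)

print(out.shape)   # torch.Size([150, 4, 8])
print(h_n.shape)   # torch.Size([1, 4, 8])
```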
Am I using it in the right way?
Could anyone help me? I've been debugging for several days and still can't figure out why.