LSTM network is not training

I am running a recurrent model over the bAbI dataset. The Encoder module computes embeddings for the story and the query and processes them with two GRUs. The Answer module consumes the representation produced by the Encoder and uses a fully connected layer to pick a word from the answer vocabulary, which is separate from the story-query vocabulary.

 encoder = EncoderRNN(svocab_size, hidden_size).cuda()
 answer =  Answer(hidden_size, avocab_size).cuda()

The training loss stays at the same value ([0] loss: 4.1544718742370605) no matter what I change. I have tried:

  • batch_size - 1, 16, 128, 256, 512
  • optimizers - SGD, Adam
  • lr - 0.1, 0.01, 0.001

Please find the code for model below.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size, n_layers=1):
        super(EncoderRNN, self).__init__()
        self.n_layers = n_layers
        self.hidden_size = hidden_size

        # One shared embedding, separate GRUs for story and query
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.storyRnn = nn.GRU(hidden_size, hidden_size)
        self.queryRnn = nn.GRU(hidden_size, hidden_size)

    def forward(self, story, query, shidden, qhidden):
        #print(story.size())
        # (batch, seq, hidden) -> (seq, batch, hidden), the default GRU layout
        sembedded = self.embedding(story).transpose(1, 0)
        soutput = sembedded
        #print(sembedded.size())

        for i in range(self.n_layers):
            soutput, shidden = self.storyRnn(soutput, shidden)

        qembedded = self.embedding(query).transpose(1, 0)
        qoutput = qembedded
        for i in range(self.n_layers):
            qoutput, qhidden = self.queryRnn(qoutput, qhidden)

        return soutput, shidden, qoutput, qhidden

class Answer(nn.Module):
    def __init__(self, hidden_size, output_size):
        super(Answer, self).__init__()
        self.input_size = hidden_size * 2
        self.output_size = output_size
        self.linear = nn.Linear(self.input_size, self.output_size)
        
    def forward(self, input):
        return F.softmax(self.linear(input))

def train(input, target, modules, criterion, optimizers):
    encoder, answer = modules
    eoptim, aoptim = optimizers
    
    story, query = input
    story, query = torch.LongTensor(story).cuda(), torch.LongTensor(query).cuda()
    story, query = Variable(story), Variable(query)
    target= torch.LongTensor(target).cuda()
    target= Variable(target)
        
    batch_size = story.data.size()[0]
    shidden_state = encoder.initHidden(batch_size)
    qhidden_state = encoder.initHidden(batch_size)
    
    so, sh, qo, qh = encoder(story, query, shidden_state, qhidden_state)    

    representation = F.elu(torch.cat((so[-1], qo[-1]), 1))
    prediction = answer(representation)
    loss = criterion(prediction, target)
    loss.backward()
    eoptim.step()
    aoptim.step()
    
    return loss.data[0]

Hi,

I know this post is old and you are probably no longer working on it, but I am posting my answer here in case someone else has a similar problem.

I ran into a similar problem recently and tried the same things you did. Just before giving up, I noticed the permute function (similar to the transpose function you used) and decided to replace it, since I had already replaced almost everything else. I removed the call and used the batch_first option of the RNN instead, adjusting other operations, such as how the output is indexed, accordingly. Surprisingly, the network started training. I also tested transpose in place of permute, and that did not work either.

So I think there might be a problem with the permute or transpose function, but I haven't written an example to prove it yet.

Try removing these calls and running the network again. It may surprise you.
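For reference, the change described above could look roughly like the sketch below. It is only an illustration, not the code from either post: the class name BatchFirstEncoder, the vocabulary size, and the tensor shapes are made up for the example, and the initial hidden state is left out so that the GRUs start from zeros. The point it shows is that with batch_first=True the tensors stay in (batch, seq, hidden) order, so the last time step is taken with output[:, -1] rather than output[-1].

```python
import torch
import torch.nn as nn

class BatchFirstEncoder(nn.Module):
    """Hypothetical variant of EncoderRNN that never transposes its inputs."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(input_size, hidden_size)
        # batch_first=True makes the GRU accept (batch, seq, feature) input
        self.storyRnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.queryRnn = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, story, query):
        # No transpose/permute: the embeddings are already (batch, seq, hidden)
        soutput, shidden = self.storyRnn(self.embedding(story))
        qoutput, qhidden = self.queryRnn(self.embedding(query))
        # With batch_first=True the time axis is dim 1, so the last step is [:, -1]
        return torch.cat((soutput[:, -1], qoutput[:, -1]), dim=1)

# Made-up sizes: vocabulary of 50 tokens, hidden size 8, batch of 4
encoder = BatchFirstEncoder(input_size=50, hidden_size=8)
story = torch.randint(0, 50, (4, 12))  # (batch, story_len)
query = torch.randint(0, 50, (4, 5))   # (batch, query_len)
representation = encoder(story, query)
print(representation.shape)  # torch.Size([4, 16])
```

The resulting representation has hidden_size * 2 features per example, which matches the input size the Answer module in the question expects.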
