LSTM model not updating weights

khaled_refai · March 18, 2019, 9:23am

When I train my LSTM model, I get the following output:
epoch 1 14
Epoch: 1/14… Training Loss: -7.350… Test Loss: -7.327… Test Accuracy: 15% accuracy 0.0 steps :9904 of 495201
Validation loss decreased (inf --> -0.146516). Saving model …
epoch 2 14
Epoch: 2/14… Training Loss: -3.676… Test Loss: -7.327… Test Accuracy: 15% accuracy 0.0 steps :19808 of 495201
Validation loss decreased (-0.146516 --> -0.146516). Saving model …
epoch 3 14
Epoch: 3/14… Training Loss: -2.451… Test Loss: -7.327… Test Accuracy: 15% accuracy 0.0 steps :29712 of 495201
Validation loss decreased (-0.146516 --> -0.146516). Saving model …
epoch 4 14

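The test loss and accuracy stay the same from epoch to epoch, so the model does not seem to be updating its weights.

Here is the forward method: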

    def forward(self, x, hidden):
        ''' Forward pass through the network, returns the output and hidden state '''
        x = x.float()
        xo = x.reshape(-1, batch_size, self.input_size)
        lstm_out, hidden = self.lstm(xo, hidden)
        # stack up lstm outputs
        x = lstm_out.contiguous().view(-1, self.lstm_size)
        x = self.dropout(x)
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.output(x)
        return F.softmax(x, dim=1), hidden

And the training loop:

    for images, labels in trainloader:
        if train_on_gpu:
            images, labels = images.cuda(), labels.cuda()
        steps += 1
        optimizer.zero_grad()
        # Creating new variables for the hidden state, otherwise
        # we'd backprop through the entire training history
        h = tuple([each.data for each in h])
        # Flatten images into a 7 long vector
        images.resize_(images.size()[0], 7)
        output, h = model.forward(images, h)
        loss = criterion(output.squeeze(), labels.long())
        # `clip_grad_norm_` helps prevent the exploding gradient problem in RNNs / LSTMs.
        nn.utils.clip_grad_norm_(model.parameters(), clip)
        loss.backward()
        optimizer.step()
        _, preds = torch.max(output.data, 1)
        running_loss += loss.item() * images.size(0)
        train_correct += torch.sum(preds == labels.data)

Full code:
https://colab.research.google.com/drive/1Uj3W9EsgmGFLIv8T8gFn5CIalt7PmRe9#scrollTo=hvCjQZcQIAtd

ptrblck · March 19, 2019, 12:25pm

What kind of criterion are you using?
If you are using nn.CrossEntropyLoss, you should pass the logits directly (no softmax at the end).
Otherwise, if you are using nn.NLLLoss, you should apply F.log_softmax as the last non-linearity.
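
To illustrate the two valid pairings, here is a minimal sketch. It does not use the model from the question: `logits` and `labels` are made-up stand-ins for the final layer's output and the targets. It also shows what happens when `F.softmax` output is passed to nn.NLLLoss, which may explain the negative losses in the log above if that is the criterion being used.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins (not from the original code):
# raw final-layer output for 4 samples and 3 classes, plus target class indices.
logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 2])

# Pairing 1: nn.CrossEntropyLoss takes raw logits
# (it applies log_softmax internally).
ce_loss = nn.CrossEntropyLoss()(logits, labels)

# Pairing 2: nn.NLLLoss takes log-probabilities,
# so F.log_softmax is the last non-linearity.
nll_loss = nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)

# Mismatch: nn.NLLLoss on plain softmax probabilities returns minus the
# mean target-class probability, so the value lies between -1 and 0.
bad_loss = nn.NLLLoss()(F.softmax(logits, dim=1), labels)

print(f"CrossEntropyLoss(logits):       {ce_loss.item():.4f}")
print(f"NLLLoss(log_softmax(logits)):   {nll_loss.item():.4f}")
print(f"NLLLoss(softmax(logits)):       {bad_loss.item():.4f}")
```

The first two values should match, and both are non-negative. The third is negative, so a negative loss is a useful hint that probabilities rather than log-probabilities are reaching nn.NLLLoss.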