Unrolled LSTM performing worse than a rolled one?

Hi, I was experimenting with LSTMs and noticed that training an unrolled LSTM seems to go a lot worse than training a rolled one. The test errors I get are a lot higher.

Below are the two relevant variants of my code; the rest of the code is identical between them. This first one is the rolled version: I pass the data with all timesteps in a single call to the LSTM, then send the last timestep's output to a fully connected layer.

class Net(nn.Module):
    def __init__(self, feature_dim, hidden_dim, batch_size):
        super(Net, self).__init__()
        
        num_layers=1
        
        # single-layer lstm (note: dropout has no effect here since num_layers=1)
        self.lstm = nn.LSTM(feature_dim, hidden_size=hidden_dim, num_layers=num_layers, batch_first=True, dropout=0.7)
        self.h0 = Variable(torch.randn(num_layers, batch_size, hidden_dim)) 
        self.c0 = Variable(torch.randn(num_layers, batch_size, hidden_dim))
        
        # fc layers
        self.fc1 = nn.Linear(hidden_dim, 2)    
                
    def forward(self, x, mode=False):
                    
        output, hn = self.lstm(x, (self.h0,self.c0))  
        output = self.fc1(output[:,-1,:])
                                
        return output

And the test errors (rightmost column, out of 100):

epoch 0 tr loss 54.90 te loss 17.37 tr err 144/316 te err 51/100
epoch 20 tr loss 48.21 te loss 15.11 tr err 96/316 te err 31/100
epoch 40 tr loss 37.15 te loss 13.07 tr err 71/316 te err 27/100
epoch 60 tr loss 31.83 te loss 15.43 tr err 62/316 te err 28/100
epoch 80 tr loss 27.14 te loss 25.34 tr err 45/316 te err 29/100
epoch 100 tr loss 24.40 te loss 32.11 tr err 39/316 te err 28/100
epoch 120 tr loss 23.74 te loss 22.59 tr err 32/316 te err 24/100
epoch 140 tr loss 28.67 te loss 23.78 tr err 50/316 te err 26/100
epoch 160 tr loss 15.99 te loss 29.97 tr err 24/316 te err 30/100
epoch 180 tr loss 18.61 te loss 29.87 tr err 22/316 te err 26/100
epoch 200 tr loss 25.49 te loss 36.15 tr err 31/316 te err 28/100
epoch 220 tr loss 20.56 te loss 33.28 tr err 33/316 te err 24/100
epoch 240 tr loss 6.13 te loss 49.73 tr err 7/316 te err 25/100
epoch 260 tr loss 18.26 te loss 38.68 tr err 12/316 te err 27/100
epoch 280 tr loss 4.94 te loss 54.48 tr err 4/316 te err 23/100
epoch 300 tr loss 4.12 te loss 57.66 tr err 9/316 te err 25/100
epoch 320 tr loss 20.31 te loss 47.79 tr err 28/316 te err 28/100
epoch 340 tr loss 3.74 te loss 76.23 tr err 10/316 te err 28/100
epoch 360 tr loss 20.10 te loss 45.14 tr err 25/316 te err 23/100
epoch 380 tr loss 2.62 te loss 54.53 tr err 16/316 te err 28/100
epoch 400 tr loss 2.22 te loss 51.11 tr err 13/316 te err 24/100
epoch 420 tr loss 2.21 te loss 55.38 tr err 12/316 te err 29/100
epoch 440 tr loss 5.46 te loss 51.78 tr err 11/316 te err 22/100
epoch 460 tr loss 1.88 te loss 46.23 tr err 13/316 te err 25/100
epoch 480 tr loss 8.04 te loss 43.05 tr err 19/316 te err 25/100

Now, in the unrolled version, I loop through the sequence and pass one timestep at a time before sending the final output to a fully connected layer.

class Net(nn.Module):
    def __init__(self, feature_dim, hidden_dim, batch_size):
        super(Net, self).__init__()
        
        # lstm architecture
        self.hidden_size=hidden_dim
        self.input_size=feature_dim  
        self.batch_size=batch_size
        self.num_layers=1
        
        # lstm
        self.lstm = nn.LSTM(feature_dim, hidden_size=self.hidden_size, num_layers=self.num_layers, batch_first=True)
        
        # fc layers
        self.fc1 = nn.Linear(hidden_dim, 2)    
                
    def forward(self, x, mode=False):
        
        # initialize hidden and cell
        hn = Variable(torch.randn(self.num_layers, self.batch_size, self.hidden_size))
        cn = Variable(torch.randn(self.num_layers, self.batch_size, self.hidden_size))
        
        # step through the sequence one timestep at a time
        for xt in torch.t(x):
            output, (hn,cn) = self.lstm(xt[:,None,:], (hn,cn))
         
        # output is [batch size, timestep = 1, hidden dim]
        output = self.fc1(output[:,0,:])
        return output

And the test errors


epoch 0 tr loss 54.89 te loss 17.44 tr err 154/316 te err 53/100
epoch 20 tr loss 48.50 te loss 17.40 tr err 84/316 te err 43/100
epoch 40 tr loss 36.92 te loss 15.90 tr err 72/316 te err 34/100
epoch 60 tr loss 32.13 te loss 18.82 tr err 52/316 te err 32/100
epoch 80 tr loss 29.61 te loss 27.07 tr err 41/316 te err 27/100
epoch 100 tr loss 30.03 te loss 28.65 tr err 41/316 te err 31/100
epoch 120 tr loss 22.94 te loss 39.26 tr err 32/316 te err 31/100
epoch 140 tr loss 22.82 te loss 43.07 tr err 28/316 te err 33/100
epoch 160 tr loss 19.11 te loss 47.77 tr err 34/316 te err 32/100
epoch 180 tr loss 19.52 te loss 46.45 tr err 29/316 te err 33/100
epoch 200 tr loss 22.89 te loss 45.91 tr err 21/316 te err 29/100
epoch 220 tr loss 24.83 te loss 50.92 tr err 28/316 te err 35/100
epoch 240 tr loss 12.37 te loss 54.97 tr err 36/316 te err 34/100
epoch 260 tr loss 11.72 te loss 54.28 tr err 30/316 te err 33/100
epoch 280 tr loss 9.71 te loss 55.99 tr err 20/316 te err 35/100
epoch 300 tr loss 21.23 te loss 71.60 tr err 27/316 te err 34/100
epoch 320 tr loss 8.87 te loss 53.11 tr err 32/316 te err 31/100
epoch 340 tr loss 7.34 te loss 59.80 tr err 32/316 te err 37/100
epoch 360 tr loss 4.35 te loss 73.08 tr err 7/316 te err 35/100
epoch 380 tr loss 5.93 te loss 68.64 tr err 27/316 te err 33/100
epoch 400 tr loss 3.67 te loss 78.00 tr err 18/316 te err 35/100
epoch 420 tr loss 15.13 te loss 64.23 tr err 39/316 te err 38/100
epoch 440 tr loss 2.61 te loss 88.74 tr err 8/316 te err 38/100
epoch 460 tr loss 4.82 te loss 82.88 tr err 5/316 te err 38/100
epoch 480 tr loss 2.72 te loss 93.69 tr err 8/316 te err 42/100

I have run this experiment several times and the unrolled version always performs worse. Is there something wrong with the way I am manually stepping through the LSTM?

Has anyone tried training an unrolled LSTM?
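For reference, an alternative I considered was stepping with nn.LSTMCell instead of slicing nn.LSTM one timestep at a time. A rough, untested sketch of what I mean (the class name and the zero initial states are just for illustration):

import torch
import torch.nn as nn
from torch.autograd import Variable

class NetCell(nn.Module):
    # sketch only: same single-layer model, but stepped one timestep at a time with nn.LSTMCell
    def __init__(self, feature_dim, hidden_dim):
        super(NetCell, self).__init__()
        self.hidden_dim = hidden_dim
        self.cell = nn.LSTMCell(feature_dim, hidden_dim)
        self.fc1 = nn.Linear(hidden_dim, 2)

    def forward(self, x):
        # x: [batch, seq_len, feature_dim]
        batch_size = x.size(0)
        hx = Variable(torch.zeros(batch_size, self.hidden_dim))
        cx = Variable(torch.zeros(batch_size, self.hidden_dim))
        for t in range(x.size(1)):
            hx, cx = self.cell(x[:, t, :], (hx, cx))
        return self.fc1(hx)  # hidden state after the last timestep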

Your comparison is not fair: in the first version you define h0 and c0 in the init function, so they stay constant throughout learning, but in the second version you set h0 and c0 to a new random tensor on every forward pass.
Solving this problem will probably answer your question.
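A quick way to see the effect (a toy sketch with arbitrary sizes, not your code): re-sampling the initial states changes the output even for identical inputs and weights, while states created once and reused give a reproducible output.

import torch
import torch.nn as nn
from torch.autograd import Variable

lstm = nn.LSTM(8, 16, batch_first=True)
x = Variable(torch.randn(2, 5, 8))

def forward_with_fresh_random_states():
    # new random h0/c0 on every call, like your second version
    h0 = Variable(torch.randn(1, 2, 16))
    c0 = Variable(torch.randn(1, 2, 16))
    out, _ = lstm(x, (h0, c0))
    return out[:, -1, :]

# non-zero: the output moves around even though x and the weights never change
print((forward_with_fresh_random_states() - forward_with_fresh_random_states()).norm())

# states created once and reused, like your first version
h0 = Variable(torch.randn(1, 2, 16))
c0 = Variable(torch.randn(1, 2, 16))
out1, _ = lstm(x, (h0, c0))
out2, _ = lstm(x, (h0, c0))
print((out1[:, -1, :] - out2[:, -1, :]).norm())  # zero: identical states, identical output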

Hello, I made the following modification to my code: I initialize hn and cn once when creating the network, and then pass their values in as h0 and c0 every time a forward pass is performed. I have run the code several times, but I am never able to get equally good results, whereas the rolled LSTM always gives roughly similar results across runs.

class Net(nn.Module):
    def __init__(self, feature_dim, hidden_dim, batch_size):
        super(Net, self).__init__()
        
        # lstm architecture
        self.hidden_size=hidden_dim
        self.input_size=feature_dim  
        self.batch_size=batch_size
        self.num_layers=1
        
        # initialize hidden and cell
        self.hn = Variable(torch.randn(self.num_layers, self.batch_size, self.hidden_size))
        self.cn = Variable(torch.randn(self.num_layers, self.batch_size, self.hidden_size))
        
        # lstm
        self.lstm = nn.LSTM(feature_dim, hidden_size=self.hidden_size, num_layers=self.num_layers, batch_first=True)
        
        # fc layers
        self.fc1 = nn.Linear(hidden_dim, 2)    
                
    def forward(self, x, mode=False):
              
        h0 = self.hn
        c0 = self.cn
        
        # step through the sequence one timestep at a time        
        for (i,xt) in enumerate(torch.t(x)): 
                output, (h0,c0) = self.lstm(xt[:,None,:], (h0,c0))
         
        output = self.fc1(output[:,-1,:])
        return output

epoch 0 tr loss 54.92 te loss 17.42 tr err 158/316 te err 56/100
epoch 20 tr loss 49.03 te loss 17.09 tr err 85/316 te err 36/100
epoch 40 tr loss 34.17 te loss 16.82 tr err 70/316 te err 26/100
epoch 60 tr loss 30.62 te loss 24.70 tr err 57/316 te err 31/100
epoch 80 tr loss 26.15 te loss 26.07 tr err 41/316 te err 32/100
epoch 100 tr loss 22.72 te loss 39.18 tr err 41/316 te err 33/100
epoch 120 tr loss 21.97 te loss 44.00 tr err 49/316 te err 34/100
epoch 140 tr loss 18.72 te loss 46.30 tr err 29/316 te err 32/100
epoch 160 tr loss 18.30 te loss 47.71 tr err 33/316 te err 35/100
epoch 180 tr loss 13.59 te loss 51.09 tr err 22/316 te err 36/100
epoch 200 tr loss 10.30 te loss 72.76 tr err 11/316 te err 40/100
epoch 220 tr loss 11.10 te loss 71.32 tr err 23/316 te err 37/100
epoch 240 tr loss 7.85 te loss 71.26 tr err 8/316 te err 36/100
epoch 260 tr loss 8.96 te loss 60.27 tr err 21/316 te err 32/100
epoch 280 tr loss 6.97 te loss 63.88 tr err 10/316 te err 36/100
epoch 300 tr loss 10.76 te loss 65.86 tr err 8/316 te err 36/100
epoch 320 tr loss 4.51 te loss 62.41 tr err 9/316 te err 35/100
epoch 340 tr loss 4.13 te loss 60.39 tr err 8/316 te err 33/100
epoch 360 tr loss 15.63 te loss 65.40 tr err 16/316 te err 36/100
epoch 380 tr loss 14.48 te loss 73.36 tr err 81/316 te err 36/100
epoch 400 tr loss 9.04 te loss 62.02 tr err 5/316 te err 37/100
epoch 420 tr loss 3.63 te loss 55.84 tr err 16/316 te err 29/100
epoch 440 tr loss 1.13 te loss 74.18 tr err 0/316 te err 39/100
epoch 460 tr loss 0.07 te loss 101.76 tr err 0/316 te err 45/100
epoch 480 tr loss 0.02 te loss 112.72 tr err 0/316 te err 44/100

Can you check your input to the unrolled LSTM? It could just be that your input is always the same. Another important issue is that output from your first LSTM model (the rolled version, where nn.LSTM unrolls the sequence internally) contains the output from every time step, whereas the manually unrolled model only keeps the output from the last time step. From the nn.LSTM docs (and see the sanity-check sketch after this excerpt):

  • output of shape (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_t) from the last layer of the LSTM, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.
  • h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len
  • c_n (num_layers * num_directions, batch, hidden_size): tensor containing the cell state for t = seq_len
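For a single-layer, unidirectional LSTM, the last time slice of output is the same as h_n, so you can sanity-check the rolled call against the manual loop with something like this (a rough standalone sketch with toy sizes, not your exact code):

import torch
import torch.nn as nn
from torch.autograd import Variable

lstm = nn.LSTM(input_size=28, hidden_size=30, num_layers=1, batch_first=True)
x = Variable(torch.randn(4, 50, 28))     # [batch, seq_len, features]
h0 = Variable(torch.randn(1, 4, 30))
c0 = Variable(torch.randn(1, 4, 30))

# rolled: one call over the whole sequence
out_all, (hn_all, cn_all) = lstm(x, (h0, c0))

# manually unrolled: one timestep per call, carrying the states forward
hn, cn = h0, c0
for t in range(x.size(1)):
    out_t, (hn, cn) = lstm(x[:, t:t+1, :], (hn, cn))

print((out_all[:, -1, :] - hn_all[0]).norm())        # ~0: last output slice equals h_n
print((hn_all - hn).norm())                           # ~0: hidden states agree
print((out_all[:, -1, :] - out_t[:, 0, :]).norm())    # ~0: final outputs agree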

Hi, I did some tests as you suggested, checking the variables for the manually unrolled and the rolled (single-call) LSTM.

  1. I computed the L2 norm of the difference between the hidden states, and it is 0.
  2. I computed the L2 norm of the difference between the inputs, and it is 0.
  3. I computed the L2 norm of the difference between the outputs, and it is NOT 0.

I pasted the output of checks 1 and 2 below for a batch size of 4. I can't tell what is going on, or why the outputs at each timestep would not be the same. I have no dropout, and I use the same initialization for both the rolled and unrolled LSTMs.

class Net(nn.Module):
    def __init__(self, feature_dim, hidden_dim, batch_size):
        super(Net, self).__init__()
        
        # lstm architecture
        self.hidden_size=hidden_dim
        self.input_size=feature_dim  
        self.batch_size=batch_size
        self.num_layers=1
        
        # initialize hidden and cell
        self.hn = Variable(torch.randn(self.num_layers, self.batch_size, self.hidden_size))
        self.cn = Variable(torch.randn(self.num_layers, self.batch_size, self.hidden_size))
        
        # lstm
        self.lstm = nn.LSTM(feature_dim, hidden_size=self.hidden_size, num_layers=self.num_layers, batch_first=True)
        
        # fc layers
        self.fc1 = nn.Linear(hidden_dim, 2)    
                
    def forward(self, x, mode=False):
                 
        # xt is correct
            
        h0 = self.hn
        c0 = self.cn
        
        print("Original shape ", np.shape(x))
        print("Tranpose shape ", np.shape(torch.t(x)))
        
        output_all, (hn_all,cn_all) = self.lstm(x, (h0, c0)) 
        print("Original output shape ", np.shape(output_all))
        print("Original hn shape ", np.shape(hn_all))
        print("Original cn shape ", np.shape(cn_all))
        
        # step through the sequence one timestep at a time        
        for (i,xt) in enumerate(torch.t(x)): 
                output, (h0,c0) = self.lstm(xt[:,None,:], (h0,c0))
               
                # CHECK INPUTS
                print(np.linalg.norm(xt - x[:,i,:]))
                
        print("New hn shape ", np.shape(h0))
        print("New cn shape ", np.shape(c0))
                
        # CHECK STATES
        print(np.linalg.norm(hn_all - h0))
        print(np.linalg.norm(cn_all - c0))
                         
        output = self.fc1(output[:,-1,:])
        return output
Original shape  torch.Size([4, 50, 28])
Transpose shape  torch.Size([50, 4, 28])
Original output shape  torch.Size([4, 50, 30])
Original hn shape  torch.Size([1, 4, 30])
Original cn shape  torch.Size([1, 4, 30])
(the input check printed 0 for every one of the 50 timesteps; the repeated "Variable containing: 0" output is trimmed here)
New hn shape  torch.Size([1, 4, 30])
New cn shape  torch.Size([1, 4, 30])
(both state checks, hn_all - h0 and cn_all - c0, also printed 0)

Can someone PLEASE send me a SIMPLE example of TRAINING a rolled and an unrolled LSTM!?

  • Use batch_first
  • Use a for loop for the unrolled version (a minimal training sketch is below)
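To expand on that: here is a minimal, self-contained training sketch (toy random data, my own names and sizes, assuming a reasonably recent PyTorch) where a single flag switches between the rolled forward pass and the manual for-loop unroll. With the same seed, both variants should end up with essentially the same loss.

import torch
import torch.nn as nn
import torch.optim as optim

class TinyNet(nn.Module):
    # batch_first LSTM + linear head; `unroll` switches between the two forward styles
    def __init__(self, feature_dim=8, hidden_dim=16, num_classes=2, unroll=False):
        super(TinyNet, self).__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)
        self.unroll = unroll

    def forward(self, x):
        if not self.unroll:
            out, _ = self.lstm(x)                      # rolled: whole sequence in one call
            return self.fc(out[:, -1, :])
        hc = None                                      # zero initial states on the first step
        for t in range(x.size(1)):                     # unrolled: one timestep per call
            out, hc = self.lstm(x[:, t:t+1, :], hc)
        return self.fc(out[:, 0, :])

# toy data: 32 sequences of length 10 with 8 features, binary labels
x = torch.randn(32, 10, 8)
y = torch.randint(0, 2, (32,))

for unroll in (False, True):
    torch.manual_seed(0)                               # same weight init for both variants
    net = TinyNet(unroll=unroll)
    opt = optim.SGD(net.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()
    for step in range(20):
        opt.zero_grad()
        loss = criterion(net(x), y)
        loss.backward()
        opt.step()
    print('unroll=%s  final loss %.4f' % (unroll, loss.item()))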

Need help, thanks.

Hi Kong - Did you find a solution for manually unrolling LSTM?

Hi Knog and others who are looking for this rolling-style training: this training script has an option to train the model using the rolling method you mentioned earlier, and the results are good too.
script - link