Dropout for LSTM state transitions

Hi, I was experimenting with LSTMs and noticed that dropout is applied at the output of the LSTM, as in the figure on the left below. I was wondering if it is possible to apply dropout at the state transitions instead, as on the right.

Hello, does no one working at PyTorch have an answer for this?

You can do that by manually unrolling the LSTM. The output of the LSTM will be output, (hn, cn); you can apply dropout to (hn, cn) via a dropout layer before passing them into the next timestep.
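
For example, a minimal sketch of the idea using nn.LSTMCell (all sizes and the dropout probability below are placeholders; the same pattern works with nn.LSTM fed one timestep at a time):

import torch
import torch.nn as nn

lstm_cell = nn.LSTMCell(input_size=10, hidden_size=20)
state_dropout = nn.Dropout(p=0.5)

x = torch.randn(7, 3, 10)          # (seq_len, batch, features)
hn = torch.zeros(3, 20)            # initial hidden state
cn = torch.zeros(3, 20)            # initial cell state

outputs = []
for xt in x:                       # unroll the sequence one timestep at a time
    hn, cn = lstm_cell(xt, (hn, cn))
    hn = state_dropout(hn)         # dropout on the state transition
    outputs.append(hn)
output = torch.stack(outputs)      # (seq_len, batch, hidden)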

OK, I used this:

for i in range(np.shape(x)[1]):
    output, self.hidden = self.lstm(x[:,i,None,:], self.hidden)

But now I get this error:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

Last time I used this:

        self.lstm = nn.LSTM(feature_dim, hidden_size=hidden_dim, num_layers=num_layers, batch_first=True, dropout=0.7)
        self.h0 = Variable(torch.randn(num_layers, batch_size, hidden_dim))
        self.c0 = Variable(torch.randn(num_layers, batch_size, hidden_dim))

        # fc layers
        self.fc1 = nn.Linear(hidden_dim, 2)

    def forward(self, x, mode=False):

        output, hn = self.lstm(x, (self.h0, self.c0))
        output = self.fc1(output[:,-1,:])

and everything worked. Why?

Where can I find an example that trains an unrolled LSTM?

Take a look at this please:


I am not sure why you are having that problem, but please check the type of your input to lstm.
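
For example (x here stands for whatever tensor you feed into the model):

print(x.type(), x.size())   # nn.LSTM expects a FloatTensor; with batch_first=True the shape should be (batch, seq_len, features)
x = x.float()               # cast if it happens to be a DoubleTensor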

Thank you, I got it to work! But it seems to run a lot slower than before. I guess it's because I am extracting every row (timestep) of the array.

It's actually not working. I'm going to post another thread for it.

Can you explain a bit what you mean by not working?

Thank you for your continued help sir :slight_smile: I have two versions of my code; I only paste the parts that I changed.
I have verified that this version is working:


class Net(nn.Module):
    def __init__(self, feature_dim, hidden_dim, batch_size):
        super(Net, self).__init__()
        
        num_layers=1
        
        # single layer lstm
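        # note: nn.LSTM's dropout argument only acts between stacked layers, so with num_layers=1 it has no effect here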
        self.lstm = nn.LSTM(feature_dim, hidden_size=hidden_dim, num_layers=num_layers, batch_first=True, dropout = 0.7)
        self.hn = Variable(torch.randn(num_layers, batch_size, hidden_dim)) 
        self.cn = Variable(torch.randn(num_layers, batch_size, hidden_dim))
        
        # fc layers
        self.fc1 = nn.Linear(hidden_dim, 2)    
                
    def forward(self, x, mode=False):
        
        for xt in torch.t(x):
            output, (hn, cn) = self.lstm(xt[:,None,:], (self.hn,self.cn))
         
        output = self.fc1(output[:,0,:])
                                
        return output

I realized that I had made a mistake and that I should be passing the returned states back into self.hn and self.cn. I made the changes as follows:

class Net(nn.Module):
    def __init__(self, feature_dim, hidden_dim, batch_size):
        super(Net, self).__init__()
        
        num_layers=1
        
        # single layer lstm
        self.lstm = nn.LSTM(feature_dim, hidden_size=hidden_dim, num_layers=num_layers, batch_first=True)
        self.hn = Variable(torch.randn(num_layers, batch_size, hidden_dim)) 
        self.cn = Variable(torch.randn(num_layers, batch_size, hidden_dim))
        
        # fc layers
        self.fc1 = nn.Linear(hidden_dim, 2)    
                
    def forward(self, x, mode=False):
        
        # step through the sequence one timestep at a time
        for xt in torch.t(x):
            output, (self.hn, self.cn) = self.lstm(xt[:,None,:], (self.hn,self.cn))
         
        output = self.fc1(output[:,0,:])
        return output

And it now gives me:


---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-5-4ad996d02c13> in <module>()
      9 # train model
     10 #tr_loss, te_loss = bci.train_model(model, tr_input, tr_target, 4, te_input, te_target, 4, 500)
---> 11 tr_err, te_err, tr_loss, te_loss = bci.train_model2(model, tr_input, tr_target, tr_target_onehot, 4, te_input, te_target, te_target_onehot, 4, 500)
     12 
     13 # compute train and test errors

~\Desktop\Kong\project1\dlc_bci.py in train_model(model, train_input, train_target, tr_target_onehot, train_mini_batch_size, test_input, test_target, te_target_onehot, test_mini_batch_size, epoch)
    165             # update the weights by subtracting the negative of the gradient
    166             model.zero_grad()
--> 167             loss.backward()
    168             optimizer.step()
    169 

~\Anaconda3\envs\dl\lib\site-packages\torch\autograd\variable.py in backward(self, gradient, retain_graph, create_graph, retain_variables)
    165                 Variable.
    166         """
--> 167         torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
    168 
    169     def register_hook(self, hook):

~\Anaconda3\envs\dl\lib\site-packages\torch\autograd\__init__.py in backward(variables, grad_variables, retain_graph, create_graph, retain_variables)
     97 
     98     Variable._execution_engine.run_backward(
---> 99         variables, grad_variables, retain_graph)
    100 
    101 

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

Maybe this discussion might help:

One thing I noticed is that h and c are initialized only once in your code, when the class is instantiated. However, every time you start feeding a new sequence, you have to re-initialize h and c. Not sure if this is causing the issue, though. You can add an init_hidden method to your class and use that to initialize the hidden states every time you feed a new sequence.
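
Something like this, for example (just a sketch, assuming the class stores num_layers, batch_size and hidden_size as attributes):

    def init_hidden(self):
        # fresh states for every new sequence, so no graph from a previous batch is kept around
        self.hn = Variable(torch.randn(self.num_layers, self.batch_size, self.hidden_size))
        self.cn = Variable(torch.randn(self.num_layers, self.batch_size, self.hidden_size))

    def forward(self, x, mode=False):
        self.init_hidden()   # re-initialize at the start of every forward pass
        for xt in torch.t(x):
            output, (self.hn, self.cn) = self.lstm(xt[:,None,:], (self.hn, self.cn))
        return self.fc1(output[:,0,:])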

Sorry, but what did you mean by re-initializing h and c? Won't I lose the updates done to h and c during backprop?

h and c are not learned parameters. Check this example please:
http://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html#example-an-lstm-for-part-of-speech-tagging
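
You can also see this directly by listing the LSTM's learned parameters; they are only the weight and bias matrices (assuming model is an instance of your Net):

for name, param in model.lstm.named_parameters():
    print(name, param.size())   # weight_ih_l0, weight_hh_l0, bias_ih_l0, bias_hh_l0 -- h and c are not in this list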

Argh, I totally forgot about that! I have modified my code accordingly and it now works. Thank you very much for your continued assistance :slight_smile:


class Net(nn.Module):
    def __init__(self, feature_dim, hidden_dim, batch_size):
        super(Net, self).__init__()
        
        # lstm architecture
        self.hidden_size=hidden_dim
        self.input_size=feature_dim  
        self.batch_size=batch_size
        self.num_layers=1
        
        # lstm
        self.lstm = nn.LSTM(feature_dim, hidden_size=self.hidden_size, num_layers=self.num_layers, batch_first=True)
        
        # fc layers
        self.fc1 = nn.Linear(hidden_dim, 2)    
                
    def forward(self, x, mode=False):
        
        # initialize hidden and cell
        hn = Variable(torch.randn(self.num_layers, self.batch_size, self.hidden_size))
        cn = Variable(torch.randn(self.num_layers, self.batch_size, self.hidden_size))
        
        # step through the sequence one timestep at a time
        for xt in torch.t(x):
            output, (hn,cn) = self.lstm(xt[:,None,:], (hn,cn))
         
        output = self.fc1(output[:,0,:])
        return output
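
Coming back to the original question: with the sequence unrolled like this, dropout on the state transitions can be added with an ordinary nn.Dropout applied to hn (and/or cn) between timesteps. A sketch only (the 0.5 probability and the state_dropout name are placeholders, not part of the code above):

        # in __init__:
        self.state_dropout = nn.Dropout(p=0.5)

        # in forward, inside the unrolled loop:
        for xt in torch.t(x):
            output, (hn, cn) = self.lstm(xt[:,None,:], (hn, cn))
            hn = self.state_dropout(hn)   # dropout on the hidden-state transition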