Hey Peter!
Was just reviewing this thread and had two questions.
As of right now, my model looks like this (the class name is just a placeholder; the forward function is inside it):
import torch
import torch.nn as nn

class PctChangeLSTM(nn.Module):  # placeholder class name
    def __init__(self, input_length=7, lstm_size=64, lstm_layers=1, output_size=1,
                 drop_prob=0.2):
        super().__init__()
        self.input_length = input_length
        self.output_size = output_size
        self.lstm_size = lstm_size
        self.lstm_layers = lstm_layers
        self.drop_prob = drop_prob

        # note: dropout inside nn.LSTM only applies between stacked layers,
        # so with lstm_layers=1 it has no effect (PyTorch emits a warning)
        self.lstm = nn.LSTM(input_length, lstm_size, lstm_layers,
                            dropout=drop_prob, batch_first=False)
        self.dropout = nn.Dropout(drop_prob)
        self.fc = nn.Linear(lstm_size, output_size)

    def forward(self, nn_input, hidden_state):
        lstm_out, hidden_state = self.lstm(nn_input, hidden_state)
        lstm_out = lstm_out[-1, :, :]  # final LSTM output (last time step) for each sequence in the batch
        lstm_out = self.dropout(self.fc(lstm_out))
        return lstm_out, hidden_state

    def init_hidden(self, batch_size):
        # zero-initialized (h_0, c_0) on the same device/dtype as the model weights
        weight = next(self.parameters()).data
        hidden = (weight.new(self.lstm_layers, batch_size, self.lstm_size).zero_(),
                  weight.new(self.lstm_layers, batch_size, self.lstm_size).zero_())
        return hidden
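For context, here's a minimal sketch of how I'm calling it; the class name matches the placeholder above, the random input is just a stand-in, and the input shape is (seq_len, batch, input_length) because batch_first=False:

import torch

model = PctChangeLSTM()                    # placeholder class name from the snippet above
batch_size, seq_len = 4, 20
hidden = model.init_hidden(batch_size)
x = torch.randn(seq_len, batch_size, 7)    # (seq_len, batch, input_length), since batch_first=False
out, hidden = model(x, hidden)
print(out.shape)                           # torch.Size([4, 1]) -> one predicted pct_change per sequence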
1: Where would I add the nonlinear activation function in here? From what I’ve read, the right spot really depends on which activation function you use, which leads to the 2nd question. (I’ve put a sketch of the placement I had in mind right after these two questions.)
2: I’m working with pct_change time series data and trying to predict a future pct_change. Would I be correct in assuming that I’m better off using tanh rather than ReLU? (There’s a small example of what worries me about ReLU below as well.)
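To make question 1 concrete, here's the placement I was considering: applying the activation to the last LSTM output, before the fully connected layer. This is just a guess at where it should go (and torch.tanh here is only an example), which is exactly what I'm asking about:

    def forward(self, nn_input, hidden_state):
        lstm_out, hidden_state = self.lstm(nn_input, hidden_state)
        lstm_out = lstm_out[-1, :, :]    # last time step for each sequence in the batch
        lstm_out = torch.tanh(lstm_out)  # <-- candidate spot for the extra nonlinearity, between the LSTM and the fc layer
        lstm_out = self.dropout(self.fc(lstm_out))
        return lstm_out, hidden_state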
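And to make question 2 concrete, this is what worries me about ReLU on pct_change values, since roughly half of them are negative (the numbers below are made up):

import torch

pct = torch.tensor([-0.031, 0.012, -0.004, 0.027])  # made-up pct_change values
print(torch.relu(pct))   # tensor([0.0000, 0.0120, 0.0000, 0.0270])   -> negative moves clipped to zero
print(torch.tanh(pct))   # tensor([-0.0310, 0.0120, -0.0040, 0.0270]) -> sign preserved (≈ identity for small values)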
Thanks!