LSTM hidden_size and num_layers setting question

Hi, I’m a newbie with LSTMs and I want to ask a basic question.

I’m training the network below, where each x_i is 64-bit data.

[screenshot: network architecture from the paper]

The paper says:
“These bits (the 64-bit data) are transformed by two non-recurrent hidden layers, each with 128 units and tanh activation functions. The output of the linear layers is fed into two recurrent LSTM layers, each with 512 units and tanh activation functions.”

and this is a snippet of my code:

        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)

        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Initial hidden and cell states, shape (num_layers, batch, hidden_size)
        # here: ([1, 50656, 64])
        h_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
        c_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)

        # Propagate input through LSTM
        ula, (h_out, _) = self.lstm(x, (h_0, c_0))

My question is: how should I set hidden_size and num_layers for this network?

How should I set up the fully connected layer in this case?

I can find single-variable time-series anomaly detection LSTM code on Google, but it is hard to find a multivariate example like my case.

This is not my homework; any comment or existing tutorial would be appreciated.

Hey @DonghunP
With the screenshot shared, I think the model should look something like this:

import torch.nn as nn

class CustomModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Two non-recurrent hidden layers, 128 units each, tanh activations
        self.projection = nn.Sequential(
            nn.Linear(64, 128),
            nn.Tanh(),
            nn.Linear(128, 128),
            nn.Tanh(),
        )
        # Two stacked recurrent layers, 512 units each
        self.LSTM = nn.LSTM(input_size=128,
                            hidden_size=512,
                            num_layers=2)

    def forward(self, x):
        pro = self.projection(x)
        # nn.LSTM returns (output, (h_n, c_n))
        output, (h_n, c_n) = self.LSTM(pro)
        return output, (h_n, c_n)

With an input of shape (seq_len, batch_size, 64), the model first transforms the input vectors with the projection layer and then feeds the result to the LSTM layer. Here the hidden_size of the LSTM layer is 512, since there are 512 units in each LSTM cell, and num_layers is 2; num_layers is the number of LSTM layers stacked on top of each other.
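If it helps, here is a quick shape check with made-up sizes (seq_len=10 and batch_size=4 are just placeholder values):

import torch

model = CustomModel()

# Default nn.LSTM layout is (seq_len, batch_size, input_size) since batch_first=False
x = torch.randn(10, 4, 64)

output, (h_n, c_n) = model(x)

print(output.shape)  # torch.Size([10, 4, 512]) -- one 512-dim vector per time step
print(h_n.shape)     # torch.Size([2, 4, 512])  -- final hidden state of each of the 2 layers

# For the fully connected layer you asked about, one common option (not the only one)
# is to apply something like nn.Linear(512, num_classes) to h_n[-1],
# the last layer's final hidden state.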

Hope this makes sense to you! :slight_smile:


@ariG23498 Thanks very much for the kind reply :slight_smile:

Anyway,

Do you know why they expand the input from 64 dimensions to 128? (Why expand it?)

  • In my case, the 64-dimensional input (actually 64-bit data) is partially correlated.
  • Is it okay to expand this data?

Do you know why they set num_layers to 2? Is there any criterion?

Is there any material I can refer to?

Thanks bro.

  1. Expansion of dimensions is mostly done to increase the number of parameters (read: weights and biases) of a model. This might help capture the data distribution better, but it might also lead to over-fitting.

  2. They set num_layers=2 to use two LSTM layers stacked one on top of the other; the output sequence of the first layer becomes the input sequence of the second. This is more expensive, but stacked LSTMs can capture the structure of the sequence better (see the sketch below).
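Here is a minimal sketch of both points in code (the sizes follow the model above; `stacked` and the two single-layer LSTMs have different random weights, so only the shapes match):

import torch
import torch.nn as nn

# Point 1: projecting 64 -> 128 adds trainable parameters.
proj = nn.Linear(64, 128)
print(sum(p.numel() for p in proj.parameters()))  # 64*128 + 128 = 8320

# Point 2: num_layers=2 stacks two LSTM layers: the output sequence of the
# first layer is fed as the input sequence of the second.
stacked = nn.LSTM(input_size=128, hidden_size=512, num_layers=2)

# Roughly the same thing written as two separate single-layer LSTMs:
lstm1 = nn.LSTM(input_size=128, hidden_size=512, num_layers=1)
lstm2 = nn.LSTM(input_size=512, hidden_size=512, num_layers=1)

x = torch.randn(10, 4, 128)            # (seq_len, batch, 128), placeholder sizes
out1, _ = lstm1(x)                     # (10, 4, 512)
out2, _ = lstm2(out1)                  # (10, 4, 512) -- same shape as stacked(x)[0]
print(out2.shape, stacked(x)[0].shape)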

I would also recommend taking a look at an article I wrote on LSTMs to get a good understanding of them.
The article can be found here
