LSTM to linear to LSTM to linear again

I’m new to PyTorch and have been experimenting with a number of things to get my bearings with LSTMs. I want to build a pipeline that goes LSTM → linear → LSTM → linear, but I’m getting stuck on the transition from the linear layer back to the second LSTM: the shapes of my hidden states don’t seem to line up. Here’s what I have to initialize the layers:

self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers, batch_first=True, dropout=dropout)
self.fully_connected = nn.Linear(hidden_size, output_size)
nn.init.xavier_uniform_(self.fully_connected.weight)

self.lstm2 = nn.LSTM(input_size=1, hidden_size=hidden_size, num_layers=num_layers, batch_first=True, dropout=dropout)
self.fully_connected2 = nn.Linear(hidden_size, output_size)

Then, in my forward method, I have this:


hidden_initial = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
cell_initial = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
out, states = self.lstm(x, (hidden_initial, cell_initial))

h, c = states
out = out[:, -1, :]
out = self.fully_connected(out)

# Issue happens here
out, _ = self.lstm2(out, (h, c))

The error tells me it won’t accept (3-D, 3-D) tensors for the hidden state here. I’m not sure how to reshape things into the correct format; I’ve tried various slicing, viewing, and reshaping, but haven’t managed to get it working yet.

I’d love if anyone could offer advice.

@cahr

  • I’d be curious to know how you plan to leverage this for your use case.

  • You can study this link for a detailed description of the input and output structure:

https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html
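
For reference, with batch_first=True the input and output tensors are (batch, seq, feature), while the hidden and cell states stay (num_layers, batch, hidden_size). A minimal sketch with made-up sizes, just to show the convention:

import torch
from torch import nn

batch, seq_len, input_size, hidden_size, num_layers = 4, 7, 10, 8, 2

lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
               num_layers=num_layers, batch_first=True)

x = torch.randn(batch, seq_len, input_size)       # (batch, seq, feature)
h0 = torch.zeros(num_layers, batch, hidden_size)  # states are NOT batch_first
c0 = torch.zeros(num_layers, batch, hidden_size)

out, (hn, cn) = lstm(x, (h0, c0))
print(out.shape)  # torch.Size([4, 7, 8])  -> (batch, seq, hidden_size)
print(hn.shape)   # torch.Size([2, 4, 8])  -> (num_layers, batch, hidden_size)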

Here is working code. Hope this helps:

import numpy as np

import torch
from torch import nn

class model1(nn.Module):
    def __init__(self):
        super(model1, self).__init__()
        self.input_size=10
        self.hidden_size=8
        self.num_layers=2
        self.output_size=4
        self.batch_size=10
        self.sequence_length=20
        self.lstm = nn.LSTM(input_size=self.input_size, hidden_size=self.hidden_size, num_layers=self.num_layers, batch_first=True)
        self.fully_connected = nn.Linear(self.hidden_size, self.output_size)
        nn.init.xavier_uniform_(self.fully_connected.weight)
        self.input_size2=1
        self.hidden_size2=self.hidden_size
        self.num_layers2=2
        self.output_size2=4
        self.batch_size2=10
        self.sequence_length2=4
        self.lstm2 = nn.LSTM(input_size=self.input_size2, hidden_size=self.hidden_size2, num_layers=self.num_layers2, batch_first=True)
        self.fully_connected2 = nn.Linear(self.hidden_size, self.output_size)
        
        
    def forward(self, x):
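        # initial hidden/cell states are (num_layers, batch_size, hidden_size), even with batch_first=True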
        hidden_initial = torch.zeros(self.num_layers*1, self.batch_size, self.hidden_size)
        cell_initial = torch.zeros(self.num_layers*1, self.batch_size, self.hidden_size)
        out, states = self.lstm(x, (hidden_initial,cell_initial))
        h, c = states
        out = out[:, -1, :]
        out = self.fully_connected(out)
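        # reshape the 2-D Linear output (batch, output_size) into a 3-D batch of sequences
        # (batch, output_size, 1) so lstm2 can take it together with the 3-D (h, c)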
        out = out.reshape(out.shape[0], out.shape[1], 1)
        out, _ = self.lstm2(out, (h, c))
        return out


    
learning_rate = 0.001
num_epochs = 100

model = model1()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)


X = np.random.rand(model.batch_size, model.sequence_length, model.input_size).astype(np.float32)
# target shape matches the model output: (batch_size, sequence_length2, hidden_size2)
Y = np.random.rand(model.batch_size, model.sequence_length2, model.hidden_size2).astype(np.float32)
print(Y.shape)
inputVal = torch.from_numpy(X)
outputVal = torch.from_numpy(Y)

for epoch in range(num_epochs):
    # clear gradients accumulated from the previous step
    optimizer.zero_grad()
    # forward pass through the model
    dataOutput = model(inputVal)
    # compute the loss between prediction and target
    loss = criterion(dataOutput, outputVal)
    # backpropagate and update the weights
    loss.backward()
    optimizer.step()
    if epoch % 10 == 0:
        print('epoch [{}/{}], loss:{:.4f}'.format(epoch + 1, num_epochs, loss.item()))
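
The key change compared to your version is reshaping the output of the first Linear layer back into a 3-D tensor before passing it to lstm2, so it is treated as a batched sequence (length output_size, one feature per step) rather than an unbatched 2-D input. A rough shape trace through the forward pass above:

# x:               (10, 20, 10)  (batch_size, sequence_length, input_size)
# lstm:            out -> (10, 20, 8), h and c -> (2, 10, 8)
# out[:, -1, :]:   out -> (10, 8)     last time step only
# fully_connected: out -> (10, 4)     2-D, which lstm2 rejects when paired with 3-D (h, c)
# reshape:         out -> (10, 4, 1)  3-D again, so lstm2 accepts it
# lstm2:           out -> (10, 4, 8)  which matches the shape of Y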

Thanks so much for the help, that worked. I also appreciate the link.

As for how I plan to leverage this: I’m basing my approach on a few papers related to my project’s topic, and they often use an encoder/decoder framework. My plan is to learn how to do that incrementally so that I can implement the framework similarly to what other research teams have managed.
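
Something like this rough sketch is what I have in mind as a starting point (just my own placeholder names, not tested training code):

class Seq2Seq(nn.Module):
    # Encoder LSTM compresses the source sequence into (h, c);
    # the decoder LSTM unrolls from that state and a Linear maps to per-step outputs.
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super().__init__()
        self.encoder = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.decoder = nn.LSTM(output_size, hidden_size, num_layers, batch_first=True)
        self.project = nn.Linear(hidden_size, output_size)

    def forward(self, src, trg):
        _, (h, c) = self.encoder(src)           # summarize the source sequence
        dec_out, _ = self.decoder(trg, (h, c))  # unroll conditioned on that summary
        return self.project(dec_out)            # map hidden states to outputs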

@cahr Great. Please continue to share your learning with the forum as well.

Also, I’d request you to mark the response as the solution.