LSTMCell and Teacher Forcing

I’m fairly new to PyTorch and I’m trying to build an 18-node LSTM out of LSTMCell modules, trained with teacher forcing. I’m running into quite a few difficulties.

Here’s my model:

import torch
import torch.nn as nn
import torch.optim as optim

class tryLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, batch_size):
        super(tryLSTM, self).__init__()

        self.input_size = input_size
        self.hidden_size = hidden_size
        self.batch_size = batch_size

        # one LSTMCell per time step (lstm3 ... lstm16 omitted here)
        self.lstm0 = nn.LSTMCell(input_size, hidden_size, bias=True)
        self.lstm1 = nn.LSTMCell(input_size, hidden_size, bias=True)
        self.lstm2 = nn.LSTMCell(input_size, hidden_size, bias=True)
        self.lstm17 = nn.LSTMCell(input_size, hidden_size, bias=True)

    def init_hidden(self):
        # initialize the hidden state and the cell state to zeros
        hidden = torch.zeros(self.batch_size, self.hidden_size)
        cell = torch.zeros(self.batch_size, self.hidden_size)
        return hidden, cell

    def forward(self, x, hc):
        out = []
        h_0, c_0 = hc

        h_1, c_1 = self.lstm1(x[0], h_0, c_0)
        out.append(h_1)

        h_2, c_2 = self.lstm2(x[1], h_1, c_1)
        out.append(h_2)
        # steps 3 to 16 omitted here
        h_17, c_17 = self.lstm17(x[16], h_16, c_16)
        out.append(h_17)
        # stack the 17 hidden states into one tensor: (17, batch, hidden_size)
        return torch.stack(out)

model = tryLSTM(input_size=128, hidden_size=128, batch_size=18)

if gpu: model.cuda()

optimizer = optim.Adam(model.parameters(), lr=0.0001)

criterion = nn.BCELoss(weight=None, reduction='mean')
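Since the cells above are written out by hand (`lstm0` … `lstm17`), here is a minimal sketch of holding one cell per step in an `nn.ModuleList` and chaining them in a loop. It assumes each image has already been turned into a feature vector of size `input_size`; the class name `CellChain` and all sizes are hypothetical, not the code from my model.

```python
import torch
import torch.nn as nn


class CellChain(nn.Module):
    """Hypothetical sketch: one separate LSTMCell per time step, kept in an nn.ModuleList."""

    def __init__(self, input_size, hidden_size, num_steps):
        super().__init__()
        self.hidden_size = hidden_size
        # nn.ModuleList registers every cell, so model.parameters() sees all of them
        self.cells = nn.ModuleList(
            [nn.LSTMCell(input_size, hidden_size) for _ in range(num_steps)]
        )

    def forward(self, x, hc):
        # x: (num_steps, batch, input_size); hc: tuple of two (batch, hidden_size) tensors
        h, c = hc
        out = []
        for t, cell in enumerate(self.cells):
            # LSTMCell takes the state as a single tuple: cell(input, (h, c))
            h, c = cell(x[t], (h, c))
            out.append(h)
        return torch.stack(out)  # (num_steps, batch, hidden_size)


model = CellChain(input_size=128, hidden_size=128, num_steps=17)
x = torch.randn(17, 4, 128)  # 17 steps, batch of 4, 128 features each
hc = (torch.zeros(4, 128), torch.zeros(4, 128))
out = model(x, hc)
print(out.shape)  # torch.Size([17, 4, 128])
```

Each cell here still has its own weights, matching the one-cell-per-node design; the loop only removes the copy-pasted code.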

Here’s the training loop:

def train(epoch):
    # initialize hidden and cell state
    hc = model.init_hidden()
    if gpu:
        hc = tuple(s.cuda() for s in hc)
    for batch_idx, (data, target) in enumerate(train_loader):
        # Zero out the gradients
        optimizer.zero_grad()
        # Teacher forcing: the target for image t is image t + 1
        target = data[1:]
        # Put data on GPU
        if gpu:
            data = data.cuda()
            target = target.cuda()
        # Get outputs of LSTM
        output = model(data, hc)
        # Calculate loss
        loss = criterion(output, target)
        # Calculate gradients
        loss.backward()
        # Update model parameters
        optimizer.step()
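The `criterion` above is `nn.BCELoss`, which expects its input to already be probabilities in [0, 1]. An `LSTMCell` hidden state is `o * tanh(c)` and lies in (-1, 1), so `h` cannot be fed to it directly. A minimal sketch, assuming the predictions come out of some extra layer as unbounded scores and the targets are pixel values scaled to [0, 1] (sizes are illustrative), of two standard options: squash with a sigmoid before `BCELoss`, or pass the raw scores to `nn.BCEWithLogitsLoss`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 10)  # raw, unbounded predictions (e.g. from a linear layer)
target = torch.rand(4, 10)   # targets scaled to [0, 1]

# Option 1: squash to [0, 1] first, then use BCELoss
loss_a = nn.BCELoss(reduction='mean')(torch.sigmoid(logits), target)

# Option 2: pass raw scores; BCEWithLogitsLoss applies the sigmoid internally
# and is numerically more stable
loss_b = nn.BCEWithLogitsLoss(reduction='mean')(logits, target)

print(loss_a.item(), loss_b.item())  # the two values agree up to floating-point error
```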

Q.1: I’m getting the following error:
TypeError: forward() takes from 2 to 3 positional arguments but 4 were given
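For reference, `nn.LSTMCell.forward` accepts the input plus an optional `(h, c)` tuple; counting `self`, that is the "2 to 3 positional arguments" in the message, and passing `h_0` and `c_0` as two separate arguments makes four. A minimal sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=128, hidden_size=128)
x_t = torch.randn(18, 128)  # (batch, input_size)
h_0 = torch.zeros(18, 128)  # (batch, hidden_size)
c_0 = torch.zeros(18, 128)

# cell(x_t, h_0, c_0) raises the TypeError above;
# the state has to be passed as one tuple:
h_1, c_1 = cell(x_t, (h_0, c_0))
print(h_1.shape, c_1.shape)  # torch.Size([18, 128]) torch.Size([18, 128])
```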

I’m not sure if this is the correct way to build what I want.
My mini-batch X has shape (18, 3, 128, 128), i.e. 18 images. What I want to achieve is as follows:
The 1st cell’s input is x[0], and its output h_1 should be similar to x[1].
The 2nd cell’s inputs are x[1] and h_1, and its output h_2 should be similar to x[2],
and so on.
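Note that `nn.LSTMCell` works on 2-D tensors of shape `(batch, input_size)`, so an image `x[0]` of shape `(3, 128, 128)` cannot go into a cell built with `input_size=128` as is. Also, if the 18 images are the time steps of one sequence, the batch dimension is 1, not 18. A minimal sketch of building the input/target pairs described above, assuming the simplest option of flattening each image into one vector (a CNN encoder producing a smaller feature vector would be the usual alternative):

```python
import torch

X = torch.randn(18, 3, 128, 128)  # 18 images = 18 time steps of ONE sequence

seq_len = X.shape[0]
# Flatten each image into a vector and add a batch dimension of 1:
# (seq_len, C, H, W) -> (seq_len, batch=1, C*H*W)
seq = X.reshape(seq_len, 1, -1)
print(seq.shape)  # torch.Size([18, 1, 49152])

inputs = seq[:-1]   # x[0] .. x[16] are fed in
targets = seq[1:]   # x[1] .. x[17] are what each step should predict
print(inputs.shape, targets.shape)  # torch.Size([17, 1, 49152]) torch.Size([17, 1, 49152])
```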
I believe the forward pass is run once for each image in the mini-batch. So for a mini-batch containing 18 images, will the forward defined above run 18 times? That is not what I want at all. I want it to run once per mini-batch, but I need to pass in all 18 images because I’m using teacher forcing.
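PyTorch does not call `forward` once per sample; it runs exactly once each time the module is called, on whatever tensor is passed in, and any loop over time steps lives inside it. A minimal sketch of teacher forcing that checks this with forward hooks (`register_forward_hook` is a standard `nn.Module` method). It swaps in a single `LSTMCell` shared across steps, the usual RNN setup, instead of one cell per node, and adds an assumed `nn.Linear` readout; the class name and sizes are hypothetical.

```python
import torch
import torch.nn as nn


class TeacherForcedLSTM(nn.Module):
    """Hypothetical sketch: one LSTMCell shared across all time steps."""

    def __init__(self, feat_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.cell = nn.LSTMCell(feat_size, hidden_size)
        # assumed readout layer mapping the hidden state back to feature space
        self.readout = nn.Linear(hidden_size, feat_size)

    def forward(self, x):
        # x: (seq_len, batch, feat_size); returns predictions for x[1:]
        batch = x.shape[1]
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)
        preds = []
        for t in range(x.shape[0] - 1):
            # teacher forcing: feed the ground-truth frame x[t],
            # not the model's own previous prediction
            h, c = self.cell(x[t], (h, c))
            preds.append(self.readout(h))  # raw scores (logits) for x[t + 1]
        return torch.stack(preds)  # (seq_len - 1, batch, feat_size)


# count how often each forward actually runs
calls = {"model": 0, "cell": 0}


def make_counter(name):
    def hook(module, inputs, output):
        calls[name] += 1
    return hook


torch.manual_seed(0)
x = torch.rand(18, 1, 64)  # 18 frames, batch of 1, 64 features each (illustrative)
model = TeacherForcedLSTM(feat_size=64, hidden_size=32)
model.register_forward_hook(make_counter("model"))
model.cell.register_forward_hook(make_counter("cell"))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()  # takes logits; targets x[1:] are in [0, 1]

pred = model(x)                # a single call processes the whole sequence
loss = criterion(pred, x[1:])  # step t is compared with frame t + 1
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(pred.shape)  # torch.Size([17, 1, 64])
print(calls)       # {'model': 1, 'cell': 17}
```

The model's `forward` runs once for the whole sequence, while the cell's `forward` runs once per step inside it.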
What am I doing wrong? Is there a better way to build this architecture?
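One possibly simpler alternative, sketched under assumptions: with teacher forcing every input is known up front, so `nn.LSTM` can consume the whole sequence in one call instead of a hand-written loop of cells. The `nn.Linear` readout and all sizes below are illustrative, not part of my original design.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq = torch.rand(18, 1, 64)  # (seq_len, batch, features), illustrative sizes
inputs, targets = seq[:-1], seq[1:]

# nn.LSTM's default layout is (seq_len, batch, features);
# the initial (h, c) defaults to zeros when omitted
lstm = nn.LSTM(input_size=64, hidden_size=32)
readout = nn.Linear(32, 64)  # assumed projection back to feature space

out, (h_n, c_n) = lstm(inputs)  # out: (17, 1, 32), hidden state at every step
pred = readout(out)             # (17, 1, 64), logits for the next frame
loss = nn.BCEWithLogitsLoss()(pred, targets)
loss.backward()
print(out.shape, pred.shape)  # torch.Size([17, 1, 32]) torch.Size([17, 1, 64])
```

`nn.LSTM` shares one set of weights across all steps, unlike 18 separate cells; whether per-step weights are really needed is a modeling choice.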

Please help. Thanks!

@ptrblck @smth @albanD @SimonW @apaszke @tom