LSTM accuracy doesn't increase

Hello,

I’m implementing an RNN for video classification, but the accuracy doesn’t increase. I use a pretrained ResNet-50 as a feature extractor to feed an LSTM. In particular, I’m worried that I’m messing up the forward step of the network; here is my code:

    def forward(self, x, seq):
        # seq holds the true (unpadded) length of each clip, sorted descending
        state = self._init_state(b_size=len(seq))

        # extract per-frame ResNet features for each clip in the batch
        y = []
        for i in range(len(seq)):
            y.append(self.resnet(x[i]))

        # pad to (T_max, B, feat) and pack so the LSTM skips the padding
        y = torch.nn.utils.rnn.pad_sequence(y)
        pack = torch.nn.utils.rnn.pack_padded_sequence(y, seq, batch_first=False)
        z, _ = self.lstm(pack, state)
        z, _ = torch.nn.utils.rnn.pad_packed_sequence(z, batch_first=False)

        # take the output at the last valid timestep of each sequence
        t = []
        for i in range(len(seq)):
            t.append(z[seq[i] - 1, i, :])

        t = torch.stack(t, 0)
        out = self.classifier(t)
        out = self.out(out)

        return out
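For what it’s worth, the pad → pack → LSTM → unpack → gather-last-timestep pipeline looks right to me shape-wise. Here is a minimal standalone sketch (dummy tensors, made-up dimensions) that mirrors that indexing, which you can run in isolation to sanity-check it:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lengths = [5, 3, 2]                 # already sorted descending, as the collate guarantees
feat_dim, hidden = 4, 6             # made-up feature / hidden sizes
seqs = [torch.randn(n, feat_dim) for n in lengths]

lstm = nn.LSTM(feat_dim, hidden)    # batch_first=False, as in the post

padded = nn.utils.rnn.pad_sequence(seqs)                     # (T_max, B, feat_dim)
packed = nn.utils.rnn.pack_padded_sequence(padded, lengths)  # lengths must be sorted desc
out_packed, _ = lstm(packed)
out, out_lens = nn.utils.rnn.pad_packed_sequence(out_packed)  # (T_max, B, hidden)

# output at the last valid timestep of each sequence, same indexing as forward()
last = torch.stack([out[lengths[i] - 1, i, :] for i in range(len(lengths))], 0)
print(last.shape)  # torch.Size([3, 6]) -- one hidden vector per clip
```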

I also attach the custom collate_fn of the DataLoader:

    def my_collate(batch):
        # batch is a list of (frames, label) pairs with variable frame counts
        frames, l = zip(*batch)
        lengths = [f.shape[0] for f in frames]

        # sort longest-first, as pack_padded_sequence expects
        perm_idx = sorted(range(len(lengths)), key=lengths.__getitem__, reverse=True)

        frames_out = [frames[i] for i in perm_idx]
        l_out = [l[i] for i in perm_idx]
        lengths_out = [lengths[i] for i in perm_idx]

        return frames_out, torch.LongTensor(l_out), lengths_out
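The collate function itself behaves as intended as far as I can tell. A quick check with fake variable-length clips (made-up frame shapes, same sorting logic as the collate above) confirms it reorders frames, labels, and lengths consistently, longest-first:

```python
import torch

def my_collate(batch):  # identical logic to the collate above
    frames, l = zip(*batch)
    lengths = [f.shape[0] for f in frames]
    perm_idx = sorted(range(len(lengths)), key=lengths.__getitem__, reverse=True)
    frames_out = [frames[i] for i in perm_idx]
    l_out = [l[i] for i in perm_idx]
    lengths_out = [lengths[i] for i in perm_idx]
    return frames_out, torch.LongTensor(l_out), lengths_out

# three fake clips with 2, 5 and 3 frames (the frame shape is arbitrary)
batch = [(torch.randn(n, 3, 32, 32), label) for n, label in [(2, 0), (5, 1), (3, 2)]]
frames_out, labels, lengths_out = my_collate(batch)

print(lengths_out)      # [5, 3, 2] -- longest first
print(labels.tolist())  # [1, 2, 0] -- labels permuted consistently with the frames
```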

Do you think there is a bug in my implementation? Thanks