CNN+LSTM for Video Classification

vdw · July 29, 2023, 12:19pm

Is this your implementation? It looks a bit odd to me, particularly since

out, hidden = self.lstm(x.unsqueeze(0))

is called within the loop, seemingly for a single frame (instead of a sequence of frames).

I can’t be sure, however, since I don’t know the shape and nature of x_3d. Right now my gut says the code is off :). In general, though, CNN+LSTM is a common architecture.
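
For comparison, here is a minimal sketch of how I would usually expect a CNN+LSTM to be wired up. It is not your code: I am assuming x_3d has shape (batch, frames, channels, height, width), and the class name, the tiny stand-in CNN, and all layer sizes are made up for illustration. The point is that the CNN runs on every frame first, and the LSTM is then called once on the whole sequence of frame features. As far as I know, calling nn.LSTM without passing a hidden state starts from a zero state each time, which is why a per-frame call inside the loop looks suspicious to me.

```python
import torch
import torch.nn as nn


class CNNLSTM(nn.Module):
    """Hypothetical CNN+LSTM video classifier (names and sizes are illustrative)."""

    def __init__(self, num_classes=5, feat_dim=32, hidden_dim=64):
        super().__init__()
        # Tiny stand-in CNN; a pretrained backbone would normally go here.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(8, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x_3d):
        # Assumed input shape: (batch, frames, channels, height, width)
        b, t, c, h, w = x_3d.shape
        # Fold time into the batch so the CNN sees every frame in one pass.
        feats = self.cnn(x_3d.reshape(b * t, c, h, w))
        feats = feats.reshape(b, t, -1)  # (batch, frames, feat_dim)
        # One LSTM call over the whole sequence, outside any per-frame loop.
        out, (h_n, c_n) = self.lstm(feats)
        # Classify from the LSTM output at the last time step.
        return self.fc(out[:, -1, :])


model = CNNLSTM()
clip = torch.randn(2, 10, 3, 32, 32)  # 2 clips, 10 frames each
logits = model(clip)
print(logits.shape)
```

If you do want to keep a per-frame loop, the alternative would be to collect the CNN features in a list, stack them along the time dimension, and call the LSTM once after the loop, or else pass the returned hidden state back in on every iteration.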