Hi! I used to be a Keras user, and I want to port my functions to PyTorch. I'm currently working on a video classification problem with an architecture similar to LRCN (http://jeffdonahue.com/lrcn/), which applies a CNN to extract features from each frame and then uses an LSTM for classification. Keras has a TimeDistributed wrapper (https://keras.io/layers/wrappers/) that applies a layer to every temporal slice of an input. Does PyTorch have a similar implementation, or how can I achieve the same effect in this case? Is there an existing PyTorch example?
I developed a PyTorch module that mimics the TimeDistributed wrapper of Keras a few days ago:
```python
import torch.nn as nn

class TimeDistributed(nn.Module):
    def __init__(self, module, batch_first=False):
        super(TimeDistributed, self).__init__()
        self.module = module
        self.batch_first = batch_first

    def forward(self, x):
        # Inputs without a time axis go straight through the wrapped module
        if len(x.size()) <= 2:
            return self.module(x)
        # Squash samples and timesteps into a single axis
        x_reshape = x.contiguous().view(-1, x.size(-1))  # (samples * timesteps, input_size)
        y = self.module(x_reshape)
        # We have to reshape Y back into separate sample and time axes
        if self.batch_first:
            y = y.contiguous().view(x.size(0), -1, y.size(-1))  # (samples, timesteps, output_size)
        else:
            y = y.view(-1, x.size(1), y.size(-1))  # (timesteps, samples, output_size)
        return y
```
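For a plain `nn.Linear` layer the wrapper is not strictly needed: `nn.Linear` already applies to the last dimension of an input with any number of leading dimensions. The quick check below compares squash, apply, reshape against calling the layer directly; the tensor sizes are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

samples, timesteps, input_size, output_size = 4, 10, 16, 8
x = torch.randn(samples, timesteps, input_size)  # batch-first input
linear = nn.Linear(input_size, output_size)

# Squash samples and timesteps, apply the layer, then restore both axes
y_squashed = linear(x.reshape(-1, input_size)).reshape(samples, timesteps, output_size)

# nn.Linear handles all leading dimensions on its own
y_direct = linear(x)

print(y_direct.shape)  # torch.Size([4, 10, 8])
print(torch.allclose(y_squashed, y_direct, atol=1e-6))
```

The wrapper becomes useful for layers that expect a fixed input rank, such as convolutions, where the time axis has to be folded into the batch axis first.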
@miguelvr Isn’t this still useful for layers other than Linear, though? For example, if the input tensor has shape [sample, frame, image], as in video, you may want to apply a convnet module to each frame. Please correct me if I have this wrong.
We have been using the TimeDistributed layer you developed.
I declared a linear layer and then passed it to the TimeDistributed layer in the module:
```python
# Excerpt from the model's __init__ (TimeDistributed is the wrapper above)
# 1D ConvNet for learning the spectral features
self.conv1 = nn.Conv1d(in_channels=1, out_channels=128, kernel_size=(32,))
self.bn1 = nn.BatchNorm1d(128)
self.maxpool1 = nn.MaxPool1d(kernel_size=1, stride=97)
self.dropout1 = nn.Dropout(0.3)
# 1D LSTM for learning the temporal aggregation
self.lstm = nn.LSTM(input_size=128, hidden_size=128, num_layers=2, dropout=0.3)
# Fully connected layer
# self.fc3 = nn.Linear(128, 128)
# self.bn3 = nn.BatchNorm1d(128)
# Get posterior probability for target event class
self.fc4 = nn.Linear(128, 1)
self.timedist = TimeDistributed(self.fc4)
```
My question: when I print the network's weight parameters, the TimeDistributed layer's weights show up twice:
fc4.weight torch.Size([1, 128])
timedist.module.weight torch.Size([1, 128])
Is this correct, or is there a mistake in the implementation?
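Both names most likely refer to the same weights: `self.fc4` is registered as a submodule, and the same `nn.Linear` object is registered again inside `self.timedist`. `state_dict()` lists a shared module under every name it is reachable by, while `parameters()` skips duplicates. The check below uses a hypothetical `Head` module and a trimmed, batch-first-only copy of the wrapper; both are assumptions made to keep the example self-contained.

```python
import torch
import torch.nn as nn


class TimeDistributed(nn.Module):
    """Trimmed wrapper (assumption): apply `module` to every step of a batch-first tensor."""

    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, x):
        # (samples, timesteps, features) -> (samples * timesteps, features)
        y = self.module(x.reshape(-1, x.size(-1)))
        return y.reshape(x.size(0), x.size(1), -1)


class Head(nn.Module):  # hypothetical module mirroring the snippet above
    def __init__(self):
        super().__init__()
        self.fc4 = nn.Linear(128, 1)
        self.timedist = TimeDistributed(self.fc4)  # same Linear object, registered twice


head = Head()
print(list(head.state_dict().keys()))
# ['fc4.weight', 'fc4.bias', 'timedist.module.weight', 'timedist.module.bias']

# Same tensor object, so there is only one set of trainable weights
print(head.fc4.weight is head.timedist.module.weight)  # True
print(sum(p.numel() for p in head.parameters()))  # 129: parameters() skips duplicates
```

If the duplicate entry is unwanted, one option is to skip the `self.fc4` attribute and write `self.timedist = TimeDistributed(nn.Linear(128, 1))`.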
Can you provide a small working example where this works? My input has the shape (samples, timesteps, channels, width, height). Your code merges all dimensions except the last one, which it treats as the input size (see x_reshape), so every layer I try fails with a size mismatch error.
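One way around this, sketched here under the assumption of batch-first input, is to merge only the first two axes (samples, timesteps) and pass the remaining axes to the wrapped module unchanged, so a `Conv2d` still receives `(N, channels, height, width)`. The frame encoder and tensor sizes below are illustrative choices, not part of the original code.

```python
import torch
import torch.nn as nn


class TimeDistributed(nn.Module):
    """Apply `module` independently to every time step of a batch-first tensor.

    Only the first two axes (samples, timesteps) are merged, so the remaining
    axes, e.g. (channels, height, width), reach `module` unchanged.
    """

    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, x):
        samples, timesteps = x.shape[:2]
        # (samples, timesteps, *rest) -> (samples * timesteps, *rest)
        y = self.module(x.reshape(samples * timesteps, *x.shape[2:]))
        # (samples * timesteps, *out) -> (samples, timesteps, *out)
        return y.reshape(samples, timesteps, *y.shape[1:])


# Usage: a small conv block applied to every frame of a video batch
frame_encoder = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),  # (N, 8, 1, 1) -> (N, 8)
)
video = torch.randn(2, 5, 3, 32, 32)  # (samples, timesteps, channels, height, width)
features = TimeDistributed(frame_encoder)(video)
print(features.shape)  # torch.Size([2, 5, 8])
```

For time-first input, transposing the first two axes before and after the wrapper (for example with `x.transpose(0, 1)`) keeps the same logic.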
Thanks a lot for the nice explanation. I have a beginner's question: since the batch samples and timesteps are squashed together, won't that cause problems for the LSTM's sequential learning? That is, when the output is reshaped back to (samples, timesteps, output_size), does each sample keep its time steps in the same order as before squashing?
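Squashing by itself should not scramble the order: `reshape`/`view` on a contiguous tensor only reinterprets its row-major layout, so merging (samples, timesteps) and splitting again with the same sizes puts every element back where it was. The wrapped layer also processes each time step independently, so it needs no sequence information; the LSTM afterwards sees the restored tensor. A small check:

```python
import torch

samples, timesteps, features = 3, 4, 2
x = torch.arange(samples * timesteps * features).reshape(samples, timesteps, features)

merged = x.reshape(samples * timesteps, features)
# Row i of `merged` is sample i // timesteps at time step i % timesteps
print(torch.equal(merged[5], x[1, 1]))  # True, since 5 = 1 * 4 + 1

restored = merged.reshape(samples, timesteps, features)
print(torch.equal(restored, x))  # True
```

Order does get mixed up if the merged axis is split with the sizes swapped, e.g. reshaping a (timesteps, samples) tensor straight into (samples, timesteps); changing axis order needs `transpose`/`permute` instead.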
Did you settle on a structure for your network in PyTorch? I am facing exactly the same problem and wonder whether you could share the network code. I need to build a CNN+LSTM network for video sequence classification.
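A minimal CNN+LSTM sketch in the spirit of LRCN follows. It is not the architecture from the LRCN paper: the class name `CNNLSTMClassifier`, the layer sizes, and classifying from the last LSTM output are all assumptions for illustration. The time axis is folded into the batch for the CNN, as in the wrapper discussed above, and unfolded again before the LSTM.

```python
import torch
import torch.nn as nn


class CNNLSTMClassifier(nn.Module):
    """Hypothetical LRCN-style model: per-frame CNN features -> LSTM -> class scores."""

    def __init__(self, num_classes, feat_dim=64, hidden_size=128):
        super().__init__()
        # Frame encoder: (N, 3, H, W) -> (N, feat_dim)
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.lstm = nn.LSTM(feat_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, video):
        # video: (samples, timesteps, channels, height, width)
        b, t = video.shape[:2]
        frames = video.reshape(b * t, *video.shape[2:])  # fold time into the batch
        feats = self.cnn(frames).reshape(b, t, -1)  # (samples, timesteps, feat_dim)
        out, _ = self.lstm(feats)  # (samples, timesteps, hidden_size)
        return self.fc(out[:, -1])  # classify from the last time step


model = CNNLSTMClassifier(num_classes=10)
logits = model(torch.randn(2, 8, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 10])
```

Averaging `out` over the time axis instead of taking the last step is another common choice; which works better depends on the data.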