How to pass input to a 3D CNN-GRU architecture

talhaanwarch · October 31, 2022, 8:18am

I want to pass video data to a 3D CNN-GRU architecture.
This is the shape of the data:
Batch,Channels,Num_Frames, Height, Width
I split Num_Frames into windows of Length frames each.
Now the shape is
Batch,Channels,Windows,Length, Height, Width
After that I reshape the data to
Batch,Windows,Channels,Length, Height, Width
Now I combine the first two axes and pass the result to the CNN
Batch*Windows,Channels,Length, Height, Width
The 3D CNN outputs
Batch*Windows,Features
I reshape the data again
Batch,Windows,Features
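
The steps up to this point can be sketched as below. This is only a minimal illustration: the sizes, the variable names, and the tiny stand-in CNN are assumptions made for the example, not the actual model. It uses permute for the axis swap, since reshape alone does not reorder axes.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
B, C, T, H, W = 2, 3, 16, 32, 32
length = 4                 # frames per window
windows = T // length      # number of windows

video = torch.randn(B, C, T, H, W)             # Batch, Channels, Num_Frames, Height, Width

# Split the frame axis: Batch, Channels, Windows, Length, Height, Width
x = video.reshape(B, C, windows, length, H, W)
# Swap axes (permute, not reshape): Batch, Windows, Channels, Length, Height, Width
x_perm = x.permute(0, 2, 1, 3, 4, 5)
# Merge the first two axes: Batch*Windows, Channels, Length, Height, Width
x_flat = x_perm.reshape(B * windows, C, length, H, W)

# Stand-in 3D CNN (an assumption) that produces 8 features per window.
cnn = nn.Sequential(
    nn.Conv3d(C, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
)

feats = cnn(x_flat)                    # Batch*Windows, Features
feats = feats.reshape(B, windows, -1)  # Batch, Windows, Features
print(feats.shape)
```

Because Batch and Windows are adjacent when they are merged, the final reshape puts each window's features back with the sample it came from.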
I permute the tensor before passing it to the GRU; the new shape is
Batch,Features,Windows
The GRU output shape is
Batch,Features,Windows
The GRU hidden state shape is
1,Batch,Windows
This is the GRU layer:
torch.nn.GRU(input_size=8,hidden_size=8,batch_first=True)
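
For reference, here is a small shape check on this layer. It is a sketch with assumed sizes (2 samples, 5 windows, 8 features), not the real data. With batch_first=True, torch.nn.GRU reads its input as (Batch, Sequence, input_size), so the last axis has to match input_size.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
B, windows, features = 2, 5, 8
gru = torch.nn.GRU(input_size=8, hidden_size=8, batch_first=True)

seq = torch.randn(B, windows, features)  # Batch, Windows, Features
out, h = gru(seq)
print(out.shape)  # Batch, Windows, hidden_size -> torch.Size([2, 5, 8])
print(h.shape)    # num_layers, Batch, hidden_size -> torch.Size([1, 2, 8])

# The permuted layout (Batch, Features, Windows) has Windows as the last
# axis, so it is only accepted when Windows happens to equal input_size.
permuted = seq.permute(0, 2, 1)
try:
    gru(permuted)
    permuted_ok = True
except RuntimeError:
    permuted_ok = False
print(permuted_ok)
```

If Windows does equal input_size, the permuted call runs without error, but the GRU then steps along the feature axis and treats the windows as the per-step input.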

Can anyone tell me whether this flow is correct?