For a particular application, I am porting code from Keras to PyTorch. The input has size [bs x timesteps x features], and the LSTM output is [bs x timesteps x hidden]. Now I want to reduce this to [bs x timesteps x out_features] (the equivalent of a TimeDistributed layer in Keras).
I am using a linear layer:
nn.Linear(in_features=hidden, out_features=out_features)
Is this the right way to do it if I want to preserve the time information, or do I need to reshape the data (using contiguous, for instance) to achieve it?
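To make the question concrete, here is a minimal sketch of the setup. The sizes are made up for illustration, and batch_first=True on the LSTM is an assumption on my part. As far as I understand, nn.Linear acts only on the last dimension of its input, so the sketch compares applying it directly to the 3D tensor with flattening the batch and time dimensions first.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration
bs, timesteps, features, hidden, out_features = 4, 10, 8, 16, 3

# batch_first=True is assumed so tensors are [bs, timesteps, ...]
lstm = nn.LSTM(input_size=features, hidden_size=hidden, batch_first=True)
linear = nn.Linear(in_features=hidden, out_features=out_features)

x = torch.randn(bs, timesteps, features)
lstm_out, _ = lstm(x)                      # [bs, timesteps, hidden]

# Option 1: apply the linear layer directly to the 3D tensor
y_direct = linear(lstm_out)                # [bs, timesteps, out_features]

# Option 2: merge batch and time, apply the layer, then restore the shape
flat = lstm_out.reshape(bs * timesteps, hidden)
y_reshaped = linear(flat).reshape(bs, timesteps, out_features)

# Compare the two results numerically
same = torch.allclose(y_direct, y_reshaped, atol=1e-6)
print(y_direct.shape, same)
```

If both options agree, the same weights are being applied independently at every time step, which is what I would expect from the Keras layer.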
Any help appreciated.