I’m working on designing a neural network in PyTorch that takes a matrix of sequential data with shape (N, L) (where N is the number of samples and L is the number of elements in each sequence) and predicts a matrix of shape (K, L), where K is a fixed value that is different from N.
The challenge I’m facing is that the output matrix is not a per-sample prediction but rather a single matrix for the entire batch. Additionally, the model should handle different values of N during training and testing. For example, I might use a batch size of 128 during training but have an input X with 1000 samples during testing.
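To make the requirement concrete, here is a small sketch of the shapes involved (L = 50 and K = 10 are just made-up example values):

```python
import torch

L, K = 50, 10                    # example sequence length and fixed number of output rows
x_train = torch.randn(128, L)    # N = 128 samples during training
x_test = torch.randn(1000, L)    # N = 1000 samples during testing

# Desired behaviour: the output shape should be (K, L) for both inputs,
# i.e. model(x_train).shape == model(x_test).shape == (K, L)
```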
Here’s a simplified version of my initial approach using an LSTM model:
```python
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(MyModel, self).__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)

    def forward(self, x):
        # x: (N, L) -- note that with batch_first=True the LSTM expects (N, L, input_dim),
        # so a 2-D input would first need a feature dimension, e.g. x.unsqueeze(-1) with input_dim=1
        lstm_out, _ = self.lstm(x)  # lstm_out: (N, L, hidden_dim)
        # How to reshape or process lstm_out to get (K, L)?
        return output
```
However, this model’s output shape depends on the input batch size, which is not what I need. I don’t want to perform reduction operations like mean or max on the batch dimension.
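To be explicit, this is the kind of batch reduction I want to avoid (the sizes and the proj layer here are hypothetical, just to reach the target shape):

```python
import torch
import torch.nn as nn

N, L, hidden_dim, K = 128, 50, 64, 10     # made-up sizes
lstm_out = torch.randn(N, L, hidden_dim)  # stand-in for the LSTM output
proj = nn.Linear(hidden_dim, K)

pooled = lstm_out.mean(dim=0)             # (L, hidden_dim): mean over the batch dimension
output = proj(pooled).transpose(0, 1)     # (L, K) -> (K, L)
print(output.shape)                       # torch.Size([10, 50]), but only because of the explicit reduction
```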
How can I structure the model to extract and compress the information along the batch dimension into a single matrix output without explicitly reducing the size using operations like mean or max?
Thank you for your help!