Given input size of (5 x 2) how to get feedforward network to output size (1 x 20)?

I’m trying to create a model that takes a 5 x 2 matrix (5 samples of 2-channel audio) and outputs a 1 x 20 matrix. At present it outputs 5 x 20 instead. I’m fairly sure this is down to how matrix multiplication works, but how do I get it to output the shape I want? Below is a simplified version of my model.

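```python
import torch.nn as nn
import torch.nn.functional as F

class FFN(nn.Module):
    def __init__(self):
        super().__init__()
        # Each nn.Linear acts on the last dimension only: 2 -> 128 -> 128 -> 20
        self.hidden1 = nn.Linear(2, 128)
        self.hidden2 = nn.Linear(128, 128)
        self.output = nn.Linear(128, 20)

    def forward(self, x):
        # x: (5, 2) -> (5, 128) -> (5, 128) -> (5, 20); the 5 rows are never mixed
        x = F.relu(self.hidden1(x))
        x = F.relu(self.hidden2(x))
        return self.output(x)
```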

So I think the reason it’s giving me a 5 x 20 output is that the matrix multiplication is working as below.
- `A` = 5 x 2
- `B` = 2 x 128
- `C` = 128 x 20

- `AB` = 5 x 128
- `(AB)C` = 5 x 20
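The shape arithmetic can be checked directly. This is a minimal NumPy sketch with zero-filled stand-in matrices (only the shapes matter); it also includes the 128 x 128 `hidden2` layer, which does not change the row count. Note that `nn.Linear` stores its weight as (out_features, in_features) and computes `x @ weight.T`, so these matrices correspond to the transposed weights. ReLU and biases are left out because neither changes the shape.

```python
import numpy as np

# Zero-filled stand-ins: only the shapes matter here.
A = np.zeros((5, 2))      # input: 5 samples x 2 channels
B = np.zeros((2, 128))    # hidden1 (transposed weight)
H = np.zeros((128, 128))  # hidden2 (transposed weight)
C = np.zeros((128, 20))   # output layer (transposed weight)

AB = A @ B
out = AB @ H @ C

print(AB.shape)   # (5, 128)
print(out.shape)  # (5, 20): one 20-dim output per input row
```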

If you just want the output shape to be (1 x 20), you could do `return output.mean(dim=0, keepdim=True)` (plain `output.mean(dim=0)` drops the reduced dimension and returns shape (20,)). Note that taking the mean will make your network dependent on the number of samples!
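The difference between dropping and keeping the reduced dimension can be checked with NumPy, whose `axis`/`keepdims` arguments play the same role as PyTorch's `dim`/`keepdim`. A small sketch, using random numbers as a stand-in for the 5 x 20 network output:

```python
import numpy as np

out = np.random.rand(5, 20)  # stand-in for the (5 x 20) network output

per_feature_mean = out.mean(axis=0)       # reduced axis dropped -> (20,)
kept = out.mean(axis=0, keepdims=True)    # reduced axis kept    -> (1, 20)

print(per_feature_mean.shape, kept.shape)  # (20,) (1, 20)
```

Both hold the same 20 values; only the shape differs.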

But shouldn’t the network take in `N` samples and return `N` outputs? Then your loss function would reduce over the `N` samples (by summing or averaging) and return a single value.
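A minimal sketch of that idea, assuming a mean-squared-error loss and random stand-in predictions and targets (both hypothetical): the network keeps one 20-dim output per sample, and only the loss collapses everything to a scalar. PyTorch's `nn.MSELoss` does the same reduction, with `reduction="mean"` as the default and `reduction="sum"` as an option.

```python
import numpy as np

N = 5
pred = np.random.rand(N, 20)    # one 20-dim output per input sample
target = np.random.rand(N, 20)  # hypothetical targets, same shape

sq_err = (pred - target) ** 2   # (N, 20) element-wise errors
loss_mean = sq_err.mean()       # average over samples and outputs
loss_sum = sq_err.sum()         # or sum instead

print(np.ndim(loss_mean))  # 0: a single scalar
```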

Your network returns a 5 x 20 output because of how matrix multiplication works. For `A` times `B`, the shapes must be ‘compatible’: the number of columns in `A` must match the number of rows in `B` (the inner two dimensions in your example, [5, 2] x [2, 128]), and the remaining outer dimensions give the shape of the result (here a matrix of shape [5, 128]).
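The compatibility rule fits in a few lines of plain Python. `matmul_shape` is a hypothetical helper written for illustration; it only handles 2-D shapes:

```python
def matmul_shape(a, b):
    """Return the shape of A @ B for 2-D shapes a = (m, k) and b = (k2, n)."""
    m, k = a
    k2, n = b
    if k != k2:  # inner dimensions must agree
        raise ValueError(f"incompatible shapes {a} and {b}: {k} != {k2}")
    return (m, n)  # outer dimensions form the result

print(matmul_shape((5, 2), (2, 128)))  # (5, 128)
print(matmul_shape(matmul_shape((5, 2), (2, 128)), (128, 20)))  # (5, 20)
```

The leading 5 survives every layer, which is why the network ends at 5 x 20.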

In this instance, I want to predict the nth sample but using samples `n-2:n+2`, so that’s why I want it to take in 5 x 2 but output 1 x 20. Taking the mean might actually be useful as it would act as a moving average filter - and I’m not sure why I didn’t think of that to begin with.
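Another option, different from the mean and not suggested in the thread, is to flatten the 5 x 2 window into a single row of 10 features so that the first layer sees all five samples at once. This is a NumPy sketch with random, untrained weights and no biases. In PyTorch this would correspond to `x.reshape(1, -1)` (or `nn.Flatten()` on a batch of shape (batch, 5, 2), giving (batch, 10)) with `nn.Linear(10, 128)` as the first layer.

```python
import numpy as np

rng = np.random.default_rng(0)
window = rng.standard_normal((5, 2))  # samples n-2 .. n+2, 2 channels each

# Flatten the whole window into one row of 10 features.
x = window.reshape(1, -1)             # (1, 10)

# Random stand-in weights; the first layer now takes 10 inputs, not 2.
W1 = rng.standard_normal((10, 128))
W2 = rng.standard_normal((128, 128))
W3 = rng.standard_normal((128, 20))

def relu(z):
    return np.maximum(z, 0)

out = relu(relu(x @ W1) @ W2) @ W3
print(out.shape)  # (1, 20)
```

Unlike averaging, this lets the network weight each position in the window differently.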
But wouldn’t that include the `nth` sample in the input, so the network would just learn to disregard the other 4 samples and return the central sample?
My train of thought was that it would use the samples on either side to learn some kind of temporal relationship between the desired transformed output at `n` and the original (untransformed) samples around it.
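Training pairs for this setup could be built as in the following sketch. It assumes the original signal is a (T, 2) array and that there is one hypothetical 20-dim target per sample, as a (T, 20) array; `make_windows` is a hypothetical helper. Because Python slice ends are exclusive, samples `n-2` through `n+2` inclusive are `signal[n - 2 : n + 3]`. Samples too close to either end to have a full window are skipped.

```python
import numpy as np

def make_windows(signal, targets, half_width=2):
    """Pair each window signal[n-2 .. n+2] with targets[n].

    signal:  (T, 2) array of original samples
    targets: (T, 20) array of desired outputs (hypothetical)
    """
    T = signal.shape[0]
    xs, ys = [], []
    for n in range(half_width, T - half_width):
        xs.append(signal[n - half_width : n + half_width + 1])  # (5, 2)
        ys.append(targets[n])                                   # (20,)
    return np.stack(xs), np.stack(ys)

signal = np.random.rand(100, 2)
targets = np.random.rand(100, 20)
X, Y = make_windows(signal, targets)
print(X.shape, Y.shape)  # (96, 5, 2) (96, 20)
```

To test the concern about the network simply copying the centre sample, the centre row can be dropped with `X[:, [0, 1, 3, 4]]`, leaving only the neighbours as input.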