Given an input size of (5 x 2), how do I get a feedforward network to output size (1 x 20)?

I’m trying to create a model that takes a 5 x 2 matrix (5 samples of 2-channel audio) and outputs something that is 1 x 20. At present this isn’t happening and it outputs 5 x 20 instead, and I think/know this is down to how matrix multiplication works - but I was wondering how I get it to output my desired shape. Below is some pseudo-code for my model.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FFN(nn.Module):
    def __init__(self):
        super(FFN, self).__init__()
        self.hidden1 = nn.Linear(2, 128)    # 2 input channels -> 128 hidden units
        self.hidden2 = nn.Linear(128, 128)
        self.output = nn.Linear(128, 20)    # 128 hidden units -> 20 outputs

    def forward(self, x):
        output = self.output(F.relu(self.hidden2(F.relu(self.hidden1(x)))))
        return output

So I think the reason it’s giving me a 5 x 20 output is that the matrix multiplication is working as below.
A = 5 x 2
B = 2 x 128
C = 128 x 20

AB = 5 x 128
(AB)C = 5 x 20.
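
For what it’s worth, a quick sanity check with the model above confirms this:

import torch

model = FFN()
x = torch.randn(5, 2)     # 5 samples of 2-channel audio
print(model(x).shape)     # torch.Size([5, 20]) -- one 20-dim output per sample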

If you just want the shape to be (1 x 20), you could just do return output.mean(dim=0) (or output.mean(dim=0, keepdim=True) to keep the leading dimension of 1). It should be noted that taking the mean will make your network’s output dependent on the number of samples!
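
For example, a minimal sketch of the forward pass with that reduction:

    def forward(self, x):
        output = self.output(F.relu(self.hidden2(F.relu(self.hidden1(x)))))
        # collapse the per-sample outputs (5 x 20) into a single 1 x 20 output
        return output.mean(dim=0, keepdim=True)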

But shouldn’t the network take in N samples and return N outputs? Then your loss function would sum over the number of samples and return a single value?

The reason your network returns a 5 x 20 output is matrix multiplication. For A times B, the shapes must be ‘compatible’: the number of columns in A must match the number of rows in B (the inner two dimensions in your example, [5, 2] x [2, 128]), and the remaining outer dimensions give the shape of the result (a matrix of shape [5, 128]).
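
A quick illustration of that rule with random tensors:

import torch

A = torch.randn(5, 2)
B = torch.randn(2, 128)
C = torch.randn(128, 20)

print((A @ B).shape)      # torch.Size([5, 128]) -- inner dims (2 and 2) match
print((A @ B @ C).shape)  # torch.Size([5, 20])  -- which is why you see 5 x 20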

Thanks for the reply.

In this instance, I want to predict the nth sample but using samples n-2:n+2, so that’s why I want it to take in 5 x 2 but output 1 x 20. Taking the mean might actually be useful as it would act as a moving average filter - and I’m not sure why I didn’t think of that to begin with.

But wouldn’t that include the nth sample in the input, so the network would just learn to disregard the other 4 samples and return the central sample?

Apologies, I’m not sure I explained clearly enough what my model is intended for. I’m looking to make a transformation on the signal. So I’m not looking to predict the next value in a sequence.

My train of thought was that the network would use the samples on either side of n to learn some kind of temporal relationship between the original signal and the desired transformed output value at n.


No need to apologise! If you’re not predicting the next sample, but refining the current samples, then this might work!
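
As a rough sketch of how that could look when applied over a whole signal (this assumes the mean-reduced forward pass from above; the helper name and the edge handling are just my own choices, not anything fixed):

import torch

def transform_signal(model, signal, half_window=2):
    # signal: (num_samples, 2); skip the edges where a full n-2 : n+2 window doesn't exist
    outputs = []
    for n in range(half_window, signal.shape[0] - half_window):
        window = signal[n - half_window : n + half_window + 1]  # (5, 2) window centred on n
        outputs.append(model(window))                            # (1, 20) per centre sample
    return torch.cat(outputs, dim=0)                             # (num_samples - 4, 20)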