I’m trying to create a model that takes a 5 x 2 matrix (5 samples of 2-channel audio) and outputs something that is 1 x 20. At present it’s outputting 5 x 20 instead, and I think/know this is down to how matrix multiplication works - but I was wondering how I get it to output my desired shape. Below is some pseudo-code for my model.

```
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFN(nn.Module):
    def __init__(self):
        super(FFN, self).__init__()
        self.hidden1 = nn.Linear(2, 128)
        self.hidden2 = nn.Linear(128, 128)
        self.output = nn.Linear(128, 20)

    def forward(self, x):
        output = self.output(F.relu(self.hidden2(F.relu(self.hidden1(x)))))
        return output
```

So I think the reason it’s giving me a 5 x 20 output is that the matrix multiplication is working as below.

A = 5 x 2

B = 2 x 128

C = 128 x 20

AB = 5 x 128

(AB)C = 5 x 20.
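Those shapes can be checked directly; here is a minimal sketch (assuming `torch` is available) using random matrices in place of the layer weights:

```python
import torch

A = torch.randn(5, 2)     # input: 5 samples, 2 channels
B = torch.randn(2, 128)   # stands in for hidden1's weight
C = torch.randn(128, 20)  # stands in for the output layer's weight

AB = A @ B
assert AB.shape == (5, 128)

result = AB @ C
assert result.shape == (5, 20)  # one 20-dim output per sample
```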

If you just want the shape to be (1 x 20), you could just do `return output.mean(dim=0)` (or `output.mean(dim=0, keepdim=True)` if you need the leading dimension kept). It should be noted that taking the mean will make your network dependent on the number of samples!
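To illustrate that suggestion: `mean(dim=0)` collapses the sample dimension to give shape `(20,)`, and `keepdim=True` keeps it as `(1, 20)`. A minimal sketch, using a random tensor in place of the model output:

```python
import torch

output = torch.randn(5, 20)  # per-sample outputs from the model

pooled = output.mean(dim=0)                   # shape (20,)
pooled_2d = output.mean(dim=0, keepdim=True)  # shape (1, 20)

assert pooled.shape == (20,)
assert pooled_2d.shape == (1, 20)
```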

But shouldn’t the network take in `N` samples and return `N` outputs? Then your loss function would sum over the number of samples and return a single value?

The reason why your network returns 5x20 as the size is due to matrix multiplication. For example, with `A` times `B` the shapes must be ‘compatible’: the number of columns in `A` must match the number of rows in `B` (the inner two dimensions in your example, [5,2] x [2,128]), and the remaining dimensions give the output shape from the multiplication (which is a matrix of shape [5,128]).

Thanks for the reply.

In this instance, I want to predict the nth sample using samples `n-2:n+2`, so that’s why I want it to take in 5 x 2 but output 1 x 20. Taking the mean might actually be useful, as it would act as a moving average filter - and I’m not sure why I didn’t think of that to begin with.
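A toy illustration of that intuition: averaging each `n-2:n+2` window is exactly a 5-tap box (moving-average) filter. This is an illustrative sketch on a made-up 1-D signal, not the full model:

```python
import torch

signal = torch.arange(10, dtype=torch.float32)  # toy 1-D signal

# Slide a 5-sample window over the signal with stride 1:
# shape (num_windows, 5), one row per centre position n.
windows = signal.unfold(0, 5, 1)

moving_avg = windows.mean(dim=1)  # one averaged value per window
assert moving_avg.shape == (6,)
# first window covers samples 0..4, so its average is the signal's mean there
assert torch.allclose(moving_avg[0], signal[:5].mean())
```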

But wouldn’t that include the `nth` sample in the input, so the network would just learn to disregard the other 4 samples and return the central sample?

Apologies, I’m not sure I explained clearly enough what my model is intended for. I’m looking to make a transformation on the signal. So I’m not looking to predict the next value in a sequence.

My train of thought was that it would use the samples on either side to learn some kind of temporal relationship between the desired transformed output value at `n` and the samples on either side of its original form.


No need to apologise! If you’re not predicting the next sample, but refining the current samples, then this might work!