I’m trying to create a model that takes a 5 x 2 matrix (5 samples of 2-channel audio) and outputs something that is 1 x 20. At present it outputs 5 x 20 instead, and I think this is down to how matrix multiplication works, but I was wondering how to get my desired output shape. Below is some pseudo-code for my model.
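import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):  # class name is illustrative
    def __init__(self):
        super().__init__()
        self.hidden1 = nn.Linear(2, 128)    # 2 channels -> 128 features
        self.hidden2 = nn.Linear(128, 128)
        self.output = nn.Linear(128, 20)    # 128 features -> 20 outputs
    def forward(self, x):  # x: [5, 2] -> output: [5, 20]
        return self.output(F.relu(self.hidden2(F.relu(self.hidden1(x)))))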
So I think the reason it’s giving me a 5 x 20 output is that the matrix multiplication is working as below.
A = 5 x 2
B = 2 x 128
C = 128 x 20
AB = 5 x 128
(AB)C = 5 x 20.
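The shape bookkeeping above can be checked directly. This is a minimal sketch, assuming PyTorch is installed and using a random tensor as a stand-in for the 5 x 2 audio window (the variable names are illustrative). It also includes the 128 x 128 hidden layer from the model, which leaves the shape unchanged.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Random stand-in for 5 samples of 2-channel audio (illustrative data only).
x = torch.randn(5, 2)

hidden1 = nn.Linear(2, 128)
hidden2 = nn.Linear(128, 128)
output = nn.Linear(128, 20)

# nn.Linear acts on the last dimension, so each of the 5 rows is
# transformed independently and the leading dimension is preserved.
h1 = F.relu(hidden1(x))   # [5, 2]   -> [5, 128]
h2 = F.relu(hidden2(h1))  # [5, 128] -> [5, 128]
out = output(h2)          # [5, 128] -> [5, 20]

print(h1.shape, h2.shape, out.shape)
```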
If you just want the shape to be 1x20, you could just do return output.mean(dim=0). It should be noted that taking the mean will make your network dependent on the number of samples!
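As a small sketch of what that reduction does to the shape (assuming PyTorch, with a random tensor standing in for the 5 x 20 network output): `mean(dim=0)` on its own drops the reduced dimension and gives shape `[20]`, while passing `keepdim=True` keeps it as `[1, 20]`.

```python
import torch

# Random stand-in for the 5 x 20 network output (illustrative data only).
out = torch.randn(5, 20)

pooled = out.mean(dim=0)                   # averages over the 5 rows -> shape [20]
pooled_2d = out.mean(dim=0, keepdim=True)  # same values, shape [1, 20]

print(pooled.shape, pooled_2d.shape)
```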
But shouldn’t the network take in N samples and return N outputs? Then your loss function would sum over the number of samples and return a single value?
The reason your network returns 5x20 as the size is matrix multiplication. For example, to multiply A by B the shapes must be ‘compatible’. So, the number of columns in A must match the number of rows in B (the inner two dimensions in your example, [5,2] x [2,128]), and the remaining dimensions give the output shape of the multiplication (which is a matrix of shape [5,128]).
Thanks for the reply.
In this instance, I want to predict the nth sample but using samples n-2:n+2, so that’s why I want it to take in 5 x 2 but output 1 x 20. Taking the mean might actually be useful as it would act as a moving average filter - and I’m not sure why I didn’t think of that to begin with.
But wouldn’t that include the nth sample in the input, so the network would just learn to disregard the other 4 samples and return the central sample?
Apologies, I’m not sure I explained clearly enough what my model is intended for. I’m looking to make a transformation on the signal. So I’m not looking to predict the next value in a sequence.
My train of thought was that it would use the samples on either side to learn some kind of temporal relationship between the desired transformed output value at n and the samples on either side of it in the original signal.
No need to apologise! If you’re not predicting the next sample, but refining the current samples then this might work!
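One alternative to averaging, not proposed in the thread itself, is to flatten the 5 x 2 window into a single 10-element vector so the first layer sees all five samples at once and can weight them by position. The following is only a sketch under that assumption; the class name `WindowNet` and its parameters are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WindowNet(nn.Module):  # hypothetical name, not from the thread
    def __init__(self, window=5, channels=2, out_features=20):
        super().__init__()
        # The first layer takes the whole flattened window (5 * 2 = 10 values).
        self.hidden1 = nn.Linear(window * channels, 128)
        self.hidden2 = nn.Linear(128, 128)
        self.output = nn.Linear(128, out_features)

    def forward(self, x):
        # x: [batch, window, channels] -> [batch, window * channels]
        x = x.flatten(start_dim=1)
        x = F.relu(self.hidden1(x))
        x = F.relu(self.hidden2(x))
        return self.output(x)  # [batch, out_features]


model = WindowNet()
window = torch.randn(1, 5, 2)  # one window: samples n-2..n+2, 2 channels
y = model(window)
print(y.shape)
```

With this layout the leading dimension is a batch of windows, so a batch of 8 windows of shape `[8, 5, 2]` would produce an `[8, 20]` output, one row per window.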