Can't design a neural network to map MFCC features to images

I am trying to map MFCC features to images, but I can't figure out what the correct parameters should be for my nn.Linear and Conv2d layers.

This is the error I get:
“RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x196608 and 25565904x26)”

The images and MFCC features, which are in .npy format, are preloaded in the facedataset. (__getitem__ of the dataset class returns both the image and the audio as a tuple.)
This is the code: (I can't upload more than one image.)

This error is most likely raised in an nn.Linear layer, which expects the input to have 25565904 features while your current input has 196608 features.
Based on the screenshot it seems to be lin1, so you might want to remove the multiplication with 13.
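
For illustration, here is a minimal sketch of this mismatch pattern with made-up (much smaller) sizes; the first dimension of the second matrix in the error corresponds to the in_features of the linear layer:

import torch
import torch.nn as nn

# the layer expects 13x more input features than the activation provides
lin1 = nn.Linear(512 * 13, 26)
x = torch.randn(64, 512)
# lin1(x)  # -> RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x512 and 6656x26)

# matching in_features to the incoming activation fixes the matmul
lin1 = nn.Linear(512, 26)
out = lin1(x)   # -> [64, 26]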

PS: you can post code snippets by wrapping them in three backticks ```, which makes debugging easier :wink:

2 Likes

self.conv1 expects a 4-dimensional tensor in the shape [batch_size, channels, height, width], while your nn.Sequential container passes the 2-dimensional output of the previous linear layer to it.
Either create a custom Unflatten/Unsqueeze module and add it between these layers, or split the modules into two parts and unsqueeze the activation manually in the forward method.
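
If you want to stay inside the nn.Sequential container, here is a hedged sketch of the first option using the built-in nn.Unflatten (all layer sizes are made up):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(13, 128 * 4 * 4),
    nn.Unflatten(1, (128, 4, 4)),                 # [batch_size, 2048] -> [batch_size, 128, 4, 4]
    nn.Conv2d(128, 64, kernel_size=3, padding=1)  # now receives the expected 4-dimensional input
)

x = torch.randn(8, 13)
out = model(x)   # -> [8, 64, 4, 4]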

1 Like

I’m not sure which dimensions you are unsqueezing now, but based on the initial code snippet it seems you are adding a new batch dimension, which sounds wrong.
Keep the batch size equal in the forward method to avoid these shape mismatches.
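
As a rough illustration with made-up sizes, unsqueezing dim0 creates a new batch dimension instead of reshaping the features of each sample:

import torch

x = torch.randn(64, 2048)        # [batch_size, features]
wrong = x.unsqueeze(0)           # [1, 64, 2048] - the batch size collapses to 1
right = x.view(64, 128, 4, 4)    # [batch_size, channels, height, width], batch size stays 64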

1 Like

You can write a custom View module, e.g. as:

import torch
import torch.nn as nn


class View(nn.Module):
    def __init__(self, size):
        super().__init__()
        self.size = size

    def forward(self, x):
        # reshape the activation to the configured size
        x = x.view(self.size)
        return x


model = nn.Sequential(
    nn.Linear(1, 2048),
    View(size=(-1, 128, 4, 4)),    # [batch_size, 2048] -> [batch_size, 128, 4, 4]
    nn.ConvTranspose2d(128, 1, 1)
)

x = torch.randn(1, 1)
out = model(x)
1 Like

This error is raised as you are using the OrderedDict incorrectly and the View module is missing its key.
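
For reference, a hedged sketch of how every module, including View, gets its own key when nn.Sequential is built from an OrderedDict (layer names and sizes are made up):

import torch
import torch.nn as nn
from collections import OrderedDict

class View(nn.Module):            # the same View module as in the snippet above
    def __init__(self, size):
        super().__init__()
        self.size = size

    def forward(self, x):
        return x.view(self.size)

model = nn.Sequential(OrderedDict([
    ('lin1', nn.Linear(1, 2048)),
    ('view1', View(size=(-1, 128, 4, 4))),       # View needs its own unique key as well
    ('deconv1', nn.ConvTranspose2d(128, 1, 1)),
]))

out = model(torch.randn(1, 1))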

You have forgotten to copy the forward method from my code snippet.

1 Like

I cannot reproduce the error and your code works fine for me:

model = NeuralNetwork()
x = torch.randn(1, 13)
out = model(x)
print(out.shape)
# torch.Size([1, 3, 766, 766])

Please make sure the posted code can be executed with e.g. random data and does in fact reproduce the claimed error.

You are only seeing two print statements since you are registering the PrintSize modules under the same key, print1, and thus overwriting them. Use unique keys and the others will print, too.
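
A minimal sketch of the idea, assuming PrintSize is a small debug module which just prints the activation shape (layer names and sizes are made up):

import torch
import torch.nn as nn
from collections import OrderedDict

class PrintSize(nn.Module):
    def forward(self, x):
        print(x.shape)
        return x

# reusing the key 'print1' would silently overwrite the earlier entry,
# so each PrintSize module gets its own key here
model = nn.Sequential(OrderedDict([
    ('lin1', nn.Linear(13, 2048)),
    ('print1', PrintSize()),
    ('relu1', nn.ReLU()),
    ('print2', PrintSize()),
]))

out = model(torch.randn(1, 13))   # prints the shape after lin1 and after relu1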

It seems the import is failing, so make sure model is actually a script which can be imported.
If you get stuck, just copy/paste the model definition into your current script.

I don’t understand this statement so could you explain a bit more where you are printing the shape of pred, what it shows, and where it seems to change?

Based on the error message it seems you are passing an unsupported input to cv2.imwrite, so you would need to at least call pred.numpy() once the shape issue is resolved.

Did you try to use the suggested code, i.e. detaching the tensor and pushing it to the CPU?

By default PyTorch expects an input of [batch_size, channels, height, width] for *2d layers. As I don’t see any permute call in your code I would thus expect that the output is still in this shape.
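
Putting the last few points together, a hedged sketch of the conversion before cv2.imwrite, assuming pred is a single-image batch of floats in [0, 1] in [batch_size, channels, height, width]:

import cv2
import numpy as np
import torch

pred = torch.rand(1, 3, 766, 766)             # stand-in for the model output

img = pred.detach().cpu()                     # detach from the graph and move to the CPU
img = img.squeeze(0).permute(1, 2, 0)         # [3, 766, 766] -> [766, 766, 3] (HWC for OpenCV)
img = (img.numpy() * 255).astype(np.uint8)    # convert to uint8 before writing

cv2.imwrite("pred.png", img)

Note that OpenCV expects the channel order to be BGR, so you might additionally need to swap the channels depending on how the target images were loaded.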