Error "mat1 and mat2 shapes cannot be multiplied"

Hello, I am new to PyTorch and am trying to create a model using the script found under Dec 01 9:08 AM - Codeshare,

but when I run this code I get the error

RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x38850 and 259x512)

I am not sure where the 38850 comes from (which is 150 * 259). Any ideas?

Based on your code, the shape mismatch is created in the first linear layer.
Your flattened input tensor has a shape of [batch_size=64, features=38850] while the linear layer expects 259 features. I guess your input tensors might contain an additional dimension (maybe a temporal dimension, such as a sequence length) of 150, since 150 * 259 = 38850?
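For example, a flattened batch in that shape fails in a linear layer expecting 259 features (a small sketch, assuming batch_size=64):

import torch
from torch import nn

x = torch.randn(64, 150, 259)  # a batch of 64 samples, each with an extra dimension of 150
x = x.view(x.size(0), -1)      # flatten -> [64, 38850]

layer = nn.Linear(259, 512)    # expects 259 input features
out = layer(x)                 # RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x38850 and 259x512)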

As far as I can see the input data has the shape “torch.Size([150, 259])”. The code is here:

from pathlib import Path

import torch
from torch.utils.data import Dataset


class CustomMusicDataset(Dataset):
    def __init__(self, maps, toread=None):
        self.data = {}
        self.len = {}
    
        # Read all the maps
        for music_map in maps:
            # prepare the datafile
            self.data[music_map] = []
            self.len[music_map] = 0

            # Read map and select some random elements
            mapname = music_map + ".map"
            with open(mapname) as filein:
                for line in filein.readlines():                                        
                    location, myhash, title = line.split(";")
                    print(f"Analyzing {location}")

                    # Read the actual spectral data
                    for idx in range(10):
                        specfilename = f"{location}/{myhash}_{idx}.pt"
                        if Path(specfilename).exists():
                            spec = torch.load(specfilename)
                            if spec.shape[1] > 259:
                                spec = spec[:,:259]

                            #print(f"appending spectra with shape {spec.shape}")
                            self.data[music_map].append(spec)
                            self.len[music_map] += 1

                    if toread is not None and self.len[music_map] > toread:
                        break

        for music_map, n in self.len.items():
            print(f"{music_map} has {n} entries.")

That’s not necessarily the case, since you are only checking and slicing dim1 in:

if spec.shape[1] > 259:
    spec = spec[:,:259]

The tensor can have multiple dimensions as seen here:

spec = torch.randn(64, 500, 150)

if spec.shape[1] > 259:
    spec = spec[:,:259]

print(spec.shape)
# torch.Size([64, 259, 150])

print(spec.view(spec.size(0), -1).shape)
# torch.Size([64, 38850])

But when I check all the entries, I see that every single entry has the shape [150, 259]!

There must be an error in the code itself that I shared yesterday.

This is exactly the issue I’m trying to point out.
As you’ve confirmed, each sample already has two dimensions in the shape [150, 259]. A batch of these samples would thus have the shape [batch_size, 150, 259], and flattening the “feature dimensions” creates a tensor in the shape [batch_size, 38850] (as also seen in my code), which then causes the error:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x38850 and 259x512)

since 259 features are expected, while you are passing 38850 features.

But how do I fix this problem?

I did follow a tutorial for PyTorch to train a network, and this is the code I created.

So somewhere I seem to have made a mistake. But where, and how do I fix it?

Check what the dimensions in your samples represent and what your actual use case is.
Currently you are using a linear layer, which expects 259 features, while your input samples contain 150*259 features, so what does the dimension of size 150 represent?
Is it a temporal dimension? If so, would you like to use all time steps, only the last one, or any other?
There isn’t a single way to solve the issue as it depends on your actual use case of course.
If you want to use all features and cannot figure out what these dimensions represent, set in_features of the first linear layer to 38850.
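For example, assuming your first layer is an nn.Linear that currently expects 259 features (I haven’t seen your model definition, so this is only a sketch), the change would look like:

from torch import nn

# use the full flattened sample size (150*259 = 38850) as in_features
first_layer = nn.Linear(in_features=150 * 259, out_features=512)  # instead of nn.Linear(259, 512)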

What do you mean by in_feature? I do not see that parameter in my code.

Could you please show exactly where and how I can change the code? In fact, the 150*259 is a time spectrogram of the input data, all of which I want to use.

Ah, when I set size_data = 259*150, that step seems to work now!

But then when I want to use the network I get the error

size mismatch for linear_relu_stack.0.weight: copying a param with shape torch.Size([512, 38850]) from checkpoint, the shape in current model is torch.Size([512, 259]).

Here is the code to use the network:

import torch
from torch import nn

import dataset_music

test_data = dataset_music.CustomMusicDataset(["Instrumental", "Pop"], 100)

size_data = 259

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(size_data, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()
model.load_state_dict(torch.load("model.pth"))
model.eval()

x = test_data.data["Instrumental"][0]
pred = model(x)
print(pred)

When I change the size_data to

size_data = 259 * 150

I get the matrix multiplication error again:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (150x259 and 38850x512)

So maybe I need to flatten something, or redo the model, or there is a bug, or PyTorch has a bug…?

I tried to flatten the test data and changed the code to

pred = model(torch.flatten(x))

but this gives another error:

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

Your test data is missing the batch dimension and you are now trying to use the dimension with the size 150 as the batch dim, which causes the error.
Either use a DataLoader, which will add the missing batch dimension for you, or add it manually via unsqueeze(0).
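For example (a sketch, assuming CustomMusicDataset also implements __len__ and __getitem__, which are not shown in the snippet above, and that each sample is a [150, 259] tensor):

from torch.utils.data import DataLoader

loader = DataLoader(test_data, batch_size=1)
x = next(iter(loader))
print(x.shape)  # torch.Size([1, 150, 259]) - the DataLoader adds the batch dimension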

What is a batch dimension?

Anyway, I will create a GitHub code example with the complete reproducible code, and I hope someone is able to help find the error…

The batch dimension is the dimension of a tensor that represents the individual samples contained in it.
E.g. a standard nn.Linear layer expects an input in the shape [batch_size, in_features].
I.e. an input in the shape [64, 38850] represents a tensor with 64 samples (data points) each containing 38850 features.
Your new error is raised since your input tensor does not contain the batch dimension anymore: you are trying to use the tensor in the shape [150, 259] directly, while 259*150 should be the flattened feature dimension of a single sample.
Use x = x.unsqueeze(0) before passing the input to the model, make sure it has the shape [batch_size=1, 150, 259], and it should work as before.
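Something like this, assuming x is one sample from your dataset:

x = test_data.data["Instrumental"][0]  # shape [150, 259], a single sample without a batch dimension
x = x.unsqueeze(0)                     # shape [1, 150, 259]
pred = model(x)                        # nn.Flatten then produces [1, 38850] for the first linear layer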

Ok thanks,

that seems to “work” now, I do not get any error! Maybe the tutorial got that bit wrong?

But as a result I now get a “tensor”, specifically something like

tensor([[ 0.0000, 11.3125,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000]], grad_fn=<ReluBackward0>)

Is there some tutorial/page on how to interpret that? Or am I still doing something wrong? I thought the result would be a single number?

Ah wait, it gives me 10 values as this is the size of my last layer, correct?

Yes, the output represents clipped logits in the shape [batch_size, 10], where each entry in dim1 represents the logit for the corresponding class index (out of 10 classes).
I would recommend removing the last nn.ReLU from your model to allow for negative as well as positive logits.
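For example, a sketch of the changed stack and of how to read off the predicted class (assuming the model definition from your post above):

# last nn.ReLU removed so the logits can also become negative
self.linear_relu_stack = nn.Sequential(
    nn.Linear(size_data, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),  # raw logits for the 10 classes
)

# the predicted class index is the position of the largest logit
pred = model(x)                       # shape [1, 10]
predicted_class = pred.argmax(dim=1)  # shape [1]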

I guess the best course of action would be to redo a tutorial.

Can you suggest a SIMPLE, REALLY SIMPLE tutorial to learn how to TRAIN a network and to USE a network, as SIMPLE as possible? I have two datasets: small snippets of music WITH and WITHOUT singing, and I want to train a network to distinguish between music with and without singing.

Is there a tutorial which sticks to the VERY essentials of creating and using a network, skipping the billion options there are?

I would recommend checking this tutorial, which explains how a neural network can be defined for the MNIST dataset. You could also play around with this dataset first to get familiar with the dimensions of the input tensors etc., as I believe the tensor shapes and dimensions are what’s causing the trouble right now.
Also, this tutorial might give you more insight into the model definition and training.
Generally, add print statements showing the shape of tensors (inside the model.forward method as well as “outside”) to get a better idea of what the batch dimension is, how the DataLoader is creating it, etc.
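A small sketch of such shape prints inside your model’s forward method:

def forward(self, x):
    print(f"input: {x.shape}")          # e.g. torch.Size([1, 150, 259])
    x = self.flatten(x)
    print(f"after flatten: {x.shape}")  # e.g. torch.Size([1, 38850])
    logits = self.linear_relu_stack(x)
    print(f"logits: {logits.shape}")    # e.g. torch.Size([1, 10])
    return logits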