Based on your code, the shape mismatch is raised in the first linear layer.
Your flattened input tensor has a shape of [batch_size=64, features=38850], while the linear layer expects 259 features. Since 38850 = 150 * 259, I guess your input tensors might contain an additional dimension (maybe a temporal dimension such as a sequence length) of 150 samples?
As far as I can see the input data has the shape “torch.Size([150, 259])”. The code is here:
from pathlib import Path

import torch
from torch.utils.data import Dataset


class CustomMusicDataset(Dataset):
    def __init__(self, maps, toread = None):
        self.data = {}
        self.len = {}
        # Read all the maps
        for music_map in maps:
            # prepare the datafile
            self.data[music_map] = []
            self.len[music_map] = 0
            # Read map and select some random elements
            mapname = music_map + ".map"
            with open(mapname) as filein:
                for line in filein.readlines():
                    location, myhash, title = line.split(";")
                    print(f"Analyzing {location}")
                    # Read the actual spectral data
                    for idx in range(10):
                        specfilename = f"{location}/{myhash}_{idx}.pt"
                        if Path(specfilename).exists():
                            spec = torch.load(specfilename)
                            if spec.shape[1] > 259:
                                spec = spec[:,:259]
                            #print(f"appending spectra with shape {spec.shape}")
                            self.data[music_map].append(spec)
                            self.len[music_map] += 1
                    if toread is not None and self.len[music_map]>toread:
                        break
        for music_map, n in self.len.items():
            print(f"{music_map} has {n} entries.")
This is exactly the issue I’m trying to point out.
As you’ve confirmed, each sample already has two dimensions in the shape [150, 259]. A batch of these samples would thus have the shape [batch_size, 150, 259], and flattening the “feature dimensions” creates a tensor in the shape [batch_size, 38850] (as also seen in my code), which then causes the error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x38850 and 259x512)
since 259 features are expected, while you are passing 38850 features.
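To make this concrete, here is a minimal sketch that reproduces the mismatch. It assumes the model flattens the input and then applies a linear layer with 259 input and 512 output features (sizes taken from the error message); the random tensor is only a stand-in for a real batch.

```python
import torch
import torch.nn as nn

# Stand-in for one batch: 64 samples, each a [150, 259] spectrogram
x = torch.randn(64, 150, 259)

# nn.Flatten keeps dim0 (the batch) and merges all remaining dims
flat = nn.Flatten()(x)
print(flat.shape)  # torch.Size([64, 38850])

# Assumed first layer: expects 259 input features
layer = nn.Linear(259, 512)

raised = False
try:
    layer(flat)
except RuntimeError as err:
    raised = True
    print(err)  # shape mismatch between the input and the layer weight
```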
Check what the dimensions in your samples represent and what your actual use case is.
Currently you are using a linear layer which expects 259 features, while your input samples contain 150*259 features, so what does the dimension of size 150 represent?
Is it a temporal dimension? If so, would you like to use all time steps, only the last one, or some other selection?
There isn’t a single way to solve the issue as it depends on your actual use case of course.
If you want to use all features and cannot figure out what these dimensions represent, set in_features of the first linear layer to 38850.
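As a rough sketch of the options mentioned above (not a recommendation for your use case): the tensor sizes come from this thread, while the layer names and the assumption that dim1 is a temporal dimension are only for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(64, 150, 259)  # [batch_size, 150, 259]

# Option 1: use all values, so the first layer needs in_features=150*259
all_features = nn.Linear(150 * 259, 512)
out_all = all_features(x.flatten(start_dim=1))

# Option 2: keep only the last step of dim1 (only meaningful if dim1 is time)
per_step = nn.Linear(259, 512)
out_last = per_step(x[:, -1, :])

# Option 3: average over dim1 before the linear layer
out_mean = per_step(x.mean(dim=1))

print(out_all.shape, out_last.shape, out_mean.shape)
```

All three variants return an output in the shape [64, 512]; they differ only in how much of dim1 reaches the linear layer.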
What do you mean by in_feature? I do not see that parameter in my code.
Could you please show exactly where and how I can change the code? In fact, the 150*259 is a time spectrogram of the input data, all of which I want to use.
Ah, when I set size_data = 259*150 that step seems to work now!
But then when I want to use the network I get the error
size mismatch for linear_relu_stack.0.weight: copying a param with shape torch.Size([512, 38850]) from checkpoint, the shape in current model is torch.Size([512, 259]).
Your test data is missing the batch dimension and you are now trying to use the dimension with the size 150 as the batch dim, which causes the error.
Either use a DataLoader, which will add the missing batch dimension for you, or add it manually via unsqueeze(0).
The batch dimension is a dimension in tensors representing the samples in the tensor.
E.g. a standard nn.Linear layer expects an input in the shape [batch_size, in_features].
I.e. an input in the shape [64, 38850] represents a tensor with 64 samples (data points) each containing 38850 features.
Your new error is raised since your input tensor no longer contains the batch dimension and you are trying to use the tensor in the shape [150, 259] directly, while 259*150 should be the flattened feature dimension.
Use x = x.unsqueeze(0) before passing the input to the model, make sure it has the shape [batch_size=1, 150, 259] and it should work as before.
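A small sketch of both ways to get the batch dimension, using random tensors as stand-ins for your spectrograms (the TensorDataset is only a placeholder for your own dataset class):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

sample = torch.randn(150, 259)  # one sample, no batch dimension

# Manual: add a batch dimension of size 1 in front
batched = sample.unsqueeze(0)
print(batched.shape)  # torch.Size([1, 150, 259])

# DataLoader: stacks samples along a new dim0
dataset = TensorDataset(torch.randn(8, 150, 259))  # 8 stand-in samples
loader = DataLoader(dataset, batch_size=4)
(first_batch,) = next(iter(loader))
print(first_batch.shape)  # torch.Size([4, 150, 259])
```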
That seems to “work” now, I do not get any error! Maybe the tutorial got that bit wrong?
But as a result I now get a “tensor”, especially something like
tensor([[ 0.0000, 11.3125, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000]], grad_fn=<ReluBackward0>)
Is there some tutorial/page on how to interpret that? Or am I still doing something wrong? I thought the result would be a single number?
Yes, the output represents clipped logits in the shape [batch_size, 10] where each entry in dim1 represents the logit for the corresponding class index (out of 10 classes).
I would recommend removing the last nn.ReLU from your model to allow for both negative and positive logits.
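To get a single number per sample, you could take the index of the largest logit. A minimal sketch, using made-up logit values for one sample and 10 classes (as they might look without the final ReLU):

```python
import torch

# Made-up logits in the shape [batch_size=1, 10]
logits = torch.tensor(
    [[-1.2, 11.3, 0.4, -3.0, 0.0, 2.1, -0.7, 1.5, -2.2, 0.3]]
)

# Predicted class = index of the largest logit in dim1
pred = logits.argmax(dim=1)
print(pred)  # tensor([1])

# Optional: convert logits to probabilities that sum to 1 per sample
probs = torch.softmax(logits, dim=1)
print(probs.sum(dim=1))
```

Here class index 1 is predicted, which also matches the position of the only non-zero value in your posted output.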
I guess the best course of action would be to redo a tutorial.
Can you suggest a SIMPLE, REALLY SIMPLE tutorial to learn how to TRAIN a network and to USE a network, as SIMPLE as possible? I have two datasets: small snippets of music WITH and WITHOUT singing, and I want to train a network to distinguish between music with and without singing.
Is there a tutorial which sticks to the VERY essentials of creating and using a network, skipping the billion options there are?
I would recommend checking this tutorial, which explains how a neural network can be defined for the MNIST dataset. You could also play around with this dataset first to get familiar with the dimensions of the input tensors etc., as I believe the tensor shapes and dimensions are what’s causing the trouble right now.
Also, this tutorial might give you more insight into the model definition and training.
Generally, add print statements showing the shape of tensors (inside the model.forward method as well as “outside”) to get a better idea what the batch dimension is, how the DataLoader is creating it etc.
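For example, something along these lines. This is only a sketch: the class name, the hidden size of 512 and the 10 outputs are guesses based on the error messages and output in this thread, not your actual model.

```python
import torch
import torch.nn as nn


class NeuralNetwork(nn.Module):  # hypothetical model for illustration
    def __init__(self, size_data=150 * 259):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(size_data, 512),
            nn.ReLU(),
            nn.Linear(512, 10),  # no final ReLU, so logits can be negative
        )

    def forward(self, x):
        print("input:", x.shape)
        x = self.flatten(x)
        print("flattened:", x.shape)
        logits = self.linear_relu_stack(x)
        print("output:", logits.shape)
        return logits


model = NeuralNetwork()
batch = torch.randn(4, 150, 259)  # stand-in batch of 4 samples
print("outside the model:", batch.shape)
out = model(batch)
```

If the first printed shape inside forward is [150, 259] instead of [batch_size, 150, 259], the batch dimension is missing.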