RuntimeError: mat1 and mat2 shapes cannot be multiplied (16x1 and 16x32)

I am aware that others have run into a similar RuntimeError, but unfortunately I was not able to fix it yet. Maybe someone has a hint?

I want to feed an input feature vector containing 16 features to the network, pass it through some hidden layers (just one for now), and finally get an output in one-hot vector format.

I noticed that the variable data in the training step has a length of one while it should be 16. Passing data[0], which has the desired length, also doesn't work. Where is the problem? Thanks in advance.

import os
import pandas as pd
import numpy as np
from torch.utils.data import Dataset, DataLoader, TensorDataset
from tqdm import tqdm
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch import Tensor
from natsort import os_sorted

# Device configuration
device = torch.device('cpu')

# Hyper-parameters 
input_size = 16
hidden_size = 32
num_classes = 2
num_epochs = 5
batch_size = 1
learning_rate = 0.001
Train_Val_Pct = 0.1 #fraction of the data reserved for testing

class Features(Dataset):
    good_data = 'Preprocessed_Data/Balanced_Data/Good'
    bad_data = 'Preprocessed_Data/Balanced_Data/Bad'
    labels = {good_data: 0, bad_data: 1}
    samples = []
    
    goodcount = 0
    badcount = 0
    
    def __init__(self):
        for label in self.labels:
            print(label)
            for run in os_sorted(os.listdir(label)): #os_sorted needed to go through data as it is listed by windows (14,101,165,...) otherwise (101,14,165,...)
                path = os.path.join(label, run)
                print(path)
                data = pd.read_csv(path,header = None)
                self.samples.append([np.array(data), np.eye(2)[self.labels[label]]])
                
                if label == self.good_data:
                    self.goodcount += 1
                elif label == self.bad_data:
                    self.badcount += 1
        #np.random.shuffle(self.samples)
        np.save("samples.npy", self.samples)        
    
    def __len__(self):
        return len(self.samples)
    
    def __getitem__(self, idx):
        return self.samples[idx]
    
class Net(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu1 = nn.ReLU() #activation function
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(hidden_size, num_classes)

    def forward(self, x): #x is a batch of inputs
        out = self.fc1(x)
        out = self.relu1(out)
        out = self.fc2(out)
        out = self.relu2(out)
        out = self.fc3(out)
        return F.softmax(x, dim=1)  #out is the number of classes
    
net = Net(input_size, hidden_size, num_classes).to(device)

samples = np.load('samples.npy',allow_pickle=True)

X = torch.Tensor([i[0] for i in samples])
y = torch.Tensor([i[1] for i in samples])

size = int(Train_Val_Pct*len(X))

train_X = X[:-size]
train_y = y[:-size]

test_X = X[-size:]
test_y = y[-size:]

trainset = TensorDataset(Tensor(train_X), Tensor(train_y))
testset = TensorDataset(Tensor(test_X), Tensor(test_y))

trainloader = DataLoader(trainset,batch_size=batch_size,shuffle=True)
testloader = DataLoader(testset,batch_size=batch_size,shuffle=False)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)  

#Train the model
total_step = len(trainloader)
for epoch in range(num_epochs):
    for i, (data, labels) in enumerate(trainloader):  
        # Move tensors to the configured device
        data = data[0].to(device)
        labels = labels[0].to(device)
        
        # Forward pass
        outputs = net(data)
        loss = criterion(outputs, labels)
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(loss)
    

# Test the model
# In test phase, we don't need to compute gradients (for memory efficiency)
with torch.no_grad():
    correct = 0
    total = 0
    for data, labels in testloader:
        data = data.to(device)
        labels = labels.to(device)
        outputs = net(data)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print("Accuracy:", round(correct/total,3))

# Save the model checkpoint
torch.save(net.state_dict(), 'net.ckpt')

In your current forward implementation you are passing x (the input) to F.softmax, so the actual model output is never used.
Also, nn.CrossEntropyLoss expects raw logits, since F.log_softmax will be applied internally, so remove the F.softmax and return out directly.
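
For illustration, a minimal sketch of the fixed module, returning the raw logits from the last layer (the layer sizes are just the ones from your post):

import torch.nn as nn

class Net(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu1(out)
        out = self.fc2(out)
        out = self.relu2(out)
        out = self.fc3(out)
        # return the raw logits; nn.CrossEntropyLoss applies log_softmax internally
        return out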

If you are still facing a shape mismatch error, could you print the shape of the data tensor before passing it to the model?

Alright, I totally didn't see that, thanks a lot.

I changed some other details and got it running now, but the accuracy of the results varies a lot. Is this due to an insufficient amount of training data (144 sets for training, 16 for testing), or is there some other error in my code I am not seeing?

import os
import pandas as pd
import numpy as np
from torch.utils.data import Dataset, DataLoader, TensorDataset
from tqdm import tqdm
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch import Tensor
from natsort import os_sorted

# Device configuration
device = torch.device('cpu')

# Hyper-parameters 
input_size = 16
hidden_size = 32
num_classes = 2
num_epochs = 10
batch_size = 1
learning_rate = 0.001
Train_Val_Pct = 0.1 #fraction of the data reserved for testing

class Features(Dataset):
    good_data = 'Preprocessed_Data/Balanced_Data/Good'
    bad_data = 'Preprocessed_Data/Balanced_Data/Bad'
    labels = {good_data: 0, bad_data: 1}
    samples = []
    
    goodcount = 0
    badcount = 0
    
    def __init__(self):
        for label in self.labels:
            print(label)
            for run in os_sorted(os.listdir(label)): #os_sorted needed to go through data as it is listed by windows (14,101,165,...) otherwise (101,14,165,...)
                path = os.path.join(label, run)
                print(path)
                data = pd.read_csv(path,header = None)
self.samples.append([np.array(data), np.eye(2)[self.labels[label]]])
                
                if label == self.good_data:
                    self.goodcount += 1
                elif label == self.bad_data:
                    self.badcount += 1
        #np.random.shuffle(self.samples)
        np.save("samples.npy", self.samples)        
    
    def __len__(self):
        return len(self.samples)
    
    def __getitem__(self, idx):
        return self.samples[idx]
    
class Net(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu1 = nn.ReLU() #activation function
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(hidden_size, hidden_size)
        self.relu3 = nn.ReLU() #activation function
        self.fc4 = nn.Linear(hidden_size, hidden_size)
        self.relu4 = nn.ReLU() #activation function
        self.fc5 = nn.Linear(hidden_size, num_classes)

    def forward(self, x): #x is a batch of inputs
        out = self.fc1(x)
        out = self.relu1(out)
        out = self.fc2(out)
        out = self.relu2(out)
        out = self.fc3(out)
        out = self.relu3(out)
        out = self.fc4(out)
        out = self.relu4(out)
        out = self.fc5(out)     #out is the number of classes
        return out
        #return F.softmax(out, dim=1)  
    
net = Net(input_size, hidden_size, num_classes).to(device)

samples = np.load('samples.npy',allow_pickle=True)

X = torch.Tensor([i[0] for i in samples])
y = torch.Tensor([i[1] for i in samples])

size = int(Train_Val_Pct*len(X))

train_X = X[:-size].squeeze(-1)
train_y = y[:-size].squeeze(-1)

test_X = X[-size:].squeeze(-1)
test_y = y[-size:].squeeze(-1)

trainset = TensorDataset(Tensor(train_X), Tensor(train_y))
testset = TensorDataset(Tensor(test_X), Tensor(test_y))

trainloader = DataLoader(trainset,batch_size=batch_size,shuffle=True)
testloader = DataLoader(testset,batch_size=batch_size,shuffle=False)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)  
#Train the model
total_step = len(trainloader)
for epoch in range(num_epochs):
    for i, (data, labels) in enumerate(trainloader):  
        # Move tensors to the configured device
        data = data.to(device)
        labels = labels.to(device)
        
        # Forward pass
        outputs = net(data)
        #print('Outputs Train: ',outputs)
        loss = criterion(outputs, labels)
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(loss)
    

# Test the model
# In test phase, we don't need to compute gradients (for memory efficiency)
with torch.no_grad():
    correct = 0
    total = 0
    for data, labels in testloader:
        data = data.to(device)
        labels = torch.argmax(labels).to(device)  #real class
        #real = torch.max(labels).to(device)
        outputs = torch.argmax(net(data))
        #print('Outputs Test: ',outputs)
        #print('Real output: ',real)
        #print('Labels: ',labels)
        predicted = outputs
        if torch.eq(predicted, labels):
            correct += 1
        total += 1

    print("Accuracy:", round(correct/total,3))

# Save the model checkpoint
torch.save(net.state_dict(), 'net.ckpt')

The high variance in the loss and/or accuracy is most likely due to the small dataset.
144 samples is a tiny dataset, and I'm skeptical you would be able to train a neural network with it at all such that it generalizes to new, unseen data.

Thanks for your reply :slight_smile:
Alright, yeah, I was afraid that could be the case. What would be an appropriate amount of training data to get a halfway decent result when testing?

I have 560 samples of accelerometer readings, each containing 133,328 data points. From these I extracted 16 features per sample. Unfortunately, the data is highly imbalanced: 6/7 of it is from the machine in bad condition and 1/7 from the machine in good condition.

Up to now I went with undersampling and threw away 5/7 of the “bad” data.

Right now I am trying to split each sample into 7 subsamples and extract the features from those. I would then again throw away 6/7 of the bad data, but I would end up with equal amounts of good and bad data (560 good samples, 560 bad samples), each covering 1/7 of the original sample length. For training I would then have 504 datasets and 56 for testing. Is that sufficient, or do you have other suggestions for how to “generate” more input data? I hope you can follow me, I am not a native speaker. Thanks in advance :slight_smile:
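
Just to illustrate the subsampling idea, a rough sketch (extract_features is a hypothetical placeholder for the 16-feature extraction, and the array shapes are assumptions):

import numpy as np

def extract_features(signal):
    # hypothetical placeholder for the 16-feature extraction per (sub)sample
    return np.array([signal.mean(), signal.std(), np.abs(signal).max()])

def split_into_subsamples(run, n_parts=7):
    # split one accelerometer run (a 1-D array) into n_parts equal chunks,
    # dropping any leftover points at the end
    length = len(run) // n_parts
    return [run[i * length:(i + 1) * length] for i in range(n_parts)]

def build_feature_dataset(runs, labels, n_parts=7):
    # assumed: runs is a list of 1-D numpy arrays, labels a list of 0/1 class ids
    X, y = [], []
    for run, label in zip(runs, labels):
        for sub in split_into_subsamples(run, n_parts):
            X.append(extract_features(sub))
            y.append(label)
    return np.stack(X), np.array(y)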

It’s a bit hard to tell what the minimum amount of data would be, but just for comparison: MNIST contains 60k training samples and is considered small.
That being said, you should try out your current workflow and check if the results look somewhat reasonable.
PS: usually you would also use a separate test dataset, which should be used only once your training and validation are finished, to get an estimate of the performance on unseen data.
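
As an illustration of such a three-way split, a minimal sketch using torch.utils.data.random_split (the 80/10/10 ratio and the random data are just assumptions):

import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# assumed: X has shape (num_samples, 16) and y holds the class indices
X = torch.randn(560, 16)
y = torch.randint(0, 2, (560,))
dataset = TensorDataset(X, y)

n_total = len(dataset)
n_train = int(0.8 * n_total)
n_val = int(0.1 * n_total)
n_test = n_total - n_train - n_val

train_set, val_set, test_set = random_split(dataset, [n_train, n_val, n_test])

train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
val_loader = DataLoader(val_set, batch_size=16)    # used during training to tune hyperparameters
test_loader = DataLoader(test_set, batch_size=16)  # touched only once, after training is finished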

Just out of curiosity: it seems that the “quality” of the dataset also has an influence on the results. So for my example, rather than just using the raw sensor readings as inputs for the NN, I went with extracting features from the different runs that might describe the condition of the gearbox more accurately and used those as the input vector.
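
For illustration, a few common time-domain features one could compute per run (just an assumed selection, not necessarily the 16 features used here):

import numpy as np
from scipy.stats import kurtosis, skew

def time_domain_features(signal):
    # signal: one raw accelerometer run as a 1-D numpy array
    rms = np.sqrt(np.mean(signal ** 2))
    peak = np.max(np.abs(signal))
    return np.array([
        signal.mean(),
        signal.std(),
        rms,
        peak,
        peak / rms,        # crest factor
        skew(signal),
        kurtosis(signal),
    ])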

With the “new data”, as mentioned in the last post, I end up with a rather consistent accuracy of about 85%. Not great yet, but I think it's going in the right direction.

The dataset I used is the PHM Society gearbox fault detection dataset (2010). It was used for an international challenge in which the general health and specific faults of the gearbox had to be classified for each of the 560 runs.

Yes, feature engineering helps in a lot of use cases and is a common way to improve the performance of an ML model. Often it’s expected that “deep” models are able to extract useful features automatically, so less effort is spent on manual feature engineering.
However, if you are working with “smaller” models you would often still benefit from it.

E.g. a very simple (and artificial) example would be the dataset in the top-left figure of these scikit-learn docs. Assume these two “rings” are two separate classes. Passing e.g. the angle of each point as the feature wouldn’t work in a simple classifier without transforming the input space. However, if you pass the magnitude (norm) of each point to the classifier, it would most likely be able to yield good results.
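
To make that concrete, a small sketch using sklearn.datasets.make_circles as a stand-in for the two rings (the classifier choice is just an assumption):

import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# two concentric "rings" as two classes
X, y = make_circles(n_samples=1000, factor=0.3, noise=0.05, random_state=0)

# feature 1: the angle of each point -> carries no class information
angle = np.arctan2(X[:, 1], X[:, 0]).reshape(-1, 1)
# feature 2: the magnitude (norm) of each point -> separates the rings
radius = np.linalg.norm(X, axis=1).reshape(-1, 1)

for name, feats in [("angle", angle), ("norm", radius)]:
    f_train, f_test, y_train, y_test = train_test_split(feats, y, random_state=0)
    acc = LogisticRegression().fit(f_train, y_train).score(f_test, y_test)
    print(f"{name} feature accuracy: {acc:.2f}")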