Size mismatch error even after flattening in Conv1D

Hello, I am trying to build a convolution-based classifier for some time series data. While running the code I get the following error -
RuntimeError: size mismatch, m1: [1 x 20592], m2: [20526 x 20658]
The model class is as follows -

class timeSeriesConv(nn.Module):
    def __init__(self, channels, seq_length, kernel_size=3, K=2):
        super(timeSeriesConv, self).__init__()
        self.channels = channels
        self.seq_length = seq_length
        self.kernel = kernel_size
        self.Conv1D = nn.Conv1d(in_channels=self.channels,
                                out_channels=self.channels,
                                kernel_size=self.kernel,
                                stride=1)
        self.criterion = nn.CrossEntropyLoss().cuda()


        self.depthwiseConv = nn.Conv1d(in_channels=self.channels,
                                       out_channels=K * self.channels,
                                       kernel_size=self.kernel,
                                       stride=1)
        self.fc1 = nn.Linear(in_features=K*self.channels * (self.seq_length - 2*(self.kernel-1)), out_features=(K*self.channels*self.seq_length))
        self.fc2 = nn.Linear(in_features=(K*self.channels*self.seq_length), out_features=4)

    def forward(self, X):
        out = nn.functional.elu(X)
        out = self.depthwiseConv(out)
        out = out.view(out.size(0), -1)
        out = nn.functional.elu(out)
        out = self.fc1(out)
        out = nn.functional.elu(out)
        return (self.fc2(out))

where channels = 22, seq_length = 313, K = 3 and kernel_size = 2.
The input shape is [1, 22, 313].
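(For reference, plugging these values in: fc1 is created with in_features = 3*22*(313 - 2*(2-1)) = 20526 and out_features = 3*22*313 = 20658, which is exactly the m2 shape in the error message, while m1 is the flattened activation actually reaching fc1.)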
Error is being thrown at the line

out = self.fc1(out)

Please tell me where I am going wrong.
TIA

Currently you are calculating the sequence length of the activation after self.depthwiseConv as self.seq_length - 2*(self.kernel-1), which is wrong, as the kernel (with stride=1 and no padding) will only remove one signal value on each side.

in_features=K*self.channels * (self.seq_length - 2*(self.kernel//2))

should work.
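For reference, with stride=1 and no padding a Conv1d shortens the sequence by kernel_size - 1, so the flattened size after self.depthwiseConv is K * channels * (seq_length - (kernel_size - 1)). A minimal, standalone check (assuming kernel_size=3, channels=22, K=3):

import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=22, out_channels=3 * 22, kernel_size=3, stride=1)
x = torch.randn(1, 22, 313)
out = conv(x)
print(out.shape)                        # torch.Size([1, 66, 311]), since 313 - (3 - 1) = 311
print(out.view(out.size(0), -1).shape)  # torch.Size([1, 20526]), matching fc1's in_features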

Thanks for replying @ptrblck, I am getting the same error with the same mismatch sizes after trying out the solution.

This shouldn’t be the case, as you’ve changed the number of input features.
Is the error message exactly the same, and are you sure you've updated the code (make sure to rerun the cell in case you are using a Jupyter notebook)?

I did double-check and the error message is indeed the same with the same tensor sizes and thrown at the same line. The code is updated. Here is the error thrown

size mismatch, m1: [1 x 20592], m2: [20526 x 20658] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:290

@ptrblck could you please explain where the matrix m1 [1 x 20592] is coming from? Because the output after the convolutions should be 3*22*311 = [1, 20526] after flattening.
I even tried it out on dummy data with the same shape and the output tensor is as expected

The calculation is correct and my suggestion should fix this error:

class timeSeriesConv(nn.Module):
    def __init__(self, channels, seq_length, kernel_size=3, K=2):
        super(timeSeriesConv, self).__init__()
        self.channels = channels
        self.seq_length = seq_length
        self.kernel = kernel_size

        self.depthwiseConv = nn.Conv1d(in_channels=self.channels,
                                       out_channels=K * self.channels,
                                       kernel_size=self.kernel,
                                       stride=1)
        self.fc1 = nn.Linear(in_features=K*self.channels * (self.seq_length - 2*(self.kernel//2)), out_features=(K*self.channels*self.seq_length))
        self.fc2 = nn.Linear(in_features=(K*self.channels*self.seq_length), out_features=4)

    def forward(self, X):
        out = nn.functional.elu(X)
        out = self.depthwiseConv(out)
        out = out.view(out.size(0), -1)
        out = nn.functional.elu(out)
        out = self.fc1(out)
        out = nn.functional.elu(out)
        return (self.fc2(out))


x = torch.randn(1, 22, 313)
model = timeSeriesConv(22, 313, K = 3)
output = model(x)
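(As a quick sanity check on this example, output.shape should come out as torch.Size([1, 4]), since fc2 maps to the 4 classes.)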

Thank you @ptrblck that worked perfectly.
On an unrelated topic, the cross entropy loss is coming out too high; it oscillates between 15 and 25. The training loop and dataloader are given below. Could you please help me out with it?

    def train_model(self, data, model, epochs):
        cudnn.benchmark = True
        model.train()
        model.cuda()
        optimizer = torch.optim.Adam(model.parameters())
        loss_history = []
        min_loss = 100  # a high initial value so that the first epoch's loss is always lower
        for epoch in range(0, epochs+1):
            avg_loss = 0
            st = time.time()
            for _,(x, y) in enumerate(data):
                optimizer.zero_grad()
                x = x.reshape(1, 22, 313)
                x = x.float().cuda()
                y = y.long().cuda()
                out = model(x)
                print(out)
                loss = self.criterion(out, y)
                loss.backward()
                optimizer.step()
                avg_loss += loss.item()

            et = time.time()
            print('--------------------------------------------------------------')
            print('TIME')
            print(et-st)
            print('LOSS')
            print(avg_loss/len(data))
            loss_history.append(avg_loss/len(data))
            if (avg_loss/len(data) < min_loss):
                torch.save(model.state_dict(), 'Conv.pth')
                min_loss = avg_loss/len(data)
        plt.scatter(numpy.arange(len(loss_history)), loss_history)
        plt.show()
        print('--------------   DONE TRAINING  -----------------')

The dataloader:

def dataloader_train(path):
    os.chdir(path)
    train_data = []
    Data = []
    Labels = []
    classes = {'769':int(0), '770':int(1), '771':int(2), '772':int(3)}
    folders = ['769', '770', '771', '772']
    for folder in folders:
        files = os.listdir(path + '/' + folder)
        os.chdir(path + '/' + folder)
        for file in files:
            data = numpy.load(file)
            data = numpy.transpose(data)
            #Data.append(data)
            #Labels.append((int(folder)))
            train_data.append([data, torch.tensor(data=(classes[folder]), dtype=torch.int64)])

    train_loader = torch.utils.data.DataLoader(train_data, batch_size=1, shuffle=True)
    return train_loader, train_data

The lowest loss this model could reach was around 15, even though it has a lot of trainable parameters (way more than a million). Any idea why?
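(For context: with 4 balanced classes, a randomly initialized classifier should start near -ln(1/4) ≈ 1.39, so a loss in the 15-25 range suggests the logits themselves are very large.)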

Try to overfit a small data sample (e.g. just 10 samples) to check that the overall training routine doesn't have any bugs.
I'm not completely sure how your data loading pipeline works, so this would also act as a check for that part of the code.
Once your model overfits the small sample, try to scale it up.
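A rough sketch of what this check could look like, reusing the names from the posted code (timeSeriesConv, train_data) and the same reshape as in train_model:

import torch
import torch.nn as nn

small_loader = torch.utils.data.DataLoader(train_data[:10], batch_size=1, shuffle=True)
model = timeSeriesConv(22, 313, K=3).cuda()
model.train()
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()

for epoch in range(100):
    for x, y in small_loader:
        optimizer.zero_grad()
        out = model(x.float().reshape(1, 22, 313).cuda())
        loss = criterion(out, y.long().cuda())
        loss.backward()
        optimizer.step()

print(loss.item())  # should approach zero if the training routine and data pipeline are correct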

Sure, I will try it out.
Thanks a lot

@ptrblck Sorry for bothering so much. I changed my model to the following -

class timeSeriesConv(nn.Module):
    def __init__(self, channels, seq_length, kernel_size=3, k=2):
        super(timeSeriesConv, self).__init__()
        self.channels = channels
        self.seq_length = seq_length
        self.kernel = kernel_size
        self.criterion = nn.CrossEntropyLoss().cuda()
        self.conv1 = nn.Conv1d(in_channels=self.channels, out_channels=self.channels, kernel_size=self.kernel, stride=1)
        self.depthwiseConv = nn.Conv1d(in_channels=self.channels,
                                       out_channels=k * self.channels,
                                       kernel_size=self.kernel,
                                       stride=1)

        self.fc1 = nn.Linear(in_features=k * self.channels * 309,
                             out_features=(k * self.channels * self.seq_length))

        self.fc2 = nn.Linear(in_features=(k * self.channels * self.seq_length),
                             out_features=2 * k * self.channels * self.seq_length)

        self.fc3 = nn.Linear(in_features=2 * k * self.channels * self.seq_length,
                             out_features=2 * k * self.channels * self.seq_length)

        self.fc4 = nn.Linear(in_features=2 * k * self.channels * self.seq_length,
                             out_features=(k * self.channels * self.seq_length) // 2)

        self.fc5 = nn.Linear(in_features=(k * self.channels * self.seq_length) // 2,
                             out_features=(k * self.channels * self.seq_length) // 4)
        self.fc6 = nn.Linear(in_features=(k * self.channels * self.seq_length) // 4, out_features=4)

    def forward(self, X):
        out = self.conv1(X)
        out = nn.functional.relu(out)
        out = self.depthwiseConv(out)
        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        out = nn.functional.relu(out)
        nn.Dropout(0.5)
        out = self.fc2(out)
        out = nn.functional.relu(out)
        nn.Dropout(0.3)
        out = self.fc3(out)
        out = nn.functional.relu(out)
        out = self.fc4(out)
        out = nn.functional.relu(out)
        nn.Dropout(0.25)
        out = self.fc5(out)
        out = nn.functional.relu(out)
        return self.fc6(out)

whereas the training function is -

    def train_model(self, data, model, epochs):
        cudnn.benchmark = True
        model.train()
        model.cuda()
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
        loss = []
        min_loss = 100
        for epoch in range(0, epochs + 1):
            avg_loss = 0
            st = time.time()
            for _, (x, y) in enumerate(data):
                optimizer.zero_grad()
                x = x.reshape(1, 22, 313)
                x = x.float().cuda()
                y = y.long().cuda()
                out = model(x)
                print(out)
                loss = self.criterion(out, y)
                loss.backward()
                optimizer.step()
                avg_loss += loss.item()

As you can see, everything has .cuda() following it, so it should be loaded onto GPU memory. But now, after changing the architecture, system RAM gets filled up all the way to swap and beyond. Since the dataset has not changed, it cannot be the issue, as I ran the model on it before.
Could you please help?

How much RAM is used by this model and training loop?

Also, the nn.Dropout layers in the forward pass won’t be used, as you are not calling the module with the activation.
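To actually apply dropout you could either register the modules in __init__ and call them on the activation, or use the functional API; a sketch (not from the original reply):

# in __init__:
self.drop1 = nn.Dropout(p=0.5)

# in forward:
out = self.drop1(out)

# or functionally, passing the training flag explicitly:
out = nn.functional.dropout(out, p=0.5, training=self.training)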

As soon as I hit run, after printing the output for one sample in the first epoch, RAM gets filled up by the optimizer.step() line.
I have 16 GB of RAM + 2 GB of swap and both get filled up. The GPU is an RTX 2070.
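For scale, a back-of-the-envelope count of the largest layer in the posted model (with k=3, channels=22, seq_length=313) suggests where the memory goes:

hidden = 2 * 3 * 22 * 313       # 41316 features in and out of fc3
fc3_weights = hidden * hidden   # ~1.7 billion parameters in fc3 alone
print(fc3_weights * 4 / 1e9)    # ~6.8 GB in float32, before gradients and optimizer buffers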

The posted code creates a shape mismatch again using:

model = timeSeriesConv(22, 313, kernel_size=2, k=3).cuda()
x = torch.randn(1, 22, 313).cuda()
output = model(x)
> RuntimeError: size mismatch, m1: [1 x 20526], m2: [20394 x 20658]

There is some silly mistake I am making when calculating the in_features of fc1 which I am not able to catch. The above code only works for odd kernel sizes, so for now I just went ahead with it.
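For reference, the in_features of fc1 can be derived instead of hard-coded by using the output-length formula from the nn.Conv1d docs; a small helper (not part of the original thread) shows why the hard-coded 309 only matches odd kernel sizes:

def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    # output-length formula from the nn.Conv1d documentation
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# two convolutions with stride=1 and no padding each remove (kernel_size - 1) steps:
print(conv1d_out_len(conv1d_out_len(313, 3), 3))  # 309 -> matches the hard-coded in_features
print(conv1d_out_len(conv1d_out_len(313, 2), 2))  # 311 -> what the model actually produces with kernel_size=2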