Unsure of how to fix dimensions for my CNN

I have been working on a project where I use CNNs to restore punctuation to text, but I am still a little inexperienced with PyTorch. I’m having trouble figuring out how to arrange the dimensions for the network. I also don’t think I am using the most effective approach for this problem, but that’s for another time. The error I am getting is at the bottom. I was unsure of what is helpful to show and what is not, so I have included what seemed important to me.

Here are the important parts of my code:

Model:

class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        
        self.conv1 = nn.Conv2d(1, 64, kernel_size=(1, 50))
        self.conv2 = nn.Conv2d(64, 128, (1, 50))
        self.conv3 = nn.Conv2d(128, 128, (1, 50))
        
        self.fc1 = nn.Linear(32000, 1000)
        self.fc2 = nn.Linear(1000, 250)
        self.fc3 = nn.Linear(250, 3)
        
        self.dropout = nn.Dropout(p=0.2)
        
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x.view(-1, 32000)
        
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.fc3(x)
        return x
    
model = ConvNet()
print(model)

Output:

ConvNet(
  (conv1): Conv2d(1, 64, kernel_size=(1, 50), stride=(1, 1))
  (conv2): Conv2d(64, 128, kernel_size=(1, 50), stride=(1, 1))
  (conv3): Conv2d(128, 128, kernel_size=(1, 50), stride=(1, 1))
  (fc1): Linear(in_features=32000, out_features=1000, bias=True)
  (fc2): Linear(in_features=1000, out_features=250, bias=True)
  (fc3): Linear(in_features=250, out_features=3, bias=True)
  (dropout): Dropout(p=0.2, inplace=False)
)

Training loader:

training_data = []

train_tensor_wv_np = training_tensor_wordvecs.numpy()
train_tensor_label_np = training_tensor_labels.numpy()

for i in range(len(training_tensor_wordvecs)):
    training_data.append([train_tensor_wv_np[i], train_tensor_label_np[i]])
    
trainloader = torch.utils.data.DataLoader(training_data, shuffle=False, batch_size=1)
i1, l1 = next(iter(trainloader))

print(i1.shape, l1.shape)

Output:

torch.Size([1, 5, 50]) torch.Size([1, 5])

Training loop:

epochs = 20

model.train()

for epoch in range(epochs):
    training_loss = 0.0
    
    for data, target in trainloader:
        optimizer.zero_grad()
        output=model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        training_loss += loss.item() * data.size(0)
    
    training_loss = training_loss/len(trainloader.sampler)
    
    print('Epoch: {} \tTraining Loss: {:.6f}'.format(
        epoch+1, 
        training_loss
        ))

Output:

RuntimeError                              Traceback (most recent call last)
<ipython-input-136-a20e1a302cfe> in <module>
      8     for data, target in trainloader:
      9         optimizer.zero_grad()
---> 10         output=model(data)
     11         loss = criterion(output, target)
     12         loss.backward()

D:\Users\user\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    538             result = self._slow_forward(*input, **kwargs)
    539         else:
--> 540             result = self.forward(*input, **kwargs)
    541         for hook in self._forward_hooks.values():
    542             hook_result = hook(self, input, result)

<ipython-input-133-ce6a4f5b562c> in forward(self, x)
     14 
     15     def forward(self, x):
---> 16         x = F.relu(self.conv1(x))
     17         x = F.relu(self.conv2(x))
     18         x = F.relu(self.conv3(x))

D:\Users\user\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    538             result = self._slow_forward(*input, **kwargs)
    539         else:
--> 540             result = self.forward(*input, **kwargs)
    541         for hook in self._forward_hooks.values():
    542             hook_result = hook(self, input, result)

D:\Users\user\Anaconda3\lib\site-packages\torch\nn\modules\conv.py in forward(self, input)
    346 
    347     def forward(self, input):
--> 348         return self._conv_forward(input, self.weight)
    349 
    350 class Conv3d(_ConvNd):

D:\Users\user\Anaconda3\lib\site-packages\torch\nn\modules\conv.py in _conv_forward(self, input, weight)
    343                             _pair(0), self.dilation, self.groups)
    344         return F.conv2d(input, weight, self.bias, self.stride,
--> 345                         self.padding, self.dilation, self.groups)
    346 
    347     def forward(self, input):

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 1, 1, 50], but got 3-dimensional input of size [1, 5, 50] instead

I don’t really know how to fix the dimensionality error, and I’m generally unsure if my current approach will work. Any guidance would be extremely appreciated. I am still relatively inexperienced, so I apologize if I have not presented my issue in the most readable way.

nn.Conv2d layers expect a 4-dimensional input in the shape [batch_size, channels, height, width], while you are passing a 3-dimensional input of shape [1, 5, 50].

I would guess that either a spatial dimension or the channel dimension is missing from your input.
I’m not sure that dim1 (size 5) corresponds to the channels, as self.conv1 would then raise an error, since it’s expecting a single input channel.

If your input is indeed shaped [batch_size, channels, len], you should set in_channels=5 in the first conv layer and maybe use an nn.Conv1d, since you only have a temporal dimension in the input.
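
For example, a minimal sketch, assuming dim1 really holds the 5 word vectors as channels (the out_channels and kernel_size here are placeholders, not recommendations):

import torch
import torch.nn as nn

x = torch.randn(1, 5, 50)  # [batch_size, channels, len]
conv = nn.Conv1d(in_channels=5, out_channels=64, kernel_size=3)
out = conv(x)
print(out.shape)
> torch.Size([1, 64, 48])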

Unrelated to your error, but you would have to reassign the view to x via:

x = x.view(x.size(0), -1)

Note that I’ve slightly changed the view arguments, as my version keeps the batch size; a wrong feature count would then raise a shape mismatch error in the feature dimension, which is easier to debug.
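
For example (shapes here are purely illustrative):

x = torch.randn(2, 128, 5, 50)  # pretend conv output with batch size 2
x = x.view(x.size(0), -1)       # -> [2, 32000], batch dimension stays intact
# x.view(-1, 32000) gives the same result here, but with a wrong feature
# count it could silently fold samples into the batch dimension instead
# of raising a clear error at the linear layer.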

Thank you for responding!

I’m not sure if this explanation will make sense, but the tensor of shape [1, 5, 50] represents 1 set of 5 words that were converted into their GloVe vectors. I’m not sure whether my batch size would be considered 5 or 1. For my labels, the shape [1, 5] represents one set of labels for 5 words.

Could I just change the tensor to the shape [1, 1, 5, 50]?

I am a bit confused as to how I can approach this, and any further clarification would be extremely helpful!

I made the following changes to get rid of the dimensionality error:

epochs = 20

model.train()
model.double()

for epoch in range(epochs):
    training_loss = 0.0
    
    for data, target in trainloader:
        
        data = data.unsqueeze(0)
        print(data.shape)
        optimizer.zero_grad()
        output=model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        training_loss += loss.item() * data.size(0)
    
    training_loss = training_loss/len(trainloader.sampler)
    
    print('Epoch: {} \tTraining Loss: {:.6f}'.format(
        epoch+1, 
        training_loss
        ))

However, now I am getting the following issue:

torch.Size([1, 1, 5, 50])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-27-ce7692ed966f> in <module>
     12         print(data.shape)
     13         optimizer.zero_grad()
---> 14         output=model(data)
     15         loss = criterion(output, target)
     16         loss.backward()

D:\Users\user\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    538             result = self._slow_forward(*input, **kwargs)
    539         else:
--> 540             result = self.forward(*input, **kwargs)
    541         for hook in self._forward_hooks.values():
    542             hook_result = hook(self, input, result)

<ipython-input-24-ef6039735d06> in forward(self, x)
     15     def forward(self, x):
     16         x = F.relu(self.conv1(x))
---> 17         x = F.relu(self.conv2(x))
     18         x = F.relu(self.conv3(x))
     19         x.view(-1, 32000)

D:\Users\user\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    538             result = self._slow_forward(*input, **kwargs)
    539         else:
--> 540             result = self.forward(*input, **kwargs)
    541         for hook in self._forward_hooks.values():
    542             hook_result = hook(self, input, result)

D:\Users\user\Anaconda3\lib\site-packages\torch\nn\modules\conv.py in forward(self, input)
    346 
    347     def forward(self, input):
--> 348         return self._conv_forward(input, self.weight)
    349 
    350 class Conv3d(_ConvNd):

D:\Users\user\Anaconda3\lib\site-packages\torch\nn\modules\conv.py in _conv_forward(self, input, weight)
    343                             _pair(0), self.dilation, self.groups)
    344         return F.conv2d(input, weight, self.bias, self.stride,
--> 345                         self.padding, self.dilation, self.groups)
    346 
    347     def forward(self, input):

RuntimeError: Calculated padded input size per channel: (5 x 1). Kernel size: (1 x 50). Kernel size can't be greater than actual input size

I’m a bit confused as to why this is happening, since the tensor is in the format (batch_size, channels, height, width) and I want the kernel to go down row by row. I’m not sure if this is the best approach for my project, and any further insight would be much appreciated!

The error message points to an activation whose input shape is too small.
conv2 and conv3 use a kernel shape of [1, 50], which means their input must be at least that large in each spatial dimension.
Note that the first conv will already reduce the output to [batch_size, 64, 5, 1], which is too small for the second conv:

x = torch.randn(1, 1, 5, 50)
conv = nn.Conv2d(1, 1, kernel_size=(1, 50), stride=(1, 1))
out = conv(x)
print(out.shape)
> torch.Size([1, 1, 5, 1])
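
Applying the second conv with the same kernel shape to that output is a quick way to reproduce your exact error:

conv2 = nn.Conv2d(1, 1, kernel_size=(1, 50), stride=(1, 1))
out2 = conv2(out)
> RuntimeError: Calculated padded input size per channel: (5 x 1). Kernel size: (1 x 50). Kernel size can't be greater than actual input size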

Hi again, I see what you mean, but when I fixed that problem, new ones kept coming up. I think I am approaching the network in the wrong way to begin with. I know it’s a lot to ask, but I’ve been trying to create my model based on this research paper. I have completed all of the text pre-processing, and at this stage I think I’m making some mistakes in creating the network. I’m trying to create the CNN-2 model referenced in the paper, but I’m really confused about how to manage my dimensions.

Once again, here is the output for my trainloader, with the data shape on the left and the label shape on the right. I don’t necessarily need help from start to finish; I just need help figuring out how to actually run this through a CNN.

torch.Size([1, 5, 50]) torch.Size([1, 5])

For reference, in [1, 5, 50] the 1 was supposed to be the channel dimension, 5 is the number of words, and 50 is the length of each word vector. In [1, 5], the 1 is the channel and 5 represents one label per word. Each label is 0, 1, or 2, depending on what punctuation follows the word.

I completely understand if you can’t help me out with this as it’s a lot, but I would be grateful for any insight or links to other resources to help me figure this problem out. Thank you once again for your continued support!

By skimming through the section on CNN-2 and based on your explanations, I would guess the input shape should be [batch_size, 1, 5, 50], as the kernels seem to operate in two dimensions.
Unfortunately, I cannot find a full architecture description for CNN-2, but it seems the kernel size should be [2, 3] (based on Figure 3).
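
As a rough sketch of what that could look like (the [2, 3] kernel and out_channels=64 are my guesses from the figure, not a confirmed architecture):

x = torch.randn(1, 1, 5, 50)  # [batch_size, channels, words, embedding_dim]
conv = nn.Conv2d(1, 64, kernel_size=(2, 3))
out = conv(x)
print(out.shape)
> torch.Size([1, 64, 4, 48])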

The authors mention a CAFFE implementation. Were you able to find it? That would make it easier to recreate the model.

CAFFE appears to be a deep learning framework, but I’m not sure if it will be helpful for me.

In terms of the input shape:
I made the data input tensor [5, 1, 1, 50], as I assumed [batch_size, channels, height, width] was the expected input. However, this is giving me errors.

Updated network code:

class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        
        self.conv1 = nn.Conv2d(in_channels=5, out_channels=64, kernel_size=(1, 3), padding=0)
        self.conv2 = nn.Conv2d(64, 128, (1, 3), padding=0)
        self.conv3 = nn.Conv2d(128, 128, (1, 3), padding=0)
        
        self.fc1 = nn.Linear(5120, 2048)
        self.fc2 = nn.Linear(2048, 1024)
        self.fc3 = nn.Linear(1024, 3)
        
        self.pool = nn.MaxPool2d((2))
        
        self.dropout = nn.Dropout(p=0.2)
        
    def forward(self, x):
        x = F.relu(self.conv1(x))
        print(x.shape)
        x = self.pool(x)
        print(x.shape)
        x = F.relu(self.conv2(x))
        print(x.shape)
        x = F.relu(self.conv3(x))
        print(x.shape)
        
        x = x.reshape(1, x.size()[0] * x.size()[1])
        x = x.squeeze()
        
        print(x.shape)
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.fc3(x)
        return x
    
model = ConvNet()
print(model)

Updated Training loop code:

epochs = 20

model.train()
model.double()

for epoch in range(epochs):
    training_loss = 0.0
    
    for data, target in trainloader:
        
        data = data.view(5, 1, 1, 50)
        target = target.view(5)
 

        print(data.shape)
        optimizer.zero_grad()
        output=model(data)
        
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        training_loss += loss.item() * data.size(0)
    
    training_loss = training_loss/len(trainloader.sampler)
    
    print('Epoch: {} \tTraining Loss: {:.6f}'.format(
        epoch+1, 
        training_loss
        ))

Output:

torch.Size([5, 1, 1, 50])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-159-69a1dbe7db1c> in <module>
     15         print(data.shape)
     16         optimizer.zero_grad()
---> 17         output=model(data)
     18 
     19         loss = criterion(output, target)

D:\Users\user\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    538             result = self._slow_forward(*input, **kwargs)
    539         else:
--> 540             result = self.forward(*input, **kwargs)
    541         for hook in self._forward_hooks.values():
    542             hook_result = hook(self, input, result)

<ipython-input-158-31a2dc474011> in forward(self, x)
     16 
     17     def forward(self, x):
---> 18         x = F.relu(self.conv1(x))
     19         print(x.shape)
     20         x = self.pool(x)

D:\Users\user\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    538             result = self._slow_forward(*input, **kwargs)
    539         else:
--> 540             result = self.forward(*input, **kwargs)
    541         for hook in self._forward_hooks.values():
    542             hook_result = hook(self, input, result)

D:\Users\user\Anaconda3\lib\site-packages\torch\nn\modules\conv.py in forward(self, input)
    346 
    347     def forward(self, input):
--> 348         return self._conv_forward(input, self.weight)
    349 
    350 class Conv3d(_ConvNd):

D:\Users\user\Anaconda3\lib\site-packages\torch\nn\modules\conv.py in _conv_forward(self, input, weight)
    343                             _pair(0), self.dilation, self.groups)
    344         return F.conv2d(input, weight, self.bias, self.stride,
--> 345                         self.padding, self.dilation, self.groups)
    346 
    347     def forward(self, input):

RuntimeError: Given groups=1, weight of size [64, 5, 1, 3], expected input[5, 1, 1, 50] to have 5 channels, but got 1 channels instead

I know that this architecture probably doesn’t make the most sense, but I feel like I’m a bit closer to figuring this problem out. Thank you once again!

Yes, CAFFE is a deprecated framework.
If you find the implementation, there should be a prototxt file defining the architecture.

Based on Figure 3, I still think that [batch_size, 1, 5, 50] might be the expected input shape.
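
To make that concrete, here is a minimal sketch of a model that accepts [batch_size, 1, 5, 50]; the kernel sizes, channel counts, and single 3-class output are my assumptions, not the paper’s CNN-2:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SketchNet(nn.Module):
    def __init__(self):
        super(SketchNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 64, kernel_size=(2, 3))   # guessed from Figure 3
        self.conv2 = nn.Conv2d(64, 128, kernel_size=(2, 3))
        # one 3-class prediction per 5-word window; adapt if you need one label per word
        self.fc = nn.Linear(128 * 3 * 46, 3)

    def forward(self, x):
        x = F.relu(self.conv1(x))  # -> [N, 64, 4, 48]
        x = F.relu(self.conv2(x))  # -> [N, 128, 3, 46]
        x = x.view(x.size(0), -1)  # flatten, keeping the batch dimension
        return self.fc(x)

model = SketchNet()
out = model(torch.randn(1, 1, 5, 50))
print(out.shape)
> torch.Size([1, 3])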

Hi again, I think you may be right, but at the moment I don’t have a deep enough understanding to make it work. Instead, I’m thinking of trying a DNN for the system, since its architecture was laid out more clearly in the paper.

I just had one question about how the output of the DNN would work.

Based on the paper, that would change my input to a [250]-sized tensor, since I’m concatenating the five 50-dimensional word vectors. I’m most confused about how the output would work and how I would go about it. Do I need my final fully connected output layer to be of size 3, 5, 15, or something else completely?

The reasoning behind 3 would be that I am getting softmax probabilities for there being no punctuation after the word, a comma, or a period. However, this does not account for all 5 words in the sequence. If I were to have my output be 15, would that work? In that case, could I just reshape the output to get the softmax probabilities for each label option?

If my label options are 0, 1, and 2, each meaning a type of punctuation, how do I make sure the softmax probabilities are “in the right order”, if that’s how it works?
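
Something like this is what I have in mind (just a sketch of my idea, with a made-up model output):

import torch
import torch.nn.functional as F

out = torch.randn(1, 15)       # pretend DNN output: batch of 1, 5 words x 3 classes
out = out.view(-1, 5, 3)       # -> [1, 5, 3], one row of 3 class scores per word
probs = F.softmax(out, dim=2)  # probs[0, i, j] = probability that word i has label j
print(probs.shape)
> torch.Size([1, 5, 3])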

I really appreciate all your help!