Confusing part in torch.nn.Conv2d

Hello guys,

I am building a feature extraction model for an OCR system using a convolutional neural network. While building the convolution layers, I ran into something I don't understand. As far as I know, the output size of the tensor returned by torch.nn.Conv2d is calculated with this equation: (height or width - kernel_size) / stride + 1.
```python
input = torch.zeros((1, 3, 32, 320))
self.conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=2, bias=False)
```
If I feed this input into the convolution layer above (shape (N, C, H, W), so height 32 and width 320), the calculation would be:
new_height = (32 - 3) / 2 + 1 = 15.5
new_width = (320 - 3) / 2 + 1 = 159.5
Since the new height and width are not integers, I expected the convolution to fail, but it still returns an output of shape torch.Size([1, 64, 13, 157]).
Can you please explain why it returns such values?
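For comparison, the `torch.nn.Conv2d` documentation writes the output height as floor((H_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1), and the width the same way. The snippet below is a minimal plain-Python sketch of that formula; `conv2d_out_size` is a hypothetical helper name, not a PyTorch function. With padding=0 and dilation=1 it reduces to floor((size - kernel_size) / stride) + 1, i.e. the equation above with the fractional part dropped.

```python
def conv2d_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension, following the formula in the
    torch.nn.Conv2d docs. Integer floor division (//) plays the role of floor()."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Input of shape (N, C, H, W) = (1, 3, 32, 320), kernel_size=3, stride=2, no padding
height = conv2d_out_size(32, kernel_size=3, stride=2)   # floor(29 / 2) + 1 = 15
width = conv2d_out_size(320, kernel_size=3, stride=2)   # floor(317 / 2) + 1 = 159
print(height, width)  # 15 159
```

Under that formula the 15.5 and 159.5 above become 15 and 159, the same values as the torch.Size([1, 64, 15, 159]) output reported further down the thread.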

Here is the code:

```python
import torch
from torch import nn


class Net(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        # 3x3 convolution, stride 2, no padding, no bias
        self.conv = nn.Conv2d(in_channels=in_channels, out_channels=64, kernel_size=3, stride=2, bias=False)
        # self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        x = self.conv(x)
        # x = self.maxpool(x)
        return x


if __name__ == "__main__":
    device = torch.device('cpu')
    a = torch.zeros((1, 3, 32, 320)).to(device)  # (N, C, H, W)
    net = Net(3).to(device)
    b = net(a)
    print(b.shape)
```
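A quick empirical check, sketched under the assumption that PyTorch is installed: it builds the same layer as `Net` (kernel_size=3, stride=2, no padding, no bias), compares the output shape with the floor-based size, then overwrites only the last input row and the last input column. Since no 3-wide window with stride 2 reaches those positions, the output should stay the same. The seed and the value 100.0 are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed so the random weights are reproducible

conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=2, bias=False)
x = torch.randn(1, 3, 32, 320)  # (N, C, H, W): height 32, width 320

with torch.no_grad():
    y = conv(x)
print(y.shape)  # torch.Size([1, 64, 15, 159])

# floor((size - k) / s) + 1 for height 32 and width 320, with k = 3 and s = 2
expected = (1, 64, (32 - 3) // 2 + 1, (320 - 3) // 2 + 1)
assert tuple(y.shape) == expected

# Overwrite the last row (index 31) and the last column (index 319).
# Windows start at rows 0, 2, ..., 28 and columns 0, 2, ..., 316, so the
# last row/column any window covers is 30 / 318; these cells are never read.
x2 = x.clone()
x2[:, :, -1, :] = 100.0
x2[:, :, :, -1] = 100.0
with torch.no_grad():
    y2 = conv(x2)
print(torch.allclose(y, y2))  # True
```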

Htut Lynn Aung

Why does the batch size of the output not match the input (1 -> 2)?
I just tried it and it works well; the output shape is [1, 64, 15, 159].

Hi @MariosOreo,

That was a mistake on my part. Yes, the batch size is 1 here, and I will correct the details. However, the output on my side is also torch.Size([1, 64, 15, 159]), which differs from the results of the equations. It is not supposed to produce these results, right? Or am I missing something in the equation? As far as I understand, the height or width of the image minus the kernel size must be divisible by the stride, but in this example it is not. I did not use any padding here either. That is where I am confused.
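To see where the leftover half goes, the sketch below lists the start index of every window along one dimension for a plain strided convolution with dilation 1, treating padding as zeros added on both sides (as an integer `padding` does in nn.Conv2d). `window_starts` and `unused_inputs` are hypothetical helper names for illustration. Without padding, the last window that fits ends one position before the final input index, and that input is simply never read, so nothing requires (size - kernel_size) to be divisible by the stride. With padding=1, every input index is covered and the layer would produce 16 x 160 outputs instead.

```python
def window_starts(size, kernel_size, stride, padding=0):
    """Start index (in padded coordinates) of every window that fully fits."""
    padded = size + 2 * padding
    return list(range(0, padded - kernel_size + 1, stride))


def unused_inputs(size, kernel_size, stride, padding=0):
    """Original (unpadded) indices that no window ever covers."""
    covered = set()
    for start in window_starts(size, kernel_size, stride, padding):
        for i in range(start, start + kernel_size):
            covered.add(i - padding)  # shift back to unpadded coordinates
    return [i for i in range(size) if i not in covered]


for size in (32, 320):
    no_pad = window_starts(size, kernel_size=3, stride=2)
    with_pad = window_starts(size, kernel_size=3, stride=2, padding=1)
    print(size, len(no_pad), unused_inputs(size, 3, 2),
          len(with_pad), unused_inputs(size, 3, 2, padding=1))
# 32 15 [31] 16 []
# 320 159 [319] 160 []
```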

Htut Lynn Aung
