I am building a feature-extraction model using a convolutional neural network for an OCR system. While building the convolution layers, there is one thing I don't understand. As far as my knowledge goes, the output size of the tensor from torch.nn.Conv2d is calculated using this equation: (height/width - kernel size) / stride + 1.
input = torch.zeros((1, 3, 32, 320))
self.conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=2, bias=False)
If I feed the input tensor into the above convolution layer, then, since the input layout is (N, C, H, W) with H = 32 and W = 320, the calculation would be
new_height = (32 - 3)/2 + 1 = 15.5
new_width = (320 - 3)/2 + 1 = 159.5
Since the new height and width are float values, the convolution shouldn't work, but it still returns integer height and width values: torch.Size([1, 64, 15, 159])
Can you please explain why it returns such values?
Here is the code:
import torch
from torch import nn


class Net(nn.Module):
    def __init__(self, in_channels):
        super(Net, self).__init__()
        self.conv = nn.Conv2d(in_channels=in_channels, out_channels=64, kernel_size=3, stride=2, bias=False)
        # self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        x = self.conv(x)
        # x = self.maxpool(x)
        return x


if __name__ == "__main__":
    device = torch.device('cpu')
    a = torch.zeros((1, 3, 32, 320)).to(device)
    net = Net(3)
    net.eval()
    b = net(a)
    print(b.size())
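For reference, the shape formula in the PyTorch nn.Conv2d documentation applies a floor to the fractional result, which is why an integer size comes out. A small sketch of that formula (the helper name conv2d_out_size is mine, not part of PyTorch):

import math

def conv2d_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    # floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1),
    # as documented for torch.nn.Conv2d
    return math.floor((size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

# Height and width of a (1, 3, 32, 320) input after kernel_size=3, stride=2:
print(conv2d_out_size(32, 3, stride=2))   # the fractional 15.5 is floored
print(conv2d_out_size(320, 3, stride=2))  # the fractional 159.5 is floored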
Htut Lynn Aung