Hello guys,
I am building a feature extraction model using a convolutional neural network for an OCR system. While building the convolution layers, there is one thing I don’t understand. As far as I know, the output size of torch.nn.Conv2d along each spatial dimension is calculated with this equation: (input_size - kernel_size) / stride + 1, where input_size is the input height or width.
input = torch.zeros((1, 3, 32, 320))
self.conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=2, bias=False)
If I feed this input tensor, whose layout is (batch, channels, height, width), into the convolution layer above, the calculation would be:
new_height = (32 - 3) / 2 + 1 = 15.5
new_width = (320 - 3) / 2 + 1 = 159.5
Since the new height and width are not integers, I expected the convolution to fail, but it still returns an output with integer height and width: torch.Size([1, 64, 13, 157])
Can you please explain why it returns such values?
Here is the code:
import torch
from torch import nn


class Net(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        # 3x3 convolution with stride 2 and no padding
        self.conv = nn.Conv2d(in_channels=in_channels, out_channels=64, kernel_size=3, stride=2, bias=False)
        # self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        x = self.conv(x)
        # x = self.maxpool(x)
        return x


if __name__ == "__main__":
    device = torch.device('cpu')
    # Input layout is (batch, channels, height, width)
    a = torch.zeros((1, 3, 32, 320)).to(device)
    net = Net(3)
    net.eval()
    b = net(a)
    print(b.size())
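For comparison, the shape formula in the PyTorch Conv2d documentation takes the floor of the division and also includes padding and dilation terms, so a fractional intermediate value is rounded down rather than rejected. Below is a minimal sketch of that formula in plain Python. The helper name conv2d_out_size is my own for illustration and is not a PyTorch function.

```python
def conv2d_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension, following the documented
    Conv2d shape formula:
        floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
    Hypothetical helper for illustration only; not part of PyTorch.
    """
    # Span covered by the kernel once dilation is taken into account
    effective_kernel = dilation * (kernel_size - 1) + 1
    # Integer floor division implements the floor in the formula
    return (size + 2 * padding - effective_kernel) // stride + 1


# Input layout is (batch, channels, height, width) = (1, 3, 32, 320)
height, width = 32, 320
out_h = conv2d_out_size(height, kernel_size=3, stride=2)
out_w = conv2d_out_size(width, kernel_size=3, stride=2)
print(out_h, out_w)  # prints: 15 159
```

With this formula, kernel_size=3 and stride=2 give 15 x 159 for a 32 x 320 input. The 13 x 157 size quoted above is what the same formula gives for kernel_size=7, so that output may have come from a run with a different kernel size.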
Sincerely,
Htut Lynn Aung