Hi, I’ve read and searched and read some more on the forum, but I can’t understand the following:
how do I calculate and set the network's expected input size (in particular the in_features of the first nn.Linear), and how does it relate to the image size?
I have an AlexNet clone (single-channel, 224 x 224) that I now want to use with single-channel 48 x 48 greyscale images:
import torch
import torch.nn as nn


class alexnet_custom(nn.Module):
    def __init__(self, num_classes=2):
        super(alexnet_custom, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),  # 256 * 6 * 6 is what comes out of features for 224 x 224
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # (N, 256, 6, 6) -> (N, 256 * 6 * 6)
        return self.classifier(x)
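For context, here is how I've been probing the shapes so far (just my own experiment, so I may be misreading it): feed a dummy batch through the convolutional part and print the shape after every layer. Tensors in PyTorch are (batch, channels, height, width), and with the original 224 x 224 input this ends at the 256 * 6 * 6 the first nn.Linear expects:

import torch

model = alexnet_custom()
x = torch.randn(1, 1, 224, 224)  # N=1 image, C=1 channel, H=W=224
for layer in model.features:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))
# the last MaxPool2d prints (1, 256, 6, 6), i.e. 256 * 6 * 6 features once flattened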
Is there a clear and concise way to understand the relation between the input tensor and the image size, and how channels, height, width (and batch size) fit into it?
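From the Conv2d and MaxPool2d docs I gathered that each layer's spatial output size should be floor((in + 2 * padding - kernel_size) / stride) + 1 (with dilation 1), so here is my attempt at tracing a 48 x 48 input through self.features by hand (a sketch of my own reasoning, so the numbers may be off):

def conv_out(size, kernel_size, stride=1, padding=0):
    # spatial output size of Conv2d / MaxPool2d (both floor the division)
    return (size + 2 * padding - kernel_size) // stride + 1

h = conv_out(48, 11, stride=4, padding=2)  # Conv2d(1, 64)    -> 11
h = conv_out(h, 3, stride=2)               # MaxPool2d        -> 5
h = conv_out(h, 5, padding=2)              # Conv2d(64, 192)  -> 5
h = conv_out(h, 3, stride=2)               # MaxPool2d        -> 2
h = conv_out(h, 3, padding=1)              # the three 3x3 convs keep it at 2
h = conv_out(h, 3, stride=2)               # MaxPool2d        -> 0 (!)
print(h)  # 0

If I computed that correctly, the last MaxPool2d would collapse the feature map to 0 x 0, so neither the pooling nor the 256 * 6 * 6 in the classifier can work for a 48 x 48 image. But I'd like to understand the general rule rather than just patch this one case.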