I have two question regarding the structure of the AlexNet:
I opened a question on SO about it because if you read the specification on the paper and then the diagram, the dimensions of the volumes on the networks does not match. I assume I am missing something but it just does not work in this publication (which is amazing, don’t get me wrong)
I took a look at how PyTorch implemented it with the hope to find out what was I missing, and I just see the implementation is totally different from the original paper. For instance, the first layer in the paper is a 11x11 convolution with a stride of 4 and 96 kernels, whereas the PyTorch implementation has 64. I think this could be because for a single GPU it may not be just gathering the 96 kernels (48 each of the 2 defined in the paper), but where can I find those specifications then?
Thanks in advance!
def alexnet(pretrained=False, **kwargs):
r"""AlexNet model architecture from the
`"One weird trick..." <https://arxiv.org/abs/1404.5997>`_ paper.
pretrained (bool): If True, returns a model pre-trained on ImageNet
Answered on SO, but re-posting here (since this is the first result that comes up when I looked up the same question):
I found a comment in a GitHub issue that explained this well:
current implementation comes from the torch re-implementation, which is based on the single GPU implementation from Alex which uses 256 filters, while the 2GPU implementation from Krizhevsky uses 384 filters.
The original AlexNet model needed to be trained across 2 GTX 580 3GB GPUs because the model was too large to be trained on a single GPU (at the time). Krizhevsky (the author of the paper) released a single GPU implementation with different dimensions and the PyTorch implementation comes from there.