I have two question regarding the structure of the AlexNet:

I opened a question on SO about it because if you read the specification on the paper and then the diagram, the dimensions of the volumes on the networks does not match. I assume I am missing something but it just does not work in this publication (which is amazing, don’t get me wrong)

I took a look at how PyTorch implemented it with the hope to find out what was I missing, and I just see the implementation is totally different from the original paper. For instance, the first layer in the paper is a 11x11 convolution with a stride of 4 and 96 kernels, whereas the PyTorch implementation has 64. I think this could be because for a single GPU it may not be just gathering the 96 kernels (48 each of the 2 defined in the paper), but where can I find those specifications then?

def alexnet(pretrained=False, **kwargs):
r"""AlexNet model architecture from the
`"One weird trick..." <https://arxiv.org/abs/1404.5997>`_ paper.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
"""

The original AlexNet model needed to be trained across 2 GTX 580 3GB GPUs because the model was too large to be trained on a single GPU (at the time). Krizhevsky (the author of the paper) released a single GPU implementation with different dimensions and the PyTorch implementation comes from there.