In the “Neural Networks” chapter of the PyTorch “60 Minute Blitz” tutorial, the final link in the example network (Yann LeCun’s LeNet) is described as a set of “Gaussian connections”. These are then implemented in torch in almost the same way as the two “Full connection” links, the only difference being that there is no ReLU activation function applied. Is this the only difference between a full connection and a Gaussian one? Or how does the lack of a ReLU imply that we end up with a layer of Gaussian connections?
Gaussian connections are a way to define the loss for classification; softmax (with cross-entropy) has mostly replaced them nowadays. Ref: https://www.quora.com/What-is-the-Gaussian-Connection-in-LeNet-5s-final-step
And is a simple linear transform with no activation function really how this classification loss is implemented (notwithstanding that softmax is usually preferred nowadays), or is that just a simplification for the sake of the tutorial?
Concretely, the Gaussian connection is defined as a fully connected (fc) layer followed by an MSE loss. In the original paper, the output units are actually Euclidean RBF units that compute the squared distance between the input vector and a per-class parameter vector, so the tutorial's plain linear layer is a simplification.
Reference: the original paper, right column of page 8: http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf
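To make the distinction concrete, here is a minimal sketch (not from the tutorial; class name, shapes, and the random inputs are illustrative) contrasting the tutorial's plain `nn.Linear` output with an RBF-style output layer in the spirit of the paper's Gaussian connections:

```python
import torch
import torch.nn as nn

class RBFOutput(nn.Module):
    # RBF-style units roughly as described in LeCun et al. 1998:
    # y_i = sum_j (x_j - w_ij)^2, the squared Euclidean distance
    # between the input and a per-class prototype vector.
    def __init__(self, in_features, num_classes):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, in_features))

    def forward(self, x):
        # (batch, 1, in) - (classes, in) broadcasts to (batch, classes, in);
        # summing over the feature dim gives one distance per class.
        return ((x.unsqueeze(1) - self.centers) ** 2).sum(dim=2)

# The tutorial's simplification: a plain linear layer with no activation,
# trained with MSE. 84 is the size of LeNet-5's F6 layer, 10 the class count.
fc = nn.Linear(84, 10)
rbf = RBFOutput(84, 10)

x = torch.randn(4, 84)
print(fc(x).shape, rbf(x).shape)  # both produce (batch, num_classes) scores
```

Both layers map an 84-dimensional feature vector to 10 per-class scores; the difference is that the linear layer computes dot products while the RBF units compute distances to learned prototypes (lower is better, so the sign convention of the loss differs too).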