What is the purpose of the conv2 layer in torchvision's GoogLeNet implementation?

In torchvision’s GoogLeNet implementation, there is a 1x1 convolution layer (conv2) in the stem region.

However, in the paper (https://static.googleusercontent.com/media/research.google.com/ko//pubs/archive/43022.pdf), I cannot find any indication of where this layer comes from.

Is this an error?
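
To make the question concrete, here is a minimal pure-Python sketch of the stem as I read it in torchvision. The layer names and hyperparameters in `STEM` are my reading of the source rather than a verbatim copy, and `out_size` and `trace_stem` are hypothetical helpers written only for this post. It traces the output shape and the convolution weight count of each layer for a 224x224 input, ignoring batch norm and assuming bias-free convolutions.

```python
import math

# Stem layers as I read them in torchvision's GoogLeNet (an assumption, not
# copied verbatim): (name, kind, in_ch, out_ch, kernel, stride, padding)
STEM = [
    ("conv1", "conv", 3, 64, 7, 2, 3),
    ("maxpool1", "pool", 64, 64, 3, 2, 0),
    ("conv2", "conv", 64, 64, 1, 1, 0),    # the 1x1 layer in question
    ("conv3", "conv", 64, 192, 3, 1, 1),
    ("maxpool2", "pool", 192, 192, 3, 2, 0),
]


def out_size(size, kernel, stride, padding, ceil_mode):
    """Spatial output size of a conv or pooling layer along one dimension."""
    span = (size + 2 * padding - kernel) / stride
    return (math.ceil(span) if ceil_mode else math.floor(span)) + 1


def trace_stem(size=224):
    """Return (name, out_channels, out_size, conv_weights) for each stem layer."""
    rows = []
    for name, kind, in_ch, out_ch, k, s, p in STEM:
        # The pooling layers are assumed to round up (ceil_mode=True).
        size = out_size(size, k, s, p, ceil_mode=(kind == "pool"))
        # Weight count without bias; pooling layers have no weights.
        weights = in_ch * out_ch * k * k if kind == "conv" else 0
        rows.append((name, out_ch, size, weights))
    return rows


for name, channels, size, weights in trace_stem():
    print(f"{name:9s} out={channels:3d}x{size}x{size} conv_weights={weights}")
```

In this sketch, conv2 changes neither the channel count nor the spatial size, which is why I am unsure what it is for.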