From the LeNet paper of November 1998 I see that the third convolutional layer (C3) takes 6 input feature maps and produces 16 output feature maps. Each of the 16 output maps is built from a subset of the 6 input maps according to a table, also given in the paper:
What I do not see is how PyTorch implements this feature map table.
Short answer: in my understanding, Conv2d offers no way to specify local connections; they have to be enforced with a hand-designed mask tensor applied on top of Conv2d. Conv2d connects every input feature map to every output feature map (grouped convolution being the only exception, and it allows only block-diagonal connectivity, not an arbitrary table).
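To make this concrete, here is a small sketch of what Conv2d gives you out of the box. The weight shape shows the connectivity: with full connectivity each output map has a kernel for all 6 inputs, and with `groups=2` each output map only sees 6/2 = 3 inputs, always in fixed contiguous blocks:

```python
import torch
import torch.nn as nn

# Full connectivity: each of the 16 output maps sees all 6 input maps,
# so the weight tensor has shape (out_channels, in_channels, kH, kW).
full = nn.Conv2d(6, 16, kernel_size=5)
print(full.weight.shape)  # torch.Size([16, 6, 5, 5])

# Grouped convolution is the only built-in sparsity, and it is
# block-diagonal: with groups=2, each output map sees 6/2 = 3 input maps.
grouped = nn.Conv2d(6, 16, kernel_size=5, groups=2)
print(grouped.weight.shape)  # torch.Size([16, 3, 5, 5])
```

Neither option can express LeNet's table, where each output map connects to an arbitrary, overlapping subset of inputs.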
In my understanding of the LeNet paper, the local connections in `LeNet` were hand-designed for two reasons:
to keep the computation & memory complexity within bounds;
to break the symmetry between the learned features.
With the development of GPU technology over the years, reason 1 no longer applies today (at least for small networks like LeNet). For reason 2, we can use other methods, such as random weight initialization or dropout, to break the symmetry.
Conv2d is deliberately simple: it performs the convolution with all input feature maps connected to all output feature maps. It is up to the user to enforce any hand-designed connection rules on top of it, if needed.
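One way to enforce such rules is to zero out the disallowed input-to-output weights with a mask. Below is a sketch, not PyTorch library functionality: `TABLE` is my transcription of the paper's C3 connection table as it is commonly reproduced (verify against Table I of the original before relying on it), and the mask must be re-applied after every optimizer step so the pruned weights stay zero:

```python
import torch
import torch.nn as nn

# Transcription of LeNet-5's C3 connection table (assumed correct; check
# against Table I of the paper). TABLE[o] lists the input maps feeding
# output map o: 6 maps of 3 inputs, 9 maps of 4, and one map of all 6.
TABLE = [
    (0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 0), (5, 0, 1),
    (0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5), (3, 4, 5, 0),
    (4, 5, 0, 1), (5, 0, 1, 2), (0, 1, 3, 4), (1, 2, 4, 5),
    (0, 2, 3, 5), (0, 1, 2, 3, 4, 5),
]

conv = nn.Conv2d(6, 16, kernel_size=5)

# Build a (16, 6, 1, 1) mask that broadcasts over the 5x5 kernel:
# 1 where the table allows a connection, 0 elsewhere.
mask = torch.zeros(16, 6, 1, 1)
for out_map, in_maps in enumerate(TABLE):
    mask[out_map, list(in_maps)] = 1.0

with torch.no_grad():
    conv.weight.mul_(mask)  # zero out the forbidden connections

# During training, repeat conv.weight.mul_(mask) after each optimizer
# step (or mask conv.weight.grad) so the zeroed weights remain zero.
```

This keeps Conv2d itself unchanged and pushes the hand-designed connectivity entirely into the mask, which is the kind of post-hoc enforcement described above.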