In a vanilla convolution, each kernel convolves over the whole input volume, i.e. across all input channels at once.

Example: Your input volume has 3 channels (RGB image).

Now you would like to create a ConvLayer for this image. Each kernel in your ConvLayer will use all input channels of the input volume. Let’s assume you would like to use a 3 by 3 kernel. This kernel will have 27 weights and 1 bias, since W * H * input_channels = 3 * 3 * 3 = 27.
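
To make the "one kernel uses all input channels" point concrete, here is a rough sketch in plain Python. The patch values, weight values, bias, and the name `apply_kernel` are all made up for illustration; a real framework does the same sum far more efficiently.

```python
# Hypothetical sizes matching the example: RGB input, 3x3 kernel.
in_channels, k_h, k_w = 3, 3, 3

# A 3x3 patch of the image, indexed as patch[channel][row][col] (dummy values).
patch = [[[1.0 for _ in range(k_w)] for _ in range(k_h)] for _ in range(in_channels)]

# One kernel: one weight per (channel, row, col) position, plus a single bias.
kernel = [[[0.5 for _ in range(k_w)] for _ in range(k_h)] for _ in range(in_channels)]
bias = 0.1


def apply_kernel(patch, kernel, bias):
    """Compute one output value: sum over ALL channels and positions, plus bias."""
    total = bias
    for c in range(len(kernel)):
        for i in range(len(kernel[c])):
            for j in range(len(kernel[c][i])):
                total += patch[c][i][j] * kernel[c][i][j]
    return total


# Count the weights in this single kernel.
num_weights = sum(len(row) for channel in kernel for row in channel)
output_value = apply_kernel(patch, kernel, bias)

print(num_weights)  # 27
```

Note that the three channels are not handled by three separate kernels: the single kernel produces one number per spatial position by summing over all 27 products.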

The number of output channels is the number of different kernels used in your ConvLayer. If you would like to output 64 channels, your layer will have 64 different 3x3 kernels, each with 27 weights and 1 bias.
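
If it helps, the parameter count can be checked with a few lines of plain Python. This is just the arithmetic from above wrapped in a function; `conv_layer_params` is a hypothetical helper name, not part of any library.

```python
def conv_layer_params(in_channels, out_channels, k_h, k_w, use_bias=True):
    """Count learnable parameters of a standard (non-grouped) conv layer."""
    # Each kernel spans every input channel.
    weights_per_kernel = in_channels * k_h * k_w
    # One bias per kernel, i.e. one per output channel.
    params_per_kernel = weights_per_kernel + (1 if use_bias else 0)
    # One kernel per output channel.
    return out_channels * params_per_kernel


# RGB input, 64 output channels, 3x3 kernels.
total_params = conv_layer_params(in_channels=3, out_channels=64, k_h=3, k_w=3)
print(total_params)  # 1792 = 64 * (27 + 1)
```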

I hope this makes it a bit clearer.

Have a look at Stanford’s CS231n if you would like to dig a bit deeper.