Why does the weight in class _ConvNd have these dimensions?

Luoshang_Lowson_Pan · December 22, 2017, 3:34am

I've been reading the PyTorch source code recently and can't understand the dimensions of a CNN layer's weights.

Link to the code I'm talking about on GitHub.

The weight is defined as:

```python
self.weight = Parameter(torch.Tensor(
    out_channels, in_channels // groups, *kernel_size))
```
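To see what `*kernel_size` expands to, you can build one layer per dimensionality and print the weight shapes. This is a minimal sketch assuming a current PyTorch release; the channel counts and kernel sizes are arbitrary illustrative choices.

```python
import torch.nn as nn

# kernel_size is stored as a tuple with one entry per spatial dimension,
# so *kernel_size contributes 1, 2 or 3 trailing dimensions to the weight.
c1 = nn.Conv1d(3, 9, kernel_size=5)
c2 = nn.Conv2d(3, 9, kernel_size=(5, 3))
c3 = nn.Conv3d(3, 9, kernel_size=3)

print(c1.kernel_size, tuple(c1.weight.shape))  # (5,) (9, 3, 5)
print(c2.kernel_size, tuple(c2.weight.shape))  # (5, 3) (9, 3, 5, 3)
print(c3.kernel_size, tuple(c3.weight.shape))  # (3, 3, 3) (9, 3, 3, 3, 3)
```

In every case the first two dimensions are `(out_channels, in_channels // groups)`.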

However, if in_channels = 3, out_channels = 9 and groups = 1, then 27 weight matrices of size *kernel_size (27 filters) will be created.
But I think there should be only 3 weight matrices of size *kernel_size (3 filters), shared among the 3 input channels, shouldn't there?

Could anyone tell me which part of my thinking is wrong?

smth · December 22, 2017, 5:56am

Each output channel is connected to all input channels, hence (with groups=1) out_channels * in_channels filters exist.
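A quick check of this count, using the configuration from the question (3 input channels, 9 output channels, groups=1) and an arbitrary 3x3 kernel:

```python
import torch.nn as nn

# Configuration from the question; kernel_size=3 is an arbitrary choice.
conv = nn.Conv2d(in_channels=3, out_channels=9, kernel_size=3, groups=1)

# Weight layout: (out_channels, in_channels // groups, kH, kW)
print(conv.weight.shape)  # torch.Size([9, 3, 3, 3])

# Number of distinct 2D kernels = out_channels * (in_channels // groups)
num_2d_kernels = conv.weight.shape[0] * conv.weight.shape[1]
print(num_2d_kernels)  # 27
```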

Luoshang_Lowson_Pan · December 22, 2017, 8:52pm

Do you mean that, in this implementation, there is no parameter sharing?
Is each output channel the sum of 3 distinct filters, one applied to each input channel?
That is, in my previous example there would be 27 filters [f1, …, f27], and then:
f1 - f3 contribute to output_channel1.
f4 - f6 contribute to output_channel2.
…
f25 - f27 contribute to output_channel9.

Is that correct? Is the interaction between the 3 channels really a sum?
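One way to answer the "is it really a sum?" question empirically is to rebuild each output channel from per-channel 2D convolutions and compare with the layer's output. This sketch assumes a random 8x8 input and a 3x3 kernel (arbitrary choices); note that PyTorch's convolution is a cross-correlation, and `F.conv2d` uses the same convention as `nn.Conv2d`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
conv = nn.Conv2d(3, 9, kernel_size=3)
x = torch.randn(1, 3, 8, 8)

with torch.no_grad():
    out = conv(x)  # shape (1, 9, 6, 6)
    channels = []
    for o in range(conv.out_channels):
        acc = conv.bias[o]  # start from the bias of output channel o
        for c in range(conv.in_channels):
            # One distinct 2D kernel weight[o, c] applied to input channel c only
            acc = acc + F.conv2d(x[:, c:c + 1], conv.weight[o:o + 1, c:c + 1])
        channels.append(acc)
    manual = torch.cat(channels, dim=1)

print(torch.allclose(out, manual, atol=1e-5))  # True
```

So with groups=1 the per-input-channel results are indeed summed (plus one bias per output channel).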

Another question: if groups is set to 3, then each output channel will have only 1 filter contributing to it, so 9 filters in total. Right?
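A small experiment for the groups=3 case: inspect the weight shape, then perturb one input channel and see which output channels react. The 8x8 input and the perturbation size are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv_g = nn.Conv2d(3, 9, kernel_size=3, groups=3)
print(conv_g.weight.shape)  # torch.Size([9, 1, 3, 3]) -> 9 kernels of size 3x3

# With groups=3, group g reads only input channel g and produces output channels 3g..3g+2.
x = torch.randn(1, 3, 8, 8)
x_perturbed = x.clone()
x_perturbed[:, 0] += 1.0  # change input channel 0 only

with torch.no_grad():
    # Largest absolute change per output channel
    change = (conv_g(x) - conv_g(x_perturbed)).abs().amax(dim=(0, 2, 3))

# Only output channels 0-2 (group 0) should change
print(change)
```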

Luoshang_Lowson_Pan · December 22, 2017, 9:12pm

Or should I think about it this way?
Each filter is actually a cube of dimension (in_channels // groups, *kernel_size).
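The "cube" view can be checked by computing a single output pixel by hand: take the 3D slice `weight[o]`, multiply it elementwise with the matching input patch across all channels, sum, and add the bias. The output channel index and input size below are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(3, 9, kernel_size=3)
x = torch.randn(1, 3, 5, 5)

o = 4  # any output channel
cube = conv.weight[o]  # shape (3, 3, 3) = (in_channels // groups, kH, kW)

with torch.no_grad():
    out = conv(x)
    # Top-left output pixel of channel o: the 3x3 patch over all input channels
    patch = x[0, :, 0:3, 0:3]  # shape (3, 3, 3)
    manual_pixel = (patch * cube).sum() + conv.bias[o]

print(cube.shape)  # torch.Size([3, 3, 3])
print(torch.allclose(out[0, o, 0, 0], manual_pixel, atol=1e-5))  # True
```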

ptrblck · December 22, 2017, 10:31pm

I explained a similar question in this thread.

Luoshang_Lowson_Pan · December 22, 2017, 11:21pm

Cool, that explains my problem with output_channel, @ptrblck!

Luoshang_Lowson_Pan · December 22, 2017, 11:33pm

How about grouping?
Does it mean arranging the input channels into several groups, where each group has a separate filter of dimension (input_channel // num_groups, kernel_size, kernel_size)? (Like the 2 groups in AlexNet?)

ptrblck · February 13, 2018, 10:02am

Sorry, I hadn't seen this post.
The grouping parameter lets you decide how the filters are connected between the input channels and output channels.
E.g. in a vanilla convolution, each kernel will convolve the input using all input channels.
E.g. for an input of dimension [batch, 10, 24, 24], each kernel (with kernel_size=3) will have a dimension of [10, 3, 3]. The weights in this conv layer will therefore have a dimension of [number_of_kernels, 10, 3, 3].
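The vanilla case can be reproduced with a dummy layer. Here `number_of_kernels = 20` and `batch = 4` are arbitrary values picked for illustration.

```python
import torch
import torch.nn as nn

number_of_kernels = 20  # arbitrary choice for illustration
batch = 4               # arbitrary choice for illustration

conv = nn.Conv2d(in_channels=10, out_channels=number_of_kernels, kernel_size=3)
print(conv.weight.shape)  # torch.Size([20, 10, 3, 3])

x = torch.randn(batch, 10, 24, 24)
y = conv(x)
print(y.shape)  # torch.Size([4, 20, 22, 22]); no padding, so 24 - 3 + 1 = 22
```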

Using groups=2 for 20 kernels will yield a weight dimension of [20, 5, 3, 3].
The documentation explains:

At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
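The "two conv layers side by side" description can be verified directly: split the input channels and the kernels into two halves, run `F.conv2d` on each half, and concatenate. This sketch uses the 10-channel, 20-kernel example from this post with an arbitrary random input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
grouped = nn.Conv2d(10, 20, kernel_size=3, groups=2)
print(grouped.weight.shape)  # torch.Size([20, 5, 3, 3])

x = torch.randn(1, 10, 24, 24)
w, b = grouped.weight, grouped.bias

with torch.no_grad():
    out = grouped(x)
    # First conv: input channels 0-4, kernels 0-9
    first = F.conv2d(x[:, :5], w[:10], b[:10])
    # Second conv: input channels 5-9, kernels 10-19
    second = F.conv2d(x[:, 5:], w[10:], b[10:])
    side_by_side = torch.cat([first, second], dim=1)

print(torch.allclose(out, side_by_side, atol=1e-5))  # True
```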

That’s why each kernel will only see 5 input channels in my example.
Note that in_channels and out_channels both have to be divisible by groups.
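The divisibility requirement is enforced when the layer is constructed (current PyTorch raises a `ValueError`). The helper `builds` below is a hypothetical name used only for this illustration.

```python
import torch.nn as nn

def builds(in_ch, out_ch, groups):
    """Hypothetical helper: True if nn.Conv2d accepts this channel/groups combination."""
    try:
        nn.Conv2d(in_ch, out_ch, kernel_size=3, groups=groups)
        return True
    except ValueError as err:
        print(f"({in_ch}, {out_ch}, groups={groups}) rejected: {err}")
        return False

print(builds(10, 20, 2))  # both divisible by 2
print(builds(9, 20, 2))   # in_channels not divisible
print(builds(10, 21, 2))  # out_channels not divisible
```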

For groups=in_channels each input channel will have its own set of filters.
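A sketch of the groups=in_channels case (often called a depthwise convolution), checking that one output channel depends only on its own input channel. The channel index 7 and the input size are arbitrary choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
depthwise = nn.Conv2d(10, 10, kernel_size=3, groups=10)
print(depthwise.weight.shape)  # torch.Size([10, 1, 3, 3])

x = torch.randn(1, 10, 24, 24)
with torch.no_grad():
    out = depthwise(x)
    # Output channel 7 is computed from input channel 7 alone, with its own 3x3 kernel
    ch7 = F.conv2d(x[:, 7:8], depthwise.weight[7:8], depthwise.bias[7:8])
print(torch.allclose(out[:, 7:8], ch7, atol=1e-5))  # True

# With out_channels = 2 * in_channels, each input channel gets its own pair of filters
depthwise_x2 = nn.Conv2d(10, 20, kernel_size=3, groups=10)
print(depthwise_x2.weight.shape)  # torch.Size([20, 1, 3, 3])
```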

Hope that still helps, and sorry for the late reply.

Luoshang_Lowson_Pan · February 13, 2018, 5:48pm

Thanks. Got the idea here.
The ideas in CNNs are really much easier to describe with visualizations than with words!

ptrblck · February 14, 2018, 9:39am

Yeah, you are right.
Sometimes it helps to create dummy layers and study their shapes.
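For example, a short loop over a few configurations from this thread prints the resulting weight shapes side by side. The list of configurations and the fixed `kernel_size=3` are illustrative choices.

```python
import torch.nn as nn

configs = [
    dict(in_channels=3, out_channels=9, groups=1),
    dict(in_channels=3, out_channels=9, groups=3),
    dict(in_channels=10, out_channels=20, groups=1),
    dict(in_channels=10, out_channels=20, groups=2),
    dict(in_channels=10, out_channels=20, groups=10),
]

shapes = {}
for cfg in configs:
    layer = nn.Conv2d(kernel_size=3, **cfg)  # dummy layer, never trained
    key = (cfg["in_channels"], cfg["out_channels"], cfg["groups"])
    shapes[key] = tuple(layer.weight.shape)
    print(key, "->", shapes[key])
```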

@ezyang created an awesome visualization for convolutions. Check it out here.

Luoshang_Lowson_Pan · February 14, 2018, 6:17pm

Cool!
I also found one here.