However, if in_channels = 3, out_channels = 9 and groups = 1, then 27 weight matrices of size kernel_size × kernel_size (27 filters) are created.
But I think there should be only 3 weight matrices of size kernel_size × kernel_size (3 filters), shared among the 3 input channels, shouldn't there?

Could anyone tell me which part of my thinking is wrong?

Do you mean that, in this implementation, there is no parameter sharing?
Is each output channel the sum of 3 distinct filters, one applied to each input channel?
That is, in my previous case, with 27 filters [f1, …, f27]:
f1 - f3 contribute to output_channel1.
f4 - f6 contribute to output_channel2.
…
f25 - f27 contribute to output_channel9.

Is that correct? Is the interaction between the 3 channels really a sum?
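To make the "sum" question concrete, here is a framework-free sketch (plain Python lists, stride 1, no padding, no dilation) of what a groups=1 convolution computes: output channel o is the sum over input channels c of a 2D cross-correlation of channel c with its own kernel w[o][c], plus one bias per output channel. The helpers `correlate2d` and `conv2d` are hypothetical illustrations, not PyTorch API. With the (out_channels, in_channels, kH, kW) weight layout, the 2D kernels w[0][0], w[0][1], w[0][2] all feed output channel 0, which matches the f1 - f3 numbering above.

```python
def correlate2d(x, k):
    """'Valid' 2D cross-correlation (stride 1, no padding) of one H x W channel with one kh x kw kernel."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def conv2d(x, w, bias):
    """groups=1 convolution. x: [C_in][H][W], w: [C_out][C_in][kh][kw], bias: [C_out]."""
    out = []
    for o, filters in enumerate(w):
        # Each output channel owns one distinct 2D kernel per input channel ...
        maps = [correlate2d(x[c], filters[c]) for c in range(len(x))]
        # ... and the per-channel results are summed, plus one bias for this output channel.
        oh, ow = len(maps[0]), len(maps[0][0])
        out.append([[bias[o] + sum(m[i][j] for m in maps) for j in range(ow)]
                    for i in range(oh)])
    return out


# 3 input channels of 3x3 ones; 1 output channel whose kernel for input channel c is all (c + 1).
x = [[[1] * 3 for _ in range(3)] for _ in range(3)]
w = [[[[c + 1] * 2 for _ in range(2)] for c in range(3)]]
y = conv2d(x, w, bias=[0])
print(y)  # [[[24, 24], [24, 24]]], i.e. 4*1 + 4*2 + 4*3 at every position
```

No weights are shared between input channels here: each of the three 2×2 kernels is a separate set of parameters, and the only interaction between channels is the final addition.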

Another question: if groups is set to 3, then only 1 filter contributes to each output channel, so there are 9 filters in total. Right?
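One way to check the 27 vs. 9 count is to list which input channels each output channel reads from. The sketch below assumes the usual grouped-convolution layout in which output channels are assigned to groups in consecutive blocks of out_channels // groups; `group_connections` and `count_2d_filters` are hypothetical helpers for illustration only.

```python
def group_connections(in_channels, out_channels, groups):
    """Hypothetical helper: map each output channel to the input channels it reads from."""
    in_per_group = in_channels // groups
    out_per_group = out_channels // groups
    conn = {}
    for o in range(out_channels):
        g = o // out_per_group  # output channels are laid out group by group
        conn[o] = list(range(g * in_per_group, (g + 1) * in_per_group))
    return conn


def count_2d_filters(in_channels, out_channels, groups):
    # One kH x kW kernel per (output channel, connected input channel) pair.
    return sum(len(c) for c in group_connections(in_channels, out_channels, groups).values())


print(count_2d_filters(3, 9, groups=1))  # 27
print(count_2d_filters(3, 9, groups=3))  # 9
print(group_connections(3, 9, groups=3))
# {0: [0], 1: [0], 2: [0], 3: [1], 4: [1], 5: [1], 6: [2], 7: [2], 8: [2]}
```

With groups=3, each output channel sees exactly one input channel, and each input channel feeds 3 output channels.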

How about grouping?
Does it mean arranging the input channels into several groups, where each group has a separate filter of dimension (input_channel // num_groups, kernel_size, kernel_size)? (Like the 2 groups in AlexNet?)

Sorry, I hadn't seen this post.
The grouping parameter lets you decide how the filters are connected between the input channels and output channels.
E.g. in a vanilla convolution, each kernel will convolve the input using all input channels.
I.e. for an input of dimension [batch, 10, 24, 24], each kernel (with kernel_size=3) will have a dimension of [10, 3, 3]. The weights in this conv layer will therefore have a dimension of [number_of_kernels, 10, 3, 3].

Using groups=2 for 20 kernels will yield a weight dimension of [20, 5, 3, 3].
The documentation explains:

At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
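The "two conv layers side by side" description can be checked numerically. The sketch below (plain Python, stride 1, no padding, no bias; all helper names are hypothetical) runs one ordinary convolution per group on its slice of input channels and concatenates the outputs, then shows that this matches a single groups=1 convolution whose weights are zero for every cross-group connection.

```python
def correlate2d(x, k):
    """'Valid' 2D cross-correlation of one H x W channel with one kh x kw kernel."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]


def conv2d(x, w):
    """groups=1, no bias. x: [C_in][H][W], w: [C_out][C_in][kh][kw]."""
    out = []
    for filters in w:
        maps = [correlate2d(x[c], filters[c]) for c in range(len(x))]
        out.append([[sum(m[i][j] for m in maps) for j in range(len(maps[0][0]))]
                    for i in range(len(maps[0]))])
    return out


def grouped_conv2d(x, w, groups):
    """w: [C_out][C_in // groups][kh][kw]. One independent conv per group, outputs concatenated."""
    in_per_group = len(x) // groups
    out_per_group = len(w) // groups
    out = []
    for g in range(groups):
        x_g = x[g * in_per_group:(g + 1) * in_per_group]   # this group's input channels
        w_g = w[g * out_per_group:(g + 1) * out_per_group]  # this group's kernels
        out.extend(conv2d(x_g, w_g))
    return out


# 4 input channels (3x3), 2 output channels, groups=2, 2x2 kernels.
x = [[[c + i + j for j in range(3)] for i in range(3)] for c in range(4)]
w = [[[[o + c + 1] * 2 for _ in range(2)] for c in range(2)] for o in range(2)]  # shape [2][2][2][2]
zeros = [[0, 0], [0, 0]]
# Dense equivalent: output channel o only has non-zero kernels for its own group's inputs.
w_dense = [w[0] + [zeros, zeros], [zeros, zeros] + w[1]]  # shape [2][4][2][2]

y = grouped_conv2d(x, w, groups=2)
print(y == conv2d(x, w_dense))  # True
```

The grouped version stores 4 two-dimensional kernels where the dense version needs 8, which is where the parameter savings of grouping come from.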

That’s why each kernel will only see 5 input channels in my example.
Note that in_channels and out_channels both have to be divisible by groups.

For groups=in_channels, each input channel will have its own set of out_channels // in_channels filters (this is often called a depthwise convolution).
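The weight shapes discussed in this reply all follow one rule: (out_channels, in_channels // groups, kernel_size, kernel_size), with both channel counts divisible by groups. A minimal sketch of that rule, assuming a square kernel; `conv2d_weight_shape` is a hypothetical helper and its error message is its own, not PyTorch's.

```python
def conv2d_weight_shape(in_channels, out_channels, kernel_size, groups=1):
    """Weight shape in the (out_channels, in_channels // groups, kH, kW) layout, square kernel assumed."""
    if in_channels % groups != 0 or out_channels % groups != 0:
        raise ValueError("in_channels and out_channels must both be divisible by groups")
    return (out_channels, in_channels // groups, kernel_size, kernel_size)


print(conv2d_weight_shape(10, 20, 3))             # (20, 10, 3, 3)  vanilla convolution
print(conv2d_weight_shape(10, 20, 3, groups=2))   # (20, 5, 3, 3)   two groups of 5 input channels
print(conv2d_weight_shape(10, 20, 3, groups=10))  # (20, 1, 3, 3)   depthwise: 20 // 10 = 2 filters per input channel
try:
    conv2d_weight_shape(10, 20, 3, groups=3)      # 10 is not divisible by 3
except ValueError as e:
    print("ValueError:", e)
```

To cross-check against PyTorch itself, `torch.nn.Conv2d(10, 20, 3, groups=2).weight.shape` should report `torch.Size([20, 5, 3, 3])`.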

Hope that still helps, and sorry for the late reply.