Grouped linear layer

I want to implement a grouped linear function: y = [W_1 x_1 + b_1; W_2 x_2 + b_2; ...; W_k x_k + b_k], where the input x has size (b, k*c) and is split into k chunks x_1, ..., x_k of size (b, c); here b is the batch size, k is the number of groups, and c is the number of channels per group.

A grouped convolution layer is also needed, much like a convolution with groups=k, except that the weight and bias are not expected to be shared between groups.
Although this operation can be implemented with multiple linear or convolutional layers, that is not convenient, as the sketch below shows.
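
For reference, a minimal sketch of that multi-layer version, holding k separate nn.Linear layers in a module (all names here are illustrative, not from the original post):

import torch
import torch.nn as nn

class NaiveGroupedLinear(nn.Module):
    # k independent Linear layers, one per chunk of the input
    def __init__(self, k, c_in, c_out):
        super().__init__()
        self.k = k
        self.layers = nn.ModuleList(nn.Linear(c_in, c_out) for _ in range(k))

    def forward(self, x):                          # x: (b, k * c_in)
        chunks = x.chunk(self.k, dim=1)            # k tensors of (b, c_in)
        outs = [f(xi) for f, xi in zip(self.layers, chunks)]
        return torch.cat(outs, dim=1)              # (b, k * c_out)

The Python-level loop over the k layers is exactly the inconvenience (and overhead) the question is about.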

What I actually want is for different samples to be processed with different weights. The idea is to move the batch into the channel dimension, i.e. (b, c, h, w) -> (1, b*c, h, w), so that each sample corresponds to one group of channels and each group is processed with its own weights, then reshape the result back.
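
A minimal sketch of that batch-to-channel trick, assuming the per-sample weights live in a single grouped convolution with groups=b (sizes, names, and the kernel size are illustrative):

import torch

b, c_in, c_out, h, w = 4, 8, 16, 32, 32   # illustrative sizes

# One grouped conv whose b groups act as b per-sample convolutions.
per_sample_conv = torch.nn.Conv2d(
    in_channels=b * c_in,
    out_channels=b * c_out,
    kernel_size=3,
    padding=1,
    groups=b,          # group i holds the weights applied to sample i
)

x = torch.randn(b, c_in, h, w)
y = per_sample_conv(x.reshape(1, b * c_in, h, w))  # fold batch into channels
y = y.reshape(b, c_out, h, w)                      # unfold back into a batch

Note that this ties the layer to a fixed batch size b, since the number of groups must match the number of samples.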


Did you find an elegant way to do that?

Answering my own question: I guess the easiest is just to go for torch.matmul.
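
For the record, a minimal torch.matmul sketch of k independent linear maps (shapes and names are mine, not from the thread):

import torch

b, k, c_in, c_out = 8, 4, 16, 32          # illustrative sizes

weight = torch.randn(k, c_in, c_out)      # one (c_in, c_out) matrix per group
bias = torch.randn(k, c_out)

x = torch.randn(b, k, c_in)
# matmul broadcasts batch dims: (b, k, 1, c_in) @ (k, c_in, c_out) -> (b, k, 1, c_out)
y = torch.matmul(x.unsqueeze(2), weight).squeeze(2) + bias   # (b, k, c_out)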

For num_blocks independent transforms, each from dim_in to dim_out, here is a workaround:

import torch

num_blocks, dim_in, dim_out = 4, 16, 32  # example sizes, not from the original post

block_linear = torch.nn.Conv2d(
    in_channels=num_blocks * dim_in,
    out_channels=num_blocks * dim_out,
    kernel_size=1,       # 1x1 kernel: a pure linear map per block
    groups=num_blocks,   # each block gets its own weight and bias
    bias=True,
)

Then, for x of size [batch x num_blocks x dim_in], the mapping looks as follows:

x = x.reshape(batch, num_blocks * dim_in, 1, 1)  # fold blocks into channels, add 1x1 spatial dims
x = block_linear(x)  # output is of size [batch x (num_blocks * dim_out) x 1 x 1]
x = x.reshape(batch, num_blocks, dim_out)  # restore the block dimension
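
As a quick shape check, reusing the example sizes assumed above:

x = torch.randn(8, num_blocks, dim_in)    # batch of 8
batch = x.shape[0]
y = block_linear(x.reshape(batch, num_blocks * dim_in, 1, 1))
y = y.reshape(batch, num_blocks, dim_out)
print(y.shape)   # torch.Size([8, 4, 32]) with num_blocks=4, dim_out=32

Because groups=num_blocks, block_linear.weight has shape (num_blocks * dim_out, dim_in, 1, 1), i.e. each block owns its own (dim_out, dim_in) matrix and its own dim_out-sized slice of the bias; nothing is shared across blocks.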