Sharing weights between the units of the same layer and between layers

Hello,

I have an MLP with a particular structure. I would like to share weights between the units of the same layer and between layers, as depicted in the following figure:

How can I design such an MLP, and do I need a custom backward procedure or can I use the standard one in PyTorch?
@albanD
Thank you for your help.

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.rand(3))  # w1, w2, w3
    def forward(self, x):
        x = torch.sum(x * self.w)  # first layer: shared w1, w2, w3
        x = torch.sum(x * self.w)  # second layer: the same shared weights
        return x

Hi @Eta_C,

Where is the linear layer?

x = torch.sum(x * self.w) makes sense for the second layer. However, in the first layer each unit has 3 shared connections.

The way your weights are shared, w1 is multiplied with the first entry, w2 with the second and w3 with the third. Then you sum the three, and that gives every node after the first layer. That means that all 3 nodes in the middle here have the same value. Is that expected?
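
As a quick numerical check (the values here are just for illustration):

x = torch.tensor([1., 2., 3.])
w = torch.tensor([0.1, 0.2, 0.3])  # w1, w2, w3
node = torch.sum(x * w)  # 0.1*1 + 0.2*2 + 0.3*3 = 1.4, identical for all 3 middle nodes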

Hi @albanD. Yes, it is.

Ok, then that should work for x.size() == [batch, 3] (you can remove the extra dimension in the parameters if you don’t want the batch dimension):

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.rand(1, 3))  # w1, w2, w3
    def forward(self, x):
        # x: [batch, 3]; every node in the middle layer gets this same value
        one_node = torch.sum(x * self.w, dim=-1, keepdim=True)  # [batch, 1]
        # the output node reuses the same shared weights
        return one_node * self.w.sum(dim=-1, keepdim=True)      # [batch, 1]
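
As a rough usage sketch (a batch size of 4 chosen just for illustration), the input is [4, 3] and the output [4, 1]:

net = Net()
x = torch.rand(4, 3)   # batch of 4, one scalar per input node
out = net(x)           # shape [4, 1]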

Thank you @albanD for the answer. However, each node of the MLP represents an n by n matrix, not a scalar. Hope it's clear:

Inputs are

x1[n,n] x2[n,n] x3[n,n]

And w1, w2 and w3 are scalars?

Exactly: x1[n,n], x2[n,n], x3[n,n] are matrices and w1, w2, w3 are scalars.

And do you get a batch of x? What would be the size of the input to your module? [batch, 3, n, n] or [3, n, n]?

x.size() == [batch, n, n]

I’m confused, there is no 3? Where do x1, x2, x3 come from?

In the MLP figure, the three nodes refer to the three matrices x1, x2, x3, and the output of the MLP is x_{mlp} of size [n, n].

Could you give a small code sample that shows the input sizes and what you expect as output?
Thanks!

Here it is:

b = 3
n = 1000
input = torch.rand(b, n, n)

# w1, w2, w3 are the shared scalar weights
first_layer_node_1 = torch.relu(w1 * w2 * w3 * input[0])  # n by n
first_layer_node_2 = torch.relu(w1 * w2 * w3 * input[1])  # n by n
first_layer_node_3 = torch.relu(w1 * w2 * w3 * input[2])  # n by n

output_layer = torch.relu(w1 * first_layer_node_1 + w2 * first_layer_node_2 + w3 * first_layer_node_3)  # n by n
Thank you

Hi,

You can actually use this code as-is, with w1, w2 and w3 being your weights, and it will perform weight sharing because you reuse them.
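
As a rough sketch of how that could be wrapped in a module (the class name here is just illustrative), the standard autograd backward already handles the reused parameters, so no custom backward is needed:

import torch
import torch.nn as nn

class SharedWeightMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.rand(3))  # the shared scalars w1, w2, w3

    def forward(self, x):  # x: [3, n, n]
        w1, w2, w3 = self.w[0], self.w[1], self.w[2]
        # first layer: every node reuses the same w1, w2, w3
        h1 = torch.relu(w1 * w2 * w3 * x[0])  # [n, n]
        h2 = torch.relu(w1 * w2 * w3 * x[1])  # [n, n]
        h3 = torch.relu(w1 * w2 * w3 * x[2])  # [n, n]
        # output layer: the same weights again
        return torch.relu(w1 * h1 + w2 * h2 + w3 * h3)  # [n, n]

Something like mlp = SharedWeightMLP(); out = mlp(torch.rand(3, 1000, 1000)) then gives a [1000, 1000] output, and out.sum().backward() accumulates the gradient from every place each weight is used into mlp.w.grad.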


Thanks a lot, @albanD!