Sharing weights between the units of the same layer and between layers

Hello,

I have an MLP with a particular structure. I would like to share weights between the units of the same layer and between layers, as depicted in the following figure:

How can I design such an MLP, and do I need a custom backward procedure or can I use the standard one in PyTorch?
@albanD
Thank you for your help.

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.rand(3))  # w1, w2, w3
    def forward(self, x):
        x = torch.sum(x * self.w)  # first layer: shared w1, w2, w3
        x = torch.sum(x * self.w)  # second layer: the same shared weights
        return x

Hi @Eta_C,

Where is the linear layer?

x = torch.sum(x * self.w) makes sense for the second layer. However, in the first layer each unit has 3 shared connections.

The way your weights are shared, w1 is multiplied with the first entry, w2 with the second and w3 with the third. Then you sum the three, and that gives every node after the first layer. That means that all 3 nodes in the middle here have the same value. Is that expected?
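
As a quick numerical check (the values here are just for illustration):

x = torch.tensor([1., 2., 3.])
w = torch.tensor([0.1, 0.2, 0.3])  # w1, w2, w3
node = torch.sum(x * w)  # 0.1*1 + 0.2*2 + 0.3*3 = 1.4, identical for all 3 middle nodes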

Hi @albanD. Yes, it is.

Ok, then that should work for x.size() == [batch, 3] (you can remove the extra dimension in the parameters if you don’t want the batch dimension):

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.rand(1, 3))  # w1, w2, w3
    def forward(self, x):
        # x: [batch, 3]; every node in the middle layer gets this same value
        one_node = torch.sum(x * self.w, dim=-1, keepdim=True)  # [batch, 1]
        # the output node reuses the same shared weights
        return one_node * self.w.sum(dim=-1, keepdim=True)      # [batch, 1]
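
As a rough usage sketch (a batch size of 4 chosen just for illustration), the input is [4, 3] and the output [4, 1]:

net = Net()
x = torch.rand(4, 3)   # batch of 4, one scalar per input node
out = net(x)           # shape [4, 1]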

Thank you @albanD for the answer. However, each node of the MLP represents an n by n matrix, not a scalar. Hope it's clear:

Inputs are

x1[n,n] x2[n,n] x3[n,n]

And w1, w2 and w3 are scalars?

Exactly: x1[n,n], x2[n,n], x3[n,n] are matrices and w1, w2, w3 are scalars.

And do you get a batch of x? What would be the size of the input to your module? [batch, 3, n, n] or [3, n, n]?

x.size() == [batch, n, n]

I’m confused, there is no 3? Where do x1, x2, x3 come from?

In the MLP figure, the three nodes refer to the three matrices x1, x2, x3, and the output of the MLP is x_{mlp} of size [n, n].

Could you give a small code sample that shows the input sizes and what you expect as output?
Thanks!

Here it is:

b = 3
n = 1000
input = torch.rand(b, n, n)

# w1, w2, w3 are the shared scalar weights
first_layer_node_1 = torch.relu(w1 * w2 * w3 * input[0])  # n by n
first_layer_node_2 = torch.relu(w1 * w2 * w3 * input[1])  # n by n
first_layer_node_3 = torch.relu(w1 * w2 * w3 * input[2])  # n by n

output_layer = torch.relu(w1 * first_layer_node_1 + w2 * first_layer_node_2 + w3 * first_layer_node_3)  # n by n
Thank you

Hi,

You can actually use this code as-is, with w1, w2 and w3 being your weights, and it will perform weight sharing because you reuse them.
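
As a rough sketch of how that could be wrapped in a module (the class name here is just illustrative), the standard autograd backward already handles the reused parameters, so no custom backward is needed:

import torch
import torch.nn as nn

class SharedWeightMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.rand(3))  # the shared scalars w1, w2, w3

    def forward(self, x):  # x: [3, n, n]
        w1, w2, w3 = self.w[0], self.w[1], self.w[2]
        # first layer: every node reuses the same w1, w2, w3
        h1 = torch.relu(w1 * w2 * w3 * x[0])  # [n, n]
        h2 = torch.relu(w1 * w2 * w3 * x[1])  # [n, n]
        h3 = torch.relu(w1 * w2 * w3 * x[2])  # [n, n]
        # output layer: the same weights again
        return torch.relu(w1 * h1 + w2 * h2 + w3 * h3)  # [n, n]

Something like mlp = SharedWeightMLP(); out = mlp(torch.rand(3, 1000, 1000)) then gives a [1000, 1000] output, and out.sum().backward() accumulates the gradient from every place each weight is used into mlp.w.grad.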


Thanks a lot, @albanD!