How to dynamically change the size of nn.Linear

I’m trying to find a way to change the size of an nn.Linear layer dynamically. For example, let’s say I have the following layers:

self.fc1 = nn.Linear(z_dim, h_dim)
self.fcmean = nn.Linear(h_dim, z_dim)

Now, for simplicity, let’s say I want to change z_dim dynamically by increasing its size based on a coin flip: in every epoch z_dim either grows by 1 or stays the same, each with probability 0.5.
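Roughly what I have in mind, just as a sketch of the intended behavior (not a working solution):

import random

z_dim, h_dim = 2, 10
for epoch in range(5):
    if random.random() > 0.5:
        z_dim += 1  # here fc1 / fcmean would somehow need to grow to the new z_dim
    print(epoch, z_dim)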

I found this example:

But it is unclear how exactly to use the functional API and nn.Parameter to incorporate a dynamic size for nn.Linear.


Well, based on the link you’ve posted, you could use the functional API and change the weight tensor manually.
However, if you increase the size of your weight, what should the new values be?
Should the new slice be initialized randomly, while the rest of the weight tensor keeps its already learned values?
If so, here is a small example:

import torch
import torch.nn as nn
import torch.nn.functional as F


class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        # weight shape is (out_features, in_features): 10 outputs x 2 inputs
        self.weight = nn.Parameter(torch.randn(10, 2))

    def forward(self, x):
        x = F.linear(x, self.weight)
        return x

model = MyModel()
x = torch.randn(1, 2)
output = model(x)
output.mean().backward()
print(model.weight.grad)
model.zero_grad()

# Add another input feature by concatenating a new column to the weight
with torch.no_grad():
    model.weight = nn.Parameter(torch.cat((model.weight, torch.randn(10, 1)), 1))

x = torch.randn(1, 3)
output = model(x)
output.mean().backward()
print(model.weight.grad)
model.zero_grad()

Thanks for the simple example @ptrblck. Can I also do this in the forward function itself such that it yields the same results?

That is:

if torch.rand(1).item() > 0.5:
    self.weight = nn.Parameter(torch.cat((self.weight, torch.randn(10, 1)), 1))

That might cause some trouble, as your optimizer won’t be able to update the new parameter.
I missed it in the other post, but after updating your weight parameter you should also re-create the optimizer with the new parameter.
You could probably try to create a new optimizer after each forward call, but that would restrict you to optimizers without running estimates.
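A minimal sketch of that workflow, assuming the MyModel class from the example above and plain SGD:

# Grow the weight outside of forward, then rebuild the optimizer,
# since the old optimizer still holds a reference to the old Parameter object.
import torch
import torch.nn as nn

model = MyModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# ... train for a while with 2 input features ...

with torch.no_grad():
    model.weight = nn.Parameter(torch.cat((model.weight, torch.randn(10, 1)), 1))

# Re-create the optimizer so it updates the new parameter
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)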

It seems there is a way to dynamically change the learning rate as in:

https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate

Can I follow a similar approach to change the parameters without losing the running averages? I know param_groups keeps the learning rate; is there similar access to all the weights?

There could be a way, but it would be very hacky, since you would have to work a lot with the internals of the optimizer implementation.
I’ve created a small Gist to show how it could be done, but again I would like to point out that this code should rather not be used in any form.
Besides the hacking into internals (which might break at any time), I’m not sure whether using the old internal states for the grown parameter would give any methodological advantage.
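Not the Gist itself, but roughly the kind of state surgery this involves, sketched under the assumption of Adam and the MyModel class from the earlier example ('step', 'exp_avg' and 'exp_avg_sq' are Adam’s internal state keys, which only exist after at least one optimizer step):

# Hacky sketch only: carry Adam's running estimates over to a grown parameter.
# Please don't use this in real code.
import torch
import torch.nn as nn

model = MyModel()
optimizer = torch.optim.Adam([model.weight], lr=1e-3)

# Take at least one step so Adam has populated its running estimates
model(torch.randn(1, 2)).mean().backward()
optimizer.step()
optimizer.zero_grad()

old_weight = model.weight
old_state = dict(optimizer.state[old_weight])  # {'step', 'exp_avg', 'exp_avg_sq'}

# Grow the parameter by one input feature and build a fresh optimizer for it
with torch.no_grad():
    model.weight = nn.Parameter(torch.cat((old_weight, torch.randn(10, 1)), 1))
optimizer = torch.optim.Adam([model.weight], lr=1e-3)

# Pre-fill the new optimizer's state with the old estimates, zero-padded
# to the new shape, so the running averages are not lost
pad = torch.zeros(10, 1)
optimizer.state[model.weight] = {
    'step': old_state['step'],
    'exp_avg': torch.cat((old_state['exp_avg'], pad), 1),
    'exp_avg_sq': torch.cat((old_state['exp_avg_sq'], pad), 1),
}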


I tried using the same approach you mentioned for dynamically changing the weight matrix, but I am getting the following error:
RuntimeError: size mismatch, m1: [771 x 20], m2: [3 x 20] at /opt/conda/conda-bld/pytorch-cpu_1532576596369/work/aten/src/TH/generic/THTensorMath.cpp:2070

In the code I actually have to add at least 12 hidden neurons dynamically, but initially I tried using only one neuron (trying a cascade architecture with Rprop).
Code:
class MultiLayerNet(torch.nn.Module):
    def __init__(self, n_input, n_hidden, n_output):
        super(MultiLayerNet, self).__init__()
        self.sigmoid = torch.nn.Sigmoid()
        self.relu = torch.nn.ReLU()
        self.weight1 = nn.Parameter(torch.randn(n_input, 3))  # for all hidden layers
        self.weight2 = nn.Parameter(torch.randn(2, 2))  # for output layer

    def forward(self, x, count):
        print(count)
        print(x.shape, self.weight1)
        if count == 1:
            h_pred = self.relu(F.linear(x, self.weight1))

        y_pred = self.sigmoid(F.linear(h_pred, self.weight2))

        return y_pred

Note: the input x is of shape 771 x 20 and the weight1 matrix is of size 20 x 3, which should multiply without issue, but I observed that inside the linear function, instead of directly using the passed weight1 matrix, it uses weight.t() (the transpose of the matrix); I’m not sure why.

Further logs:

in forward(self, x, count)
     19         print(x.shape, self.weight1)
     20         if count == 1:
---> 21             h_pred = self.relu(F.linear(x, self.weight1))
     22
     23         y_pred = self.sigmoid(F.linear(h_pred, self.weight2))

/usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1024         return torch.addmm(bias, input, weight.t())
   1025
-> 1026     output = input.matmul(weight.t())
   1027     if bias is not None:
   1028         output += bias

Hi,

Just to note, the weight matrix is transposed because F.linear implements the linear-layer formula:
y = xA^T + b

So you must pass a weight matrix that is already transposed, i.e. with shape (out_features, in_features).
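Applied to the shapes in the post above, a minimal sketch (assuming 20 inputs and 3 hidden units):

import torch
import torch.nn as nn
import torch.nn.functional as F

n_input, n_hidden = 20, 3
x = torch.randn(771, n_input)

# F.linear expects weight of shape (out_features, in_features),
# so the hidden-layer weight should be (3, 20) rather than (20, 3)
weight1 = nn.Parameter(torch.randn(n_hidden, n_input))
h_pred = F.relu(F.linear(x, weight1))
print(h_pred.shape)  # torch.Size([771, 3])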

I can see that you are changing the input size of the linear layer here. How would one go about dynamically changing the output size instead? Would it just be a matter of changing the last argument from 1 to 0?

model.weight = nn.Parameter(torch.cat((model.weight, torch.randn(10, 1)), 1))

to

model.weight = nn.Parameter(torch.cat((model.weight, torch.randn(10, 1)), 0))

No, you would also need to change the shape of the newly added tensor (a new output row has shape (1, in_features), concatenated along dim 0):

with torch.no_grad():
    model.weight = nn.Parameter(torch.cat((model.weight, torch.randn(1, 2)), 0))

I see what you mean. Thank you