Well, based on the link you’ve posted, you could use the functional API and change the weight tensor manually.
However, if you increase the size of your weight, what new values should be there?
Should the new values be initialized randomly, while the rest of the weight tensor keeps its already learned values?
If so, here is a small example:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(10, 2))

    def forward(self, x):
        return F.linear(x, self.weight)

model = MyModel()
x = torch.randn(1, 2)
output = model(x)

# Add another input feature; detach() makes the new Parameter a fresh leaf tensor
model.weight = nn.Parameter(torch.cat((model.weight.detach(), torch.randn(10, 1)), 1))
x = torch.randn(1, 3)
output = model(x)
That reassignment might cause some trouble, as your optimizer still holds a reference to the old parameter and therefore won't be able to update the new one.
I missed it in the other post, but after updating your weight parameter, you should also re-create the optimizer so that it gets the new parameter.
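Here is a minimal sketch of that (assuming plain SGD and the MyModel from above; the learning rate is arbitrary):

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# ... train for a while ...

# Grow the weight by one input feature, then rebuild the optimizer so it
# references the new Parameter object instead of the old one
model.weight = nn.Parameter(torch.cat((model.weight.detach(), torch.randn(10, 1)), 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)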
You could probably try to create a new optimizer after each forward call, but that would restrict you to optimizers without running estimates (e.g., plain SGD rather than Adam), since a freshly created optimizer starts with an empty state.
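A quick way to see this (using Adam as an example of an optimizer with running estimates, and the grown model from above):

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
output = model(torch.randn(1, 3))
output.sum().backward()
optimizer.step()
print(len(optimizer.state))  # 1: exp_avg / exp_avg_sq are now stored for model.weight

# Re-creating the optimizer discards those running estimates
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
print(len(optimizer.state))  # 0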
There could be a way to keep the old optimizer state, but it would be very hacky, since you would have to dig deep into the internals of the optimizer implementation.
I've created a small Gist to show how it could be done, but again I would like to point out that this code really shouldn't be used in any form.
Besides the hacking into internals (which might break at any time), I'm not sure whether reusing the old internal states for the resized parameter would give any methodological advantage.
I tried the same approach you mentioned for dynamically changing the weight matrix, but I am getting this error:
RuntimeError: size mismatch, m1: [771 x 20], m2: [3 x 20] at /opt/conda/conda-bld/pytorch-cpu_1532576596369/work/aten/src/TH/generic/THTensorMath.cpp:2070
In the code I actually have to add at least 12 hidden neurons dynamically, but initially I tried using only one neuron (I'm trying a cascade architecture with Rprop).
def __init__(self, n_input, n_hidden, n_output):
    super().__init__()
    self.sigmoid = torch.nn.Sigmoid()
    self.relu = torch.nn.ReLU()
    self.weight1 = nn.Parameter(torch.randn(n_input, 3))  # for all hidden layers
    self.weight2 = nn.Parameter(torch.randn(2, 2))        # for output layer
The input x has shape 771 x 20 and the weight1 matrix has shape 20 x 3, which should multiply without problems, but I observed that inside the linear function the passed weight1 matrix is not used directly; instead weight1.t() (the transpose of the matrix) is used, and I'm not sure why.
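For reference, F.linear(input, weight) computes input @ weight.t(), so the weight is expected to have the shape (out_features, in_features). A minimal sketch of that convention with the shapes from the question (the variable names are just for illustration):

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(771, 20)                    # 771 samples, 20 input features
weight1 = nn.Parameter(torch.randn(3, 20))  # (out_features, in_features)
out = F.linear(x, weight1)                  # computes x @ weight1.t()
print(out.shape)                            # torch.Size([771, 3])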