Dynamically change network architecture during training

Hi,
I have a use case where, after every epoch, based on some constraints, I want to change the in_features and out_features of each Linear layer in my network.

Here is an example:

Initial architecture:

Model(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=100, bias=True)
    (1): Linear(in_features=884, out_features=50, bias=True)
    (2): Linear(in_features=934, out_features=10, bias=True)
  )
  (relu_activation): ReLU()
  (softmax_activation): LogSoftmax()
)

The above network runs for one full epoch, after which I change it to look like this:

Model(
  (layers): ModuleList(
    (0): Linear(in_features=753, out_features=88, bias=True)
    (1): Linear(in_features=841, out_features=4, bias=True)
    (2): Linear(in_features=845, out_features=10, bias=True)
  )
  (relu_activation): ReLU()
  (softmax_activation): LogSoftmax()
)

As you can see, the in_features and out_features changed after running for an epoch.
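For context, the model is built roughly along these lines (a simplified sketch rather than the exact code; each layer's input is the original input concatenated with the previous layers' outputs, which is why the in_features grow: 884 = 784 + 100 and 934 = 884 + 50):

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, sizes=(784, 100, 50, 10)):
        super().__init__()
        layers = []
        in_features = sizes[0]
        for out_features in sizes[1:]:
            layers.append(nn.Linear(in_features, out_features))
            # the next layer sees its input concatenated with this layer's output
            in_features += out_features
        self.layers = nn.ModuleList(layers)
        self.relu_activation = nn.ReLU()
        self.softmax_activation = nn.LogSoftmax(dim=1)

    def forward(self, x):
        for layer in self.layers[:-1]:
            x = torch.cat([x, self.relu_activation(layer(x))], dim=1)
        return self.softmax_activation(self.layers[-1](x))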

Below is how I change the network's structure:


with torch.no_grad():
    for layer_idx in range(len(layers)):
        w = layers[layer_idx].weight.data.clone()
        b = layers[layer_idx].bias.data.clone()
        new_w, new_b = reduce(w, b) # Custom logic to calculate new sets of w and b (always new_b.shape < w.shape)
        layers[layer_idx].weight.set_(nn.Parameter(new_w, requires_grad=True))
        layers[layer_idx].bias.set_(nn.Parameter(new_b, requires_grad=True))

        # Changing the grad values to match the shape of new weights and bias. 
        # Assigning random values to grad. This is just to match shapes 
        layers[layer_idx].weight.grad = nn.Parameter(torch.ones(new_w.shape))
        layers[layer_idx].bias.grad = nn.Parameter(torch.ones(new_b.shape))

The above code successfully changes the network architecture after the first epoch, but then the graph does not execute for the next epoch. I get the error below:

  File "/__init__.py", line 127, in run_train
    loss.backward()
  File "/python3.7/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

If I remove the torch.no_grad() block, it says:

RuntimeError: derivative for set_ is not implemented

Can someone point out what I might be messing up here? My end goal is to change the network architecture at every epoch while training.

Any help will be appreciated. Thanks

Hi,

I don't think you want to change the weights in place like that.
Why not do something like this:

for layer_idx in range(len(layers)):
    w = layers[layer_idx].weight
    b = layers[layer_idx].bias
    new_w, new_b = reduce(w, b)
    layers[layer_idx].weight = nn.Parameter(new_w)
    layers[layer_idx].bias = nn.Parameter(new_b)
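One more thing to keep in mind (this is an assumption about a typical training loop, not something shown in your post): if the optimizer was created once from model.parameters(), it still holds references to the old Parameter objects after the swap, so it usually needs to be re-created. A minimal sketch with a single Linear layer and a stand-in for reduce():

import torch
import torch.nn as nn

model = nn.Linear(784, 100)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# ... train for one epoch ...

with torch.no_grad():
    # stand-in for reduce(): keep only the first 88 output features
    new_w = model.weight.detach().clone()[:88]
    new_b = model.bias.detach().clone()[:88]
    model.weight = nn.Parameter(new_w)
    model.bias = nn.Parameter(new_b)
    model.out_features = new_w.shape[0]

# rebuild the optimizer so it tracks the new Parameter objects
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)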

Hi,
Thanks for the response.

Just to make sure I was replacing the weights and biases the right way, I commented out the reduce(w, b) call and made the changes below:

for layer_idx in range(len(layers)):
    w = layers[layer_idx].weight.data.clone()
    b = layers[layer_idx].bias.data.clone()
    # new_w, new_b = reduce(w, b) # Custom logic to calculate new sets of w and b

    layers[layer_idx].weight = nn.Parameter(w)
    layers[layer_idx].weight.data = w
    layers[layer_idx].bias = nn.Parameter(b)
    layers[layer_idx].bias.data = b

    # Changing the grad values to match the shape of the new weights and bias.
    # Assigning random values to grad. This is just to match shapes.
    layers[layer_idx].weight.grad = nn.Parameter(torch.ones(w.shape))
    layers[layer_idx].weight.data.grad = nn.Parameter(torch.ones(w.shape))
    layers[layer_idx].bias.grad = nn.Parameter(torch.ones(b.shape))
    layers[layer_idx].bias.data.grad = nn.Parameter(torch.ones(b.shape))

Basically, I am not making any changes to the weights and biases; I just re-assign them to see whether the network still trains normally.

However, to my surprise, the gradients never change after the first epoch. All the gradient values are equal to 1. By debugging the optimizer's param_groups I can see that, for every layer, the gradients for weight and bias are all 1. Although I initialize them to 1 after every epoch (which I only do to match the shapes), loss.backward() and optimizer.step() should calculate new gradients, right?
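This is roughly how I am inspecting them (just a sketch of the debugging loop):

for group in optimizer.param_groups:
    for p in group["params"]:
        print(tuple(p.shape), None if p.grad is None else p.grad.unique())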

Is there anything I might be missing? Please point me in the right direction if I am doing something stupid.

Thank You

Hi,

As a general rule, you should never use .data; there is no reason for you to use it here.
Also, setting the .grad field is not very useful, I think. You can just set it to None if you really want to reset it.

So the code I shared in my previous message will be enough, as the new Parameter will not have a .grad field anyway. Nothing else should be needed.
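If you do want to reset the gradients explicitly, something like this is sufficient (a sketch, using your layers list):

for layer in layers:
    layer.weight.grad = None
    layer.bias.grad = None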

Consider this situation:

I have model X, which does not have any fancy weight-replacing stuff. I train it for 10 epochs and observe some result p.
I have another model, X-dynamic, where I replace the weights using the code I posted in the previous reply (note that I just clone the weights and re-assign them; the values are unaltered). I train it for 10 epochs and observe some result q.

If I fix the random seed, both models should give me exactly the same results, because none of the values are altered or changed.

Case 1: Not updating .data

p is not equal to q

Case 2: Operating on .data

p is equal to q

It is strange that the results change when I do not also update the .data attribute.
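For reference, this is roughly the comparison I am running (a sketch; the real data loader and the reduce() logic are omitted, and the weights are only cloned and re-assigned, never altered):

import torch
import torch.nn as nn
import torch.nn.functional as F

def run(reassign_weights, epochs=10):
    torch.manual_seed(0)                        # identical seed for both runs
    model = nn.Sequential(nn.Linear(784, 100), nn.ReLU(), nn.Linear(100, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    data = torch.randn(64, 784)                 # stand-in for the real dataset
    target = torch.randint(0, 10, (64,))
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = F.cross_entropy(model(data), target)
        loss.backward()
        optimizer.step()
        if reassign_weights:
            # clone and re-assign, values unaltered (Case 1: no .data involved)
            for m in model:
                if isinstance(m, nn.Linear):
                    m.weight = nn.Parameter(m.weight.detach().clone())
                    m.bias = nn.Parameter(m.bias.detach().clone())
    return loss.item()

p = run(False)   # plain model X
q = run(True)    # X-dynamic with the re-assignment loop
print(p, q)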