Hi,
I have a use case where, after every epoch, based on some constraints, I want to change the in_features and out_features of each Linear layer in my network. Here is an example.
Initial architecture:
Model(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=100, bias=True)
    (1): Linear(in_features=884, out_features=50, bias=True)
    (2): Linear(in_features=934, out_features=10, bias=True)
  )
  (relu_activation): ReLU()
  (softmax_activation): LogSoftmax()
)
The above network runs for one full epoch, after which I change the network to look like this:
Model(
  (layers): ModuleList(
    (0): Linear(in_features=753, out_features=88, bias=True)
    (1): Linear(in_features=841, out_features=4, bias=True)
    (2): Linear(in_features=845, out_features=10, bias=True)
  )
  (relu_activation): ReLU()
  (softmax_activation): LogSoftmax()
)
As you can see, the in_features and out_features changed after running for an epoch.
Below is how I change the network's structure:
with torch.no_grad():
    for layer_idx in range(len(layers)):
        w = layers[layer_idx].weight.data.clone()
        b = layers[layer_idx].bias.data.clone()
        # Custom logic to calculate the new weights and bias
        # (the new tensors are always smaller than the old ones)
        new_w, new_b = reduce(w, b)
        layers[layer_idx].weight.set_(nn.Parameter(new_w, requires_grad=True))
        layers[layer_idx].bias.set_(nn.Parameter(new_b, requires_grad=True))
        # Changing the grad values to match the shape of the new weights and bias.
        # Assigning dummy values to grad; this is just to match shapes.
        layers[layer_idx].weight.grad = nn.Parameter(torch.ones(new_w.shape))
        layers[layer_idx].bias.grad = nn.Parameter(torch.ones(new_b.shape))
The above code successfully changes the network architecture after the first epoch, but the graph does not execute in the next epoch. I get the error below:
File "/__init__.py", line 127, in run_train
loss.backward()
File "/python3.7/site-packages/torch/tensor.py", line 198, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
If I remove the torch.no_grad block, it says:
RuntimeError: derivative for set_ is not implemented
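For reference, one alternative I am considering is to replace each Linear module outright instead of mutating its parameters with set_: build a fresh nn.Linear of the reduced shape, copy the new weights and bias into it, assign it back into the ModuleList, and then re-create the optimizer so it tracks the new parameters. Below is a rough, untested sketch (shrink_layers, reduce_fn, and the SGD/lr details are just placeholders standing in for my setup above):

import torch
import torch.nn as nn

def shrink_layers(model, reduce_fn, lr=0.01):
    with torch.no_grad():
        for idx in range(len(model.layers)):
            layer = model.layers[idx]
            # reduce_fn stands in for my custom logic above; it returns
            # smaller weight/bias tensors.
            new_w, new_b = reduce_fn(layer.weight.clone(), layer.bias.clone())
            out_f, in_f = new_w.shape  # nn.Linear stores weight as (out, in)
            new_layer = nn.Linear(in_f, out_f, bias=True)
            new_layer.weight.copy_(new_w)
            new_layer.bias.copy_(new_b)
            # Swap the whole module rather than calling set_ on its parameters
            model.layers[idx] = new_layer
    # The old Parameter objects are gone, so the optimizer must be rebuilt
    return torch.optim.SGD(model.parameters(), lr=lr)

Would that be the right direction, or is there a cleaner way to resize layers between epochs?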
Can someone point me to the things I might be getting wrong here? My end goal is to change the network architecture at every epoch while training.
Any help will be appreciated. Thanks.