For example, one of the fully connected layers in my network has a weight of shape [100, 300], which maps an input of size [batch_size, 100] to an output of size [batch_size, 300].
How can I let only the first 30 neurons (rows 0 to 29 of the weight) receive gradient updates and freeze the remaining 70, without setting their gradient to 0? I do not want to zero the gradient, because then it cannot be backpropagated to the previous layer. The choice of which neurons to freeze is made at runtime, so it cannot be fixed beforehand and has to be changed while the program is running.
What I tried is: weight[30:100,:].requires_grad = False.
However, I got the following error:
*** RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn’t require differentiation use var_no_grad = var.detach().
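Here is a minimal sketch that should reproduce the error. It assumes the weight is a plain torch.nn.Parameter of shape [100, 300] applied as x @ weight; the names weight, x, and out, and the batch size of 4, are placeholders and not taken from my actual network.

```python
import torch

# Hypothetical stand-in for the layer's weight:
# maps [batch_size, 100] -> [batch_size, 300].
weight = torch.nn.Parameter(torch.randn(100, 300))

x = torch.randn(4, 100)   # batch_size = 4 is arbitrary
out = x @ weight          # shape [4, 300]

error_message = None
try:
    # Slicing a parameter that requires grad returns a new non-leaf tensor,
    # so its requires_grad flag cannot be changed.
    weight[30:100, :].requires_grad = False
except RuntimeError as err:
    error_message = str(err)

print(error_message)
```

As far as I can tell, the slice weight[30:100, :] is a separate tensor produced by an indexing operation, which is why it is not treated as a leaf.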