Backwards Pass Gradient Calculation for Part of a Layer

Hello all,

I’ve run into a problem that I’m not sure how to solve. In PyTorch’s backward pass, gradients are computed and then used to update the weights of each layer.

However, suppose I take a layer and fix part of its weights to random values by setting requires_grad = False on them, while leaving the rest of the layer trainable with requires_grad = True. I can see that the random weights don’t change during training while the trainable weights do. Let’s call this network A. Similarly, if I build the same network with the same random seed but set requires_grad = True on all of the weights, then all of the weights get tuned. Let’s call this network B.
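
For concreteness, here is a minimal sketch of the kind of setup I mean for network A. Since requires_grad is a per-tensor flag, the sketch keeps the frozen part in a separate buffer and the trainable part in a Parameter; the class and variable names are just for illustration, not my exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartiallyFrozenLinear(nn.Module):
    """Linear layer whose first n_frozen output units use fixed random
    weights, while the remaining units stay trainable."""
    def __init__(self, in_features, out_features, n_frozen):
        super().__init__()
        # Frozen part: a buffer is saved with the model but never receives
        # gradients and is never touched by the optimizer.
        self.register_buffer("w_frozen", torch.randn(n_frozen, in_features))
        # Trainable part: an ordinary Parameter (requires_grad=True by default).
        self.w_trained = nn.Parameter(torch.randn(out_features - n_frozen, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Both halves participate in the forward pass; only w_trained and
        # bias accumulate gradients in the backward pass.
        w = torch.cat([self.w_frozen, self.w_trained], dim=0)
        return F.linear(x, w, self.bias)
```

Network B would then be the same module built from the same seed, but with both halves created as trainable Parameters.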

I’ve confirmed that the trained weights in network A and the corresponding weights in network B end up equivalent. This would mean that the network looks at all of the weights when computing its gradients and then updates the values of the trained weights.
What I would like is for the gradients of the trained weights to be calculated relative only to the weights that have requires_grad=True, ignoring all of the requires_grad=False weights so that they don’t influence the gradient calculation. Is there a method to do this in PyTorch?
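
To show what I mean, this is roughly the check I’ve been doing (a sketch that reuses the PartiallyFrozenLinear class from the example above; the sizes and the loss are arbitrary placeholders):

```python
import torch

net_a = PartiallyFrozenLinear(16, 32, n_frozen=8)  # class from the sketch above

x = torch.randn(4, 16)
loss = net_a(x).sum()
loss.backward()

# Only the tensors with requires_grad=True receive a .grad ...
for name, p in net_a.named_parameters():
    print(name, p.grad is not None)           # w_trained: True, bias: True
print("w_frozen grad:", net_a.w_frozen.grad)  # buffer, stays None

# ... but w_frozen still takes part in the forward pass, so its values feed
# into the activations that the gradients of w_trained are computed from.
```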

Please let me know if I can clarify anything. It’s a little confusing, but in short: I want part of the network to keep its fixed random weights so it can still extract features, while the gradient calculations for the trained weights are performed with respect to the trained weights only, not with respect to all of the weights (random and trained).