I am really new to PyTorch and was wondering if there is a way to specify only a subset of neurons (of a particular layer) to update during training and freeze the rest. Say, update only 2500 nodes of the 4096 in AlexNet, FC7. param.requires_grad seems to apply to all the neurons.

I'll add some simple code. If I have understood your question correctly, I hope it helps.

D_parameters = [
    {'params': model.fc1.parameters()},
    {'params': model.fc2.parameters()},
]  # define only part of the parameters in the model

optimizerD = torch.optim.Adam(D_parameters, lr=learning_rate)

for epoch in range(training_epoch):
    ...
    optimizerD.zero_grad()  # zeroes the gradients of D_parameters only
    # do something
    loss.backward()         # calculates the gradients of all parameters
    optimizerD.step()       # updates D_parameters only

I appreciate your prompt response, but this is not exactly what I am looking for. I want to update only a subset of the fc1 parameters, for example. If we update D_parameters, as in your example, then the weights of all 4096 nodes of FC1 get updated. What I want is to update only a subset of them, say 2900 (that I have as a list).

You could use a backward hook on the output of fc2 to zero the gradients flowing through the parts that you want to filter out.

For example:

import torch
import torch.nn as nn

m = nn.Linear(1024, 4096)
input = torch.randn(128, 1024, requires_grad=True)
out = m(input)

def my_hook(grad):
    grad_clone = grad.clone()  # don't modify the gradient in place
    grad_clone[:, 2500:] = 0   # zero the gradients of units 2500 onwards
    return grad_clone

# zeroes the gradients w.r.t. the outputs for every unit outside 0..2499,
# over all mini-batch samples
h = out.register_hook(my_hook)

out.backward(torch.ones_like(out))  # a non-scalar output needs a gradient argument

Edit: edited to incorporate @fmassa's answer below

I think it's better to avoid modifying the gradients in place and instead return a new gradient, as explained in the docs. So a modified version would pass a hook that clones the gradient, modifies the clone, and returns it, as in the edited example above.
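For the original question of updating an arbitrary subset of units (e.g. 2900 of 4096) rather than a contiguous slice, the same idea can be applied with hooks registered directly on the layer's weight and bias. This is only a sketch; the random index list below stands in for the asker's own list of trainable units:

```python
import torch
import torch.nn as nn

fc1 = nn.Linear(1024, 4096)

# Hypothetical list of 2900 unit indices that should keep training;
# every other unit is frozen.
trainable = torch.randperm(4096)[:2900]
mask = torch.zeros(4096, dtype=torch.bool)
mask[trainable] = True

# Each row of the weight matrix (and entry of the bias) belongs to one
# output unit, so zeroing the gradient rows of frozen units leaves their
# incoming weights unchanged by the optimizer step.
fc1.weight.register_hook(lambda g: g * mask.unsqueeze(1))
fc1.bias.register_hook(lambda g: g * mask)

out = fc1(torch.randn(8, 1024))
out.sum().backward()
print(fc1.weight.grad[~mask].abs().sum().item())  # 0.0: frozen rows get no gradient
```

Note that this keeps the frozen units fixed only as long as the optimizer has no other source of updates for them, e.g. weight decay or previously accumulated momentum.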

It depends on your loss.
If your loss is a scalar, it is fine to write loss.backward(). If it is a tensor with more than one element, you need to write loss.backward(grads), where grads has the same shape as the tensor you call backward() on, for example loss.backward(torch.ones_like(loss)).
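A minimal illustration of the non-scalar case (the tensors here are made up for the example):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = x * x  # non-scalar "loss": one value per element

# backward() on a non-scalar tensor needs a gradient argument of the same
# shape; passing all-ones is equivalent to calling loss.sum().backward()
loss.backward(torch.ones_like(loss))

print(x.grad)  # tensor([2., 4., 6.]), i.e. d(x**2)/dx = 2*x
```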

What if I need the data of the Tensor in the hooked_fn? As far as I understand, it will only have access to the grads of the Tensor on which it is registered. I would like to modify gradients based on the current data of the Tensor. The end goal is to implement Inverting Gradients as given in the paper "Deep Reinforcement Learning in Parameterized Action Space".

EDIT:
We have access to variables defined outside the scope of hooked_fn, so we can simply do data = hooked_tensor.detach().clone().numpy() inside hooked_fn (detach() is needed before calling numpy() on a tensor that requires grad) and then compute new_grad = some_func(data, grad).
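A minimal sketch of that closure pattern, with a simplified version of the inverting-gradients scaling. The bounds p_min/p_max and which bound applies to which gradient sign are assumptions for illustration, not the paper's exact formulation:

```python
import torch

p_min, p_max = -1.0, 1.0  # assumed parameter bounds

params = torch.zeros(3, requires_grad=True)
out = (params * torch.tensor([1.0, -2.0, 3.0])).sum()

def inverting_hook(grad):
    # the closure gives us access to `params` itself, not just its gradient
    p = params.detach()
    rng = p_max - p_min
    # scale each gradient by the remaining headroom toward the relevant
    # bound (simplified inverting-gradients rule)
    scale = torch.where(grad > 0, (p_max - p) / rng, (p - p_min) / rng)
    return grad * scale

h = params.register_hook(inverting_hook)
out.backward()
print(params.grad)  # each entry halved, since params start at the midpoint
```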