Update only the parameters of chosen neurons in the backpropagation phase of a neural network

I want to update only the parameters of chosen neurons (and freeze the other neurons' parameters) when performing the backpropagation step.

How could that be possible, since PyTorch relies on automatic differentiation, which differentiates over entire layers rather than over single parameters/neurons?

For example, I want to freeze the parameters of the green neurons (image below) and update only the rest in the backpropagation phase:

[image: neural network diagram with the green neurons to freeze highlighted]

Thanks in advance

You could try to manually set their requires_grad attribute to False like here.
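For example, freezing a whole layer would look roughly like this (a minimal sketch with a made-up two-layer model):

import torch.nn as nn

model = nn.Sequential(
  nn.Linear(10, 20),
  nn.ReLU(),
  nn.Linear(20, 2),
)

# freeze every parameter of the first linear layer;
# autograd will then not compute gradients for them
for param in model[0].parameters():
  param.requires_grad = False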

But I want to freeze some neurons of a layer, not the layer itself.

Hi,
Please see if this helps:

import torch
import torch.nn as nn
class Model(nn.Module):
  def __init__(self):
    super().__init__()
    self.fc1 = nn.Linear(2, 4)

  def forward(self, x):
    return self.fc1(x)

net = Model()
old_param = None
for layer in net.children():
  for param in layer.parameters():
    print(param)

Output:

Parameter containing:
tensor([[ 0.1986,  0.5461],
        [-0.3179,  0.6386],
        [-0.5540,  0.6484],
        [ 0.4686,  0.1718]], requires_grad=True)
Parameter containing:
tensor([ 0.6834,  0.4345,  0.1403, -0.3439], requires_grad=True)

Save a copy of the parameters first, update all of them once, and then restore the specific nodes, like so:

for layer in net.children():
  for param in layer.parameters():
    old_param = param.detach().clone() # weight
    break # bias not needed


loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.5)


for epoch in range(1):
  input = torch.tensor([1.0, 20])
  out = net(input)
  target = torch.tensor([15.0, 20, 25, 30])
  loss = loss_fn(out, target)
  loss.backward()
  optimizer.step() # updates all parameters

  # restoring the ones that do not need an update using old_param
  for layer in net.children():
    for param in layer.parameters():
      param.requires_grad = False # required otherwise an in-place error would occur in the next step
      param[2] = old_param[2]
      break

for layer in net.children():
  for param in layer.parameters():
    print(param)

# setting the requires_grad back to True
for layer in net.children():
  for param in layer.parameters():
    param.requires_grad = True

gives:

Parameter containing:
tensor([[ 0.6986,  1.0461],
        [ 0.1821,  1.1386],
        [-0.5540,  0.6484],
        [ 0.9686,  0.6718]])
Parameter containing:
tensor([1.1834, 0.9345, 0.6403, 0.1561], requires_grad=True)

This could easily get messy for a larger model, but it’s the only way I’ve been able to figure out so far.
Best,
S

@srishti-git1110 thanks for your reply, but my goal is to save computation by skipping some gradient calculations, so the method you suggested is not the right one for me.

Hi @ptrblck, do you think it’s possible to update the parameters of only the desired neurons (assuming that some neurons’ parameters don’t need to be updated, in order to save computation)?

I don’t think you will be able to save gradient computation as PyTorch will calculate the gradients for the entire parameter which then needs some additional processing.

@srishti-git1110’s approach of restoring parameters is valid and also makes sure that optimizers using running stats will work. If you are using simple optimizers that use the gradients only (i.e. no running stats), you might also be able to use gradient hooks and zero out the gradients of the parameter entries which should stay static.
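Something along these lines could serve as a sketch of the hook approach (the frozen row index 2 is just an example, and plain SGD is used to stay within the no-running-stats case):

import torch
import torch.nn as nn

net = nn.Linear(2, 4)
frozen_rows = torch.tensor([2])  # e.g. freeze the third output neuron

def zero_frozen_rows(grad):
  # receives the full weight gradient; zero out the rows that should stay static
  grad = grad.clone()
  grad[frozen_rows] = 0.
  return grad

net.weight.register_hook(zero_frozen_rows)
net.bias.register_hook(lambda grad: grad.index_fill(0, frozen_rows, 0.))

optimizer = torch.optim.SGD(net.parameters(), lr=0.5)

out = net(torch.tensor([1.0, 20.0]))
loss = nn.MSELoss()(out, torch.tensor([15.0, 20.0, 25.0, 30.0]))
loss.backward()
optimizer.step()  # the frozen row of the weight and bias keeps its old value

print(net.weight.grad)  # row 2 is all zeros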

@ptrblck Thanks for your answer,

Is it possible to make a copy of the whole model and run the forward propagation and backpropagation separately as follows:

model A: the original model
model B: a copy of the whole model without the neurons that are at equilibrium, i.e. those that do not need to be updated (obtained using my function).

Following these steps:

1. Run the forward propagation on model A.
2. Pass the equivalent parameters to model B.
3. Run the backpropagation on model B (using the loss obtained with model A).
4. Pass the equivalent parameters back to model A and repeat the loop.

Do you think this is possible, and would it reduce the training time?
Or do you have a better idea?

Thanks in advance

Hi,
Some code would probably help to better understand the steps you laid out.

[For step-3…]
In any case, note that the tensor you call backward() on determines which tensors the gradients get calculated for.
That is to say, if you call loss.backward(), only the grad attribute of the leaf tensors in the graph of loss will be populated.

(It’s possible for non-leaf tensors but that isn’t relevant here.)
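A tiny illustration of that:

import torch

w = torch.randn(3, requires_grad=True)  # leaf tensor
y = w * 2                               # non-leaf tensor (result of an op)
loss = y.sum()
loss.backward()

print(w.grad)  # populated: tensor([2., 2., 2.])
print(y.grad)  # None - the grad of a non-leaf is not retained unless y.retain_grad() is called before backward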

I’m not sure what you mean by passing the parameters from one model to the other.