Hi,
As the title suggests: if I deepcopy a model and use the parameters of the copied model to participate in the computation of the original model, why are these copied parameters not being updated, despite having requires_grad = True?
This is perhaps desired behavior, but I'd like to understand why it happens.
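To make sure I'm not misunderstanding deepcopy itself, here is a quick sanity check I ran (on a toy nn.Linear, not my actual model): the copied parameters really are new, independent leaf tensors that still require grad.

```python
from copy import deepcopy

import torch.nn as nn

net = nn.Linear(4, 2)
net_copy = deepcopy(net)

# The copy holds distinct tensors, not views of the originals.
assert net_copy.weight is not net.weight
# They are still trainable leaf tensors.
assert net_copy.weight.requires_grad
assert net_copy.weight.is_leaf
```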
Relevant code:
from copy import deepcopy

import torch
import torch.nn as nn

# _delchainattr, _setchainattr, unflatten_like and LazyRandom are helpers
# defined elsewhere in my code.


class IDModule(nn.Module):
    """Intrinsic dimensionality wrapper module.

    Takes in the network, a projector (a function (D, d) -> projection
    LinearOperator), and the target intrinsic dimensionality.

    Example usage:
        id_net = IDModule(net, lambda D, d: LazyRandom(D, d), 1000)
    """

    def __init__(self, net, projector, dimension=1000):
        super().__init__()
        self.d = dimension
        # Wrap the net in a list so nn.Module does not register its parameters.
        self._forward_net = [net]
        initnet = deepcopy(net)
        for orig_name, orig_p in initnet.named_parameters():
            if orig_p.requires_grad:
                _delchainattr(net, orig_name)
        aux = [(n, p) for n, p in initnet.named_parameters() if p.requires_grad]
        self.names, self.trainable_initparams = zip(*aux)
        self.trainable_initparams = list(self.trainable_initparams)
        self.names = list(self.names)
        self.D = sum(param.numel() for param in self.trainable_initparams)
        self.subspace_params = nn.Parameter(torch.zeros(self.d))
        self.P = projector(self.D, self.d, self.trainable_initparams, self.names)

    def to(self, *args, **kwargs):
        self._forward_net[0].to(*args, **kwargs)
        self.trainable_initparams = [
            param.to(*args, **kwargs) for param in self.trainable_initparams
        ]
        return super().to(*args, **kwargs)

    def forward(self, *args, **kwargs):
        flat_projected_params = self.P @ self.subspace_params
        unflattened_params = unflatten_like(
            flat_projected_params, self.trainable_initparams
        )
        iterables = zip(self.names, self.trainable_initparams, unflattened_params)
        for p_name, init, proj_param in iterables:
            p = init + proj_param.view(*init.shape)  # will init still require gradient?
            _setchainattr(self._forward_net[0], p_name, p)
            # Mark: poking here
            # print("-" * 60)
            # print(init)
            # Mark: my experiments say init does require grad but its value
            # doesn't change.
        return self._forward_net[0](*args, **kwargs)
In the above code, the "init" tensors are the trainable parameters from the copied model, and they participate in the computation of the original model. But I have checked that they don't update when the optimization is done w.r.t. the original model's parameters (self.subspace_params, to be precise). The effective parameters of the original model are now basically (init + M·subspace_params), yet only the latter term gets updated. Why does this happen? Any help would be appreciated.
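Here is a minimal repro of what I observe, stripped of the projector and the attribute plumbing (`init` is a stand-in for one of the copied parameters, and I use a plain sum in place of M·subspace_params):

```python
import torch

# Stand-ins for one copied parameter and the subspace parameters.
init = torch.ones(3, requires_grad=True)
subspace_params = torch.zeros(3, requires_grad=True)

# Only subspace_params is handed to the optimizer, mirroring my setup.
opt = torch.optim.SGD([subspace_params], lr=0.1)

init_before = init.detach().clone()
p = init + subspace_params      # effective parameter, like in forward()
loss = (p ** 2).sum()
loss.backward()
opt.step()

print(init.grad is not None)                         # True: init receives a gradient
print(torch.equal(init, init_before))                # True: but its value is unchanged
print(torch.equal(subspace_params, torch.zeros(3)))  # False: subspace_params moved
```

So `init` does get a gradient, but its value never changes, exactly as in my module.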