Hi,
As the title suggests: if I deepcopy a model and use the parameters of the copied model to participate in the computation of the original model, why are these copied parameters not being updated, despite having requires_grad = True?
This is perhaps desired behavior, but I'd like to understand why it happens.
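To make sure I'm not misunderstanding deepcopy itself, here is a quick sanity check I ran (on a toy nn.Linear, not my actual model): the copied parameters really are new, independent leaf tensors that still require grad.

```python
from copy import deepcopy

import torch.nn as nn

net = nn.Linear(4, 2)
net_copy = deepcopy(net)

# The copy holds distinct tensors, not views of the originals.
assert net_copy.weight is not net.weight
# They are still trainable leaf tensors.
assert net_copy.weight.requires_grad
assert net_copy.weight.is_leaf
```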
Relevant code:
from copy import deepcopy

import torch
import torch.nn as nn

# _delchainattr, _setchainattr, unflatten_like and LazyRandom are helpers
# defined elsewhere in my code.


class IDModule(nn.Module):
    """Intrinsic dimensionality wrapper module.

    Takes in the network, a projector (a function (D, d) -> projection
    LinearOperator), and the target intrinsic dimensionality.

    Example usage:
        id_net = IDModule(net, lambda D, d: LazyRandom(D, d), 1000)
    """

    def __init__(self, net, projector, dimension=1000):
        super().__init__()
        self.d = dimension
        # Wrap the net in a list so nn.Module does not register its parameters.
        self._forward_net = [net]
        initnet = deepcopy(net)
        for orig_name, orig_p in initnet.named_parameters():
            if orig_p.requires_grad:
                _delchainattr(net, orig_name)
        aux = [(n, p) for n, p in initnet.named_parameters() if p.requires_grad]
        self.names, self.trainable_initparams = zip(*aux)
        self.trainable_initparams = list(self.trainable_initparams)
        self.names = list(self.names)
        self.D = sum(param.numel() for param in self.trainable_initparams)
        self.subspace_params = nn.Parameter(torch.zeros(self.d))
        self.P = projector(self.D, self.d, self.trainable_initparams, self.names)

    def to(self, *args, **kwargs):
        self._forward_net[0].to(*args, **kwargs)
        self.trainable_initparams = [
            param.to(*args, **kwargs) for param in self.trainable_initparams
        ]
        return super().to(*args, **kwargs)

    def forward(self, *args, **kwargs):
        flat_projected_params = self.P @ self.subspace_params
        unflattened_params = unflatten_like(
            flat_projected_params, self.trainable_initparams
        )
        iterables = zip(self.names, self.trainable_initparams, unflattened_params)
        for p_name, init, proj_param in iterables:
            p = init + proj_param.view(*init.shape)  # will init still require gradient?
            _setchainattr(self._forward_net[0], p_name, p)
            # Mark: poking here
            # print("-" * 60)
            # print(init)
            # Mark: my experiments say init does require grad but its value
            # doesn't change.
        return self._forward_net[0](*args, **kwargs)
In the above code, the "init" tensors are the trainable parameters from the copied model, and they participate in the computation of the original model. But I have checked that they don't update when the optimization is done w.r.t. the original model's parameters (self.subspace_params, to be precise). The effective parameters of the original model are now basically (init + M·subspace_params), yet only the latter term gets updated. Why does this happen? Any help would be appreciated.
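Here is a minimal repro of what I observe, stripped of the projector and the attribute plumbing (`init` is a stand-in for one of the copied parameters, and I use a plain sum in place of M·subspace_params):

```python
import torch

# Stand-ins for one copied parameter and the subspace parameters.
init = torch.ones(3, requires_grad=True)
subspace_params = torch.zeros(3, requires_grad=True)

# Only subspace_params is handed to the optimizer, mirroring my setup.
opt = torch.optim.SGD([subspace_params], lr=0.1)

init_before = init.detach().clone()
p = init + subspace_params      # effective parameter, like in forward()
loss = (p ** 2).sum()
loss.backward()
opt.step()

print(init.grad is not None)                         # True: init receives a gradient
print(torch.equal(init, init_before))                # True: but its value is unchanged
print(torch.equal(subspace_params, torch.zeros(3)))  # False: subspace_params moved
```

So `init` does get a gradient, but its value never changes, exactly as in my module.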