After randomly initializing the weight matrices of a GRU (let's call them `W`), I need to transform them by means of some function. For the sake of this discussion, let's simplify and say I want to multiply `W` by a scalar:

    W <- alpha * W

where `alpha` is a scalar parameter that I want my optimization algorithm to optimize.
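In isolation, the transform itself is easy to express with autograd; a minimal sketch on a plain tensor (the sizes are arbitrary placeholders):

```python
import torch
import torch.nn as nn

# Scale a randomly initialized weight matrix W by a learnable scalar alpha.
W = torch.randn(3, 3)                    # stands in for a GRU weight matrix
alpha = nn.Parameter(torch.tensor(0.5))  # the scalar to be optimized
W_scaled = alpha * W                     # W <- alpha * W
W_scaled.sum().backward()
print(alpha.grad)                        # gradient w.r.t. alpha, equals W.sum()
```

So the scalar gets a gradient when the product is used directly in the graph; the question is how to get the same behavior inside an existing `nn.GRU`.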
Is there a way for me to accomplish this without rewriting the whole GRU code from scratch?
What I have tried so far:
```python
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = ...
        alpha = torch.Tensor([0.5])
        self.alpha = nn.Parameter(alpha, requires_grad=True)

    def forward(self, x):
        self.gru.weight_hh_l0.data = self.alpha * self.gru.weight_hh_l0
        ... normal forward code ...
```
If I optimize all parameters, this doesn't work: `alpha` never receives a gradient. If I optimize all parameters except `gru.weight_hh_l0`, it still doesn't work.
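For reference, here is a minimal runnable version of the attempt above (the GRU sizes and the input are placeholders I picked for the repro). After `backward()`, `alpha.grad` stays `None`, which is what I mean by "doesn't work":

```python
import torch
import torch.nn as nn

class ScaledGRU(nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(input_size=4, hidden_size=4, batch_first=True)
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, x):
        # the problematic line: assigning through .data bypasses autograd,
        # so alpha is not part of the graph that produces the output
        self.gru.weight_hh_l0.data = self.alpha * self.gru.weight_hh_l0
        out, _ = self.gru(x)
        return out

model = ScaledGRU()
out = model(torch.randn(2, 5, 4))  # (batch, seq, features)
out.sum().backward()
print(model.alpha.grad)  # None: the .data assignment cut alpha out of the graph
```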