Optimizing initialization of a GRU

After randomly initializing the weight matrices of a GRU (let's call them W), I need to apply some transformation to them. To keep the discussion simple, let's say I want to multiply W by a scalar:

W <- alpha * W

where alpha is a scalar parameter that I want my optimization algorithm to optimize.

Is there a way for me to accomplish this without rewriting the whole GRU code from scratch?

What I have tried so far:

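def __init__(self):
    super().__init__()
    self.gru = ...
    # learnable scalar, initialized to 0.5
    self.alpha = nn.Parameter(torch.tensor([0.5]), requires_grad=True)

def forward(self, x):
    # overwrite the recurrent weights with their scaled version on every call
    self.gru.weight_hh_l0.data = self.alpha * self.gru.weight_hh_l0
    ... normal forward code ...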

If I optimize all parameters, this doesn’t work. If I optimize all parameters except gru.weight_hh_l0, it still doesn’t work.
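
One likely reason the attempt above gets nowhere: assigning through `.data` writes the values but bypasses autograd, so `alpha` never receives a gradient, and W is rescaled again on every forward pass. Below is a minimal sketch of one possible workaround, not a definitive solution. It assumes a single-layer `nn.GRU` on CPU. The names `ScaledGRU` and `w_hh_raw` and the sizes are made up for illustration. It relies on `nn.GRU` accepting a plain tensor under `weight_hh_l0` once the registered parameter has been deleted, which I would expect to hold on recent PyTorch releases but have not checked on every version. On CUDA it may emit a warning about non-contiguous weights.

```python
import torch
import torch.nn as nn


class ScaledGRU(nn.Module):
    """Hypothetical wrapper: runs nn.GRU with weight_hh_l0 = alpha * W."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
        # Keep the randomly initialized matrix under our own (made-up) name ...
        self.w_hh_raw = nn.Parameter(self.gru.weight_hh_l0.detach().clone())
        # ... and unregister the GRU's copy so a plain tensor can take its place.
        del self.gru.weight_hh_l0
        self.alpha = nn.Parameter(torch.tensor([0.5]))

    def forward(self, x):
        # Recomputed on every call: the product stays in the autograd graph
        # and the stored W is never overwritten.
        self.gru.weight_hh_l0 = self.alpha * self.w_hh_raw
        return self.gru(x)


torch.manual_seed(0)
model = ScaledGRU(input_size=4, hidden_size=8)
x = torch.randn(2, 5, 4)  # (batch, seq_len, features)
out, h_n = model(x)
out.sum().backward()  # gradients should now reach model.alpha
```

With this layout `alpha` and `w_hh_raw` both show up in `model.parameters()`. To train only `alpha` on top of a fixed random W, `w_hh_raw` could be left out of the optimizer or registered as a buffer instead.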