After randomly initializing the weight matrices of a GRU (let's call them W), I need to transform them by means of some function. For the sake of this discussion, let's simplify and say I want to multiply W by a scalar:
W <- alpha * W
alpha is a scalar parameter that I want my optimization algorithm to optimize.
Is there a way for me to accomplish this without rewriting the whole GRU code from scratch?
What I have tried so far:
    def __init__(self):
        self.gru = ...
        alpha = torch.Tensor([0.5])
        self.alpha = nn.Parameter(alpha, requires_grad=True)

    def forward(self, x):
        self.gru.weight_hh_l0.data = self.alpha * self.gru.weight_hh_l0
        # ... normal forward code ...
If I optimize all parameters, this doesn’t work. If I optimize all parameters except
gru.weight_hh_l0, it still doesn’t work.
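For reference, here is the kind of thing I'm after, written as a sketch rather than a working answer. It scales the GRU's weights functionally on every forward pass with torch.func.functional_call (available since PyTorch 2.0; earlier versions have torch.nn.utils.stateless.functional_call), so no .data assignment is involved and gradients should reach alpha. The class name ScaledGRU and the layer sizes are just placeholders I made up:

```python
import torch
from torch import nn
from torch.func import functional_call

class ScaledGRU(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
        # The scalar I want the optimizer to learn.
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, x):
        # Scale every GRU weight by alpha without mutating the module;
        # the scaled tensors stay in the autograd graph, so gradients
        # flow to both alpha and the underlying GRU parameters.
        params = {name: self.alpha * p
                  for name, p in self.gru.named_parameters()}
        return functional_call(self.gru, params, (x,))
```

I'm not sure whether this is the idiomatic way to do it, or whether it interacts badly with the GRU's flattened-weight machinery on CUDA.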