Hello,
This is my first question and I wanted to say thanks for all the good answers I have found here.
I am having trouble backpropagating through an activation I am testing.
import torch
from torch.nn import Module, Parameter

class MyActivation(Module):  # placeholder name for my module
    def __init__(self, N):
        super().__init__()
        self.a = Parameter(torch.mul(torch.ones(1, N), .5))
        self.g = Parameter(torch.rand(1, N))
        self.a_list = []
        self.g_list = []

    def forward(self, x, z):
        return (1 - self.a) * x + self.a * self.g * torch.tanh(x + z)
I would like to learn the parameters a and g; however, the product of self.a and self.g is an issue during backprop, since self.a is changed in place before the new self.g is computed.
Inputs x and z are outputs from separate linear layers.
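Roughly, the surrounding setup looks like this (the sizes, names, and dummy loss below are just placeholders):

from torch.nn import Linear

N = 8                                  # placeholder size
lin_x, lin_z = Linear(N, N), Linear(N, N)
act = MyActivation(N)

inp = torch.randn(4, N)                # dummy batch
out = act(lin_x(inp), lin_z(inp))
loss = out.pow(2).mean()               # placeholder loss
loss.backward()                        # I expect gradients for both act.a and act.g here
print(act.a.grad is None, act.g.grad is None)   # should print: False False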
I have attempted different cloning schemes for these parameters, among other things.
As of now (different from the lines above), I am treating them as states, similar to an LSTM's hidden state, and passing them through the full network repeatedly until calculating the loss on the last prediction.
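In rough, simplified form, that workaround looks something like this (the sequence shapes and loss are again placeholders):

a_state, g_state = act.a, act.g        # carried along like an LSTM "hidden" state
seq = torch.randn(5, 4, N)             # dummy sequence: (time, batch, N)
for t in range(seq.size(0)):
    x_t, z_t = lin_x(seq[t]), lin_z(seq[t])
    pred = (1 - a_state) * x_t + a_state * g_state * torch.tanh(x_t + z_t)
loss = pred.pow(2).mean()              # loss computed only on the last prediction
loss.backward()                        # single backward through the whole unrolled run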
Is there a better way to do this? Also, aside from using make_dot(), is there a tool to see how the autograd graph is freed during backprop? From what I have seen, it looks like the nodes are listed and deleted from left to right, which is why I believe self.a is being changed first.
Thanks for your time!