I don’t know how to use GRU
It runs, but it doesn’t learn.
This is the first time I’ve used an RNN layer in PyTorch, so I suspect that’s the cause.
Please tell me if any information is missing and I will add it.
Causes I can think of:

1. Insufficient understanding of the GRU and of proper initialization (for example, He initialization and setting biases to 0). I don’t even know how to access the weights and biases.
2. There is a tanh in the GRU (according to the official documentation).
3. Something else is missing.
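On point 1, a GRU’s parameters are reachable through `named_parameters()` (or directly as `weight_ih_l0`, `bias_hh_l0`, etc.), so you can re-initialize them however you like. A minimal sketch (the sizes 8 and 16 are just placeholders, and this uses Kaiming/He init in place of PyTorch’s default uniform init):

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)

# Each layer exposes weight_ih_l{k}, weight_hh_l{k}, bias_ih_l{k}, bias_hh_l{k};
# each tensor is the r, z, n gate parameters stacked along dim 0.
for name, param in gru.named_parameters():
    if name.startswith("weight"):
        nn.init.kaiming_normal_(param)  # He initialization
    elif name.startswith("bias"):
        nn.init.zeros_(param)           # biases set to 0

print(gru.weight_ih_l0.shape)  # torch.Size([48, 8]) = (3 * hidden_size, input_size)
```

Note that whether He init is actually appropriate here is debatable, since the gates use sigmoid/tanh rather than ReLU; PyTorch’s default uniform(-1/sqrt(hidden_size), 1/sqrt(hidden_size)) is a reasonable baseline.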
I’m doing something a little different after the GRU. Could this be the problem? The code itself should be fine, since it comes from before the GRU was introduced:
# build lower-triangular L with a positive (exponentiated) diagonal, then P = L @ L^T
L = self.L(x).view(batch_size * time, self.num_outputs, self.num_outputs)
tril_mask = torch.tril(torch.ones(self.num_outputs, self.num_outputs), diagonal=-1).unsqueeze(0).to("cuda:0")
diag_mask = torch.diag(torch.diag(torch.ones(self.num_outputs, self.num_outputs))).unsqueeze(0).to("cuda:0")
L = L * tril_mask.expand_as(L) + torch.exp(L) * diag_mask.expand_as(L)
P = torch.bmm(L, L.transpose(2, 1))
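For what it’s worth, that construction itself looks sound: it keeps the strictly-lower-triangular entries as-is, exponentiates the diagonal so it is positive, and forms P = L Lᵀ, which makes P symmetric positive definite. A standalone sketch without the `self.`/CUDA parts (the sizes are placeholders):

```python
import torch

num_outputs, batch = 3, 4
raw = torch.randn(batch, num_outputs, num_outputs)  # stand-in for self.L(x).view(...)

tril_mask = torch.tril(torch.ones(num_outputs, num_outputs), diagonal=-1).unsqueeze(0)
diag_mask = torch.diag_embed(torch.ones(num_outputs)).unsqueeze(0)  # identity mask

# strictly-lower part kept, diagonal exponentiated to keep it positive
L = raw * tril_mask + torch.exp(raw) * diag_mask
P = torch.bmm(L, L.transpose(2, 1))

# the exponentiated diagonal makes L invertible, so P = L L^T is positive definite
print(torch.linalg.eigvalsh(P).min() > 0)
```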
\begin{array}{ll}
r = \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\
z = \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\
n = \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\
h' = (1 - z) * n + z * h
\end{array}
I don’t want to use tanh. Is there a way to change it to ReLU or Swish?