# I don't know how to use GRU: it runs but does not learn

I don’t know how to use GRU properly: the model runs, but it doesn’t learn.
This is the first time I’ve used an RNN layer in PyTorch, so I suspect my usage is the cause.
Please tell me if there is not enough information, and I will add it.

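Code:

```python
# Definition (in __init__)
self.rnn = nn.GRU(self.hidden_size1, gru_hidden)
self.norm = nn.LayerNorm(gru_hidden)
self.Relu = swish(0.7)  # custom swish activation, despite the attribute name

# Runtime (in forward)
x = self.Linear(x)
x, hidden = self.rnn(x, hidden)
onry_x = self.Relu(self.norm(x))
```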
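For reference, here is a minimal self-contained version of the model above that can be run in isolation. It is a sketch under assumptions: `swish(0.7)` is assumed to mean x · sigmoid(0.7 · x) and is written as a hypothetical `Swish` module; the class name `GRUBlock` and the sizes (64, 32, time 15, batch 8) are arbitrary placeholders. It also shows one detail that matters when `hidden` is carried from one batch to the next: detaching it.

```python
import torch
import torch.nn as nn


class Swish(nn.Module):
    """Hypothetical stand-in for the post's swish(0.7): x * sigmoid(beta * x)."""

    def __init__(self, beta: float = 1.0):
        super().__init__()
        self.beta = beta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.beta * x)


class GRUBlock(nn.Module):
    """Linear -> GRU -> LayerNorm -> swish on time-major (time, batch, features) tensors."""

    def __init__(self, in_features: int = 20, hidden_size1: int = 64, gru_hidden: int = 32):
        super().__init__()
        self.Linear = nn.Linear(in_features, hidden_size1)
        # batch_first=False (the default): the GRU expects (time, batch, features)
        self.rnn = nn.GRU(hidden_size1, gru_hidden)
        self.norm = nn.LayerNorm(gru_hidden)
        self.act = Swish(0.7)

    def forward(self, x, hidden=None):
        x = self.Linear(x)               # (time, batch, hidden_size1)
        x, hidden = self.rnn(x, hidden)  # x: (time, batch, gru_hidden); hidden: (1, batch, gru_hidden)
        return self.act(self.norm(x)), hidden


model = GRUBlock()
x = torch.randn(15, 8, 20)               # (time, batch, 20), matching the input shape in the post
out, hidden = model(x)                   # hidden=None -> the GRU starts from zeros
# Detach before reusing the state on the next batch; otherwise backward() on the
# next batch would try to go through the previous batch's (already freed) graph.
hidden = hidden.detach()
print(out.shape, hidden.shape)
```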

Possible causes I can think of:

1. I don’t understand the GRU well enough, including how to initialize it properly (for example, He initialization or setting the biases to 0). I don’t even know how to access its weights and biases.
2. There is a tanh inside the GRU (according to the official documentation).
3. Something else is missing.
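On accessing the weights and biases: `nn.GRU` registers them as `weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0` and `bias_hh_l0` (`weight_ih_l1` and so on for deeper layers), readable directly as attributes such as `gru.weight_hh_l0` or via `named_parameters()`. The sketch below prints them and then applies one commonly used initialization (Xavier for input weights, orthogonal for recurrent weights, zero biases). This is an assumption, not a guaranteed fix; the sizes are placeholders, and He (Kaiming) initialization (`nn.init.kaiming_uniform_`) is designed for ReLU-type activations, whereas the GRU gates use sigmoid and tanh.

```python
import torch.nn as nn

gru = nn.GRU(input_size=64, hidden_size=32)  # sizes are arbitrary placeholders

# Rows are stacked in gate order (reset r | update z | new n), hence 3 * hidden_size rows.
for name, param in gru.named_parameters():
    print(name, tuple(param.shape))

# One common (not the only) initialization scheme for recurrent layers:
for name, param in gru.named_parameters():
    if name.startswith("weight_ih"):
        nn.init.xavier_uniform_(param)   # input-to-hidden weights
    elif name.startswith("weight_hh"):
        nn.init.orthogonal_(param)       # hidden-to-hidden weights
    elif name.startswith("bias"):
        nn.init.zeros_(param)            # biases set to 0
```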

I’m having trouble understanding your issue. Could you post some executable code?

If I post all of it, it will be long, so here is only part of it.

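```python
# output, targets: (time, batch, output); summing over time gives (batch, output)
loss = torch.sum(output.permute(1, 2, 0), dim=2) - torch.sum(targets.detach().permute(1, 2, 0), dim=2)

loss = loss__(loss)  # custom loss function, e.g. (loss**2)/2
loss.backward()
optimizer.step()
scheduler.step()
self.tortal_losses += loss.clone().detach().to('cpu')
```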

Shapes:

- `targets`: (time, batch, output)
- `output`: (time, batch, output)
- `input`: (time, batch, 20)
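A minimal training step for the shapes listed above is sketched below. It illustrates two points: `optimizer.zero_grad()` is needed every step, otherwise gradients accumulate across steps (it is not visible in the excerpt, so it may already be elsewhere); and `torch.sum(output.permute(1, 2, 0), dim=2)` is the same as `output.sum(dim=0)`, so this loss only compares sums over the whole time axis, not per-timestep values. The model (a bare `nn.GRU` plus an `nn.Linear` head), the sizes, Adam, and the assumption that `loss__` is the mean of `(diff**2)/2` are all hypothetical; the scheduler is omitted.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: time=15, batch=8, 20 input features, 4 outputs
T, B, F, OUT = 15, 8, 20, 4

gru = nn.GRU(F, 32)                 # time-major: expects (time, batch, features)
head = nn.Linear(32, OUT)
optimizer = torch.optim.Adam(list(gru.parameters()) + list(head.parameters()), lr=1e-2)

x = torch.randn(T, B, F)            # input:   (time, batch, 20)
targets = torch.randn(T, B, OUT)    # targets: (time, batch, output)

losses = []
for step in range(200):
    optimizer.zero_grad()           # clear gradients left over from the previous step
    h_seq, _ = gru(x)               # (time, batch, 32); no hidden given -> starts at zeros
    output = head(h_seq)            # (time, batch, output)
    # Same as the post's permute(1, 2, 0) + sum(dim=2): sum over time -> (batch, output)
    diff = output.sum(dim=0) - targets.sum(dim=0)
    loss = (diff ** 2 / 2).mean()   # assumed form of loss__
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```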

Model:

```python
# Definition
self.rnn = nn.GRU(self.hidden_size1, gru_hidden)
self.norm = nn.LayerNorm(gru_hidden)
self.Relu = swish(0.7)

# Runtime
x = self.Linear(x)
x, hidden = self.rnn(x, hidden)
onry_x = self.Relu(self.norm(x))
```


I do something a little unusual after the GRU. Could this be a problem?
It shouldn’t be a problem in itself, since this part is unchanged from the code I used before adding the GRU:

```python
# Reshape to (batch_size * time, num_outputs, num_outputs) and form P = L L^T
L = self.L(x).view(bacth_size*timee, self.num_outputs, self.num_outputs)
P = torch.bmm(L, L.transpose(2, 1))
```
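The two lines above build P = L Lᵀ for each sample, which is always symmetric positive semi-definite. The sketch below checks this with random data standing in for `self.L(x)`; the sizes are hypothetical. One thing that may be worth checking in the real model: the GRU output is time-major, so `.view(bacth_size*timee, ...)` orders the rows as (t0, b0), (t0, b1), and so on, which may differ from the row order used before the GRU was added; anything paired row-by-row with `P` has to be flattened the same way.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes: batch_size * time = 6 samples, num_outputs = 3
n_samples, num_outputs = 6, 3
flat = torch.randn(n_samples, num_outputs * num_outputs)  # stand-in for self.L(x)

L = flat.view(n_samples, num_outputs, num_outputs)        # (N, n, n)
P = torch.bmm(L, L.transpose(2, 1))                        # batched L @ L^T

# P is symmetric and positive semi-definite for every sample
eigvals = torch.linalg.eigvalsh(P)                         # (N, n), ascending order
print(torch.allclose(P, P.transpose(2, 1)), bool((eigvals > -1e-5).all()))
```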

The PyTorch documentation describes the GRU like this:

$$
\begin{array}{ll}
r = \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\
z = \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\
n = \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\
h' = (1 - z) * n + z * h
\end{array}
$$


I don’t want to use tanh.
Is there a way to replace it with ReLU or swish?
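As far as I know, `nn.GRU` has no option to change the tanh (unlike `nn.RNN`, which accepts `nonlinearity='relu'` but has no gates). One way is to write the cell by hand following the equations above and make the candidate activation a parameter. The sketch below does this for a single-layer, time-major GRU; `CustomGRU` and `swish` (assumed to be x · sigmoid(0.7 · x)) are hypothetical names, not PyTorch APIs. With the default `torch.tanh` it computes the same function as `nn.GRU` given the same weights.

```python
import torch
import torch.nn as nn


class CustomGRU(nn.Module):
    """Single-layer, time-major GRU following the documented equations,
    with the candidate activation (tanh in nn.GRU) made swappable."""

    def __init__(self, input_size, hidden_size, activation=torch.tanh):
        super().__init__()
        self.hidden_size = hidden_size
        self.activation = activation
        # Same layout as nn.GRU: rows stacked as (r | z | n)
        self.ih = nn.Linear(input_size, 3 * hidden_size)   # W_i* x + b_i*
        self.hh = nn.Linear(hidden_size, 3 * hidden_size)  # W_h* h + b_h*

    def forward(self, x, h=None):
        # x: (time, batch, input_size); h: (batch, hidden_size) or None for zeros
        if h is None:
            h = x.new_zeros(x.size(1), self.hidden_size)
        outputs = []
        for x_t in x:                                # iterate over the time axis
            i_r, i_z, i_n = self.ih(x_t).chunk(3, dim=-1)
            h_r, h_z, h_n = self.hh(h).chunk(3, dim=-1)
            r = torch.sigmoid(i_r + h_r)
            z = torch.sigmoid(i_z + h_z)
            n = self.activation(i_n + r * h_n)       # tanh in nn.GRU
            h = (1 - z) * n + z * h
            outputs.append(h)
        # (time, batch, hidden) and final state (batch, hidden); nn.GRU returns (1, batch, hidden)
        return torch.stack(outputs), h


def swish(x, beta=0.7):
    return x * torch.sigmoid(beta * x)


gru = CustomGRU(20, 32, activation=swish)
out, h_last = gru(torch.randn(15, 8, 20))
print(out.shape, h_last.shape)
```

Two caveats: the Python loop over time is much slower than the fused kernel behind `nn.GRU`, and tanh keeps n in (-1, 1), which bounds the hidden state; with ReLU or swish the candidate is unbounded, so the hidden state can grow over long sequences.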