In the file torch\nn\modules\rnn.py, I replaced the _VF.gru_cell call with custom code, i.e.
ret = _VF.gru_cell(
    input, hx,
    self.weight_ih, self.weight_hh,
    self.bias_ih, self.bias_hh,
)
was replaced by
import torch as tt
import torch.nn.functional as ff

# weight_ih, weight_hh and the biases stack the reset (r), update (z)
# and new (n) gate blocks along dim 0; these are the block boundaries.
r1, r2 = 0, 1
z1, z2 = 1, 2
n1, n2 = 2, 3
H = self.hidden_size

# Reset gate
R = tt.sigmoid(
    ff.linear(input, self.weight_ih[r1*H:r2*H, :], self.bias_ih[r1*H:r2*H])
    + ff.linear(hx, self.weight_hh[r1*H:r2*H, :], self.bias_hh[r1*H:r2*H])
)
# Update gate
Z = tt.sigmoid(
    ff.linear(input, self.weight_ih[z1*H:z2*H, :], self.bias_ih[z1*H:z2*H])
    + ff.linear(hx, self.weight_hh[z1*H:z2*H, :], self.bias_hh[z1*H:z2*H])
)
# Candidate (new) state; the reset gate scales the hidden projection
N = tt.tanh(
    ff.linear(input, self.weight_ih[n1*H:n2*H, :], self.bias_ih[n1*H:n2*H])
    + R * ff.linear(hx, self.weight_hh[n1*H:n2*H, :], self.bias_hh[n1*H:n2*H])
)
# Blend the candidate with the previous hidden state
ret = (1 - Z) * N + Z * hx
I tested with some random data, but there is a small difference from the output of PyTorch's original GRUCell:
# h_gru_cell_torch is the output of torch.nn.GRUCell
# h_gru_cell_custom is the output of my GRUCell
d_h_gru_cell = tt.sum(tt.abs(h_gru_cell_torch - h_gru_cell_custom))
# d_h_gru_cell = tensor(2.0512e-07)
Is this OK, or am I doing something wrong?
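
For anyone who wants to reproduce the comparison without editing rnn.py, here is a minimal standalone sketch. The helper names (manual_gru_cell, compare), the tensor sizes, the seed and the tolerances are all my own assumptions for illustration, and it splits the gates with chunk instead of the explicit slices above, which should be equivalent. It runs the same check in float32 and float64. If the gap shrinks by many orders of magnitude in float64, that would suggest the difference comes from float32 rounding and operation ordering rather than from a logic error.

```python
import torch
import torch.nn.functional as F


def manual_gru_cell(cell, x, hx):
    """Recompute the GRUCell update from the module's own parameters.

    Hypothetical helper: it uses chunk() to split the stacked gate
    blocks (reset, update, new) instead of explicit row slices.
    """
    gi = F.linear(x, cell.weight_ih, cell.bias_ih)   # input projections
    gh = F.linear(hx, cell.weight_hh, cell.bias_hh)  # hidden projections
    i_r, i_z, i_n = gi.chunk(3, dim=1)
    h_r, h_z, h_n = gh.chunk(3, dim=1)
    r = torch.sigmoid(i_r + h_r)        # reset gate
    z = torch.sigmoid(i_z + h_z)        # update gate
    n = torch.tanh(i_n + r * h_n)       # candidate state
    return (1 - z) * n + z * hx


def compare(dtype, seed=0, batch=4, input_size=8, hidden_size=16):
    """Return (built-in output, manual output) for random data.

    Sizes and seed are arbitrary assumptions, not values from the post.
    """
    torch.manual_seed(seed)
    cell = torch.nn.GRUCell(input_size, hidden_size).to(dtype)
    x = torch.randn(batch, input_size, dtype=dtype)
    hx = torch.randn(batch, hidden_size, dtype=dtype)
    with torch.no_grad():
        h_ref = cell(x, hx)
        h_custom = manual_gru_cell(cell, x, hx)
    return h_ref, h_custom


if __name__ == "__main__":
    for dtype in (torch.float32, torch.float64):
        h_ref, h_custom = compare(dtype)
        max_diff = (h_ref - h_custom).abs().max().item()
        close = torch.allclose(h_ref, h_custom, atol=1e-6)
        print(dtype, "max abs diff:", max_diff, "allclose:", close)
```

Comparing with torch.allclose and a per-element maximum is probably easier to interpret than a summed absolute difference, because the sum grows with the number of elements even when each individual error stays at rounding level.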