I am interested in creating my own custom GRU implementation (for example, changing the tanh activation to relu), but with the same training efficiency as the torch.nn.GRU class.
I believe I need to implement it as a C++ extension to avoid a per-timestep for-loop in Python.
Can anyone point me in the direction of where to start? Ideally, I would base it on the existing C++ GRU implementation in torch, but I am struggling to find it in the source code.
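To make the change concrete, here is a minimal sketch of the cell I have in mind: it follows the standard nn.GRUCell gate equations and weight layout, with relu swapped in for tanh in the candidate state, plus the per-timestep Python loop I would like to move out of Python (names like `ReLUGRUCell` and `run_gru` are my own, not PyTorch APIs):

```python
import torch
import torch.nn as nn

class ReLUGRUCell(nn.Module):
    """Same gate layout as nn.GRUCell, but with relu in place of tanh."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        # Gates are stacked as (reset, update, new), as in nn.GRUCell
        self.weight_ih = nn.Parameter(torch.randn(3 * hidden_size, input_size) * 0.1)
        self.weight_hh = nn.Parameter(torch.randn(3 * hidden_size, hidden_size) * 0.1)
        self.bias_ih = nn.Parameter(torch.zeros(3 * hidden_size))
        self.bias_hh = nn.Parameter(torch.zeros(3 * hidden_size))

    def forward(self, x, h):
        gi = x @ self.weight_ih.t() + self.bias_ih
        gh = h @ self.weight_hh.t() + self.bias_hh
        i_r, i_z, i_n = gi.chunk(3, dim=1)
        h_r, h_z, h_n = gh.chunk(3, dim=1)
        r = torch.sigmoid(i_r + h_r)           # reset gate
        z = torch.sigmoid(i_z + h_z)           # update gate
        n = torch.relu(i_n + r * h_n)          # tanh -> relu: the only change
        return (1 - z) * n + z * h

def run_gru(cell, inputs, h0):
    """The per-timestep Python loop I would like to avoid; inputs is (seq_len, batch, input_size)."""
    h = h0
    outputs = []
    for t in range(inputs.size(0)):
        h = cell(inputs[t], h)
        outputs.append(h)
    return torch.stack(outputs), h
```

This runs, but the Python loop dominates the runtime for long sequences, which is why I am looking at the fused C++ path that nn.GRU uses.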
Thanks in advance!