I am implementing approaches to model the transfer behavior of a device that is fed sequential input data with temporal dependencies. The obvious approach is therefore RNNs; more specifically, I am using LSTM, GRU and JANET, and they work pretty well, as expected. Now I have a couple of questions that came up while comparing the performance of these approaches.
I have the exact same implementation for the LSTM and the GRU, i.e., the same data pipeline, the exact same architecture except for swapping
nn.LSTM() for nn.GRU(), and the exact same training, validation, etc.
I do the training using input dimensions of (5, 100, 60), where the order is (batch_size, sequence_length, input_size), as described in the LSTM — PyTorch 2.0 documentation.
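To make the setup concrete, here is a stripped-down sketch of the two networks (the hidden size, the linear output head and the names are placeholders, not my exact code); the only difference between the two runs is which module class gets instantiated:

```python
import torch
import torch.nn as nn

class RNNModel(nn.Module):
    def __init__(self, cell="lstm", input_size=60, hidden_size=64, num_layers=1):
        super().__init__()
        rnn_cls = nn.LSTM if cell == "lstm" else nn.GRU
        # batch_first=True so inputs are (batch_size, sequence_length, input_size)
        self.rnn = rnn_cls(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.rnn(x)   # out: (batch, seq_len, hidden_size)
        return self.fc(out)    # per-time-step prediction

# training batches look like this:
x = torch.randn(5, 100, 60)   # (batch_size, sequence_length, input_size)
y_hat = RNNModel("gru")(x)
```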
For validation I want to feed my whole sequential input data (not split into sequences), which has length 300e3. While this works very well with the LSTM, it is extremely slow with the GRU.
In numbers: the LSTM takes e.g. 48 seconds for 5 epochs, while the GRU takes 12 minutes! It is specifically the part where I feed the long input to the GRU that becomes very slow; the training speed itself is comparable to the LSTM.
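For reference, the validation pass that becomes slow with the GRU looks roughly like this (a minimal stand-in; the hidden size and the random data are placeholders):

```python
import torch
import torch.nn as nn

# minimal stand-in for my network (hidden size is a placeholder)
gru = nn.GRU(input_size=60, hidden_size=64, batch_first=True)
gru.eval()

with torch.no_grad():
    # the full recording as one batch containing one long sequence
    full_seq = torch.randn(1, 300_000, 60)  # (1, 300e3 time steps, 60 features)
    out, h_n = gru(full_seq)                # this forward pass is the slow part
```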
There is literally not a single line of code that differs from the LSTM version except for the initialization of the network, as mentioned above. Maybe someone has an idea of what the problem could be; I am happy to provide code if needed.
This is more of a general question on data preparation. The NN should model the transfer behavior of a device that introduces nonlinearity and amplification to the input stream, meaning the relation between the input data and the output data (which the NN should model) looks e.g. like
x_in  = [0.2, 0.3, 0.5]
x_out = [0.4, 0.57, 0.94]
These are just example values, and the range of the numbers can be very large (from very small to large). Now I was wondering if I could help the network learn the nonlinear behavior by doing normalization or something similar, but I am not sure about this, and would be happy for advice.
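To clarify what I mean by normalization, something like the following is what I had in mind (the data here is made up purely for illustration; statistics are fitted on the training split and the predictions are un-scaled afterwards):

```python
import torch

# made-up stand-ins for the real training data (placeholders, not my data)
x_train = torch.rand(1000, 1) * 100.0        # inputs spanning a wide range
y_train = 1.9 * x_train + 0.1 * x_train**2   # made-up nonlinear amplification

# fit normalization statistics on the training split only
x_mean, x_std = x_train.mean(), x_train.std()
y_mean, y_std = y_train.mean(), y_train.std()

x_train_n = (x_train - x_mean) / x_std
y_train_n = (y_train - y_mean) / y_std

# train on the normalized pairs, then undo the target scaling at inference:
# y_pred = model(x_new_normalized) * y_std + y_mean
```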
Thanks in advance!