Hello,
I am implementing approaches to model the transfer behavior of a device, feeding it sequential input data with temporal dependencies. The obvious approach is therefore RNNs; specifically, I am using LSTM, GRU and JANET, and it works pretty well, as expected. Now I have a couple of questions that came up while comparing the performance of these approaches.
-
I have the exact same implementation for LSTM and GRU, i.e., the same data handling, the exact same architecture except using nn.GRU() instead of nn.LSTM(), and the exact same training, validation, etc.
I train with inputs of shape (5, 100, 60), i.e., (batch_size, sequence_length, input_size), as described in the LSTM — PyTorch 2.0 documentation.
For validation I want to feed my whole sequential input data (not split up into sub-sequences), which has a length of 300e3. While this works very well with the LSTM, it is extremely slow with the GRU.
In numbers: the LSTM takes about 48 seconds for 5 epochs, while the GRU takes 12 minutes! It is specifically the part where I feed the long input to the GRU that becomes very slow; the training speed itself is comparable to the LSTM.
There is literally not a single line of code different from the LSTM version except for the initialization of the network, as mentioned before. Maybe someone has an idea what the problem could be; I am happy to provide code if needed. A stripped-down sketch of the setup is below.
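To make the setup concrete, here is a minimal sketch of what I mean (the class name, hidden size, and output size are placeholders, not my real code; the only difference between the two variants is which recurrent layer class gets passed in):

```python
import torch
import torch.nn as nn

class Seq2SeqRNN(nn.Module):
    """Placeholder model: identical for both variants except rnn_cls."""
    def __init__(self, rnn_cls, input_size=60, hidden_size=64, output_size=60):
        super().__init__()
        # rnn_cls is either nn.LSTM or nn.GRU; everything else is the same
        self.rnn = rnn_cls(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)   # out: (batch, seq_len, hidden_size)
        return self.fc(out)

lstm_model = Seq2SeqRNN(nn.LSTM)
gru_model = Seq2SeqRNN(nn.GRU)

# Training-style input: (batch_size=5, sequence_length=100, input_size=60)
x_train = torch.randn(5, 100, 60)

# Validation-style input: one long sequence of length 300e3
x_val = torch.randn(1, 300_000, 60)

with torch.no_grad():  # validation pass without autograd bookkeeping
    y_lstm = lstm_model(x_val)
    y_gru = gru_model(x_val)
```

(In my real code the validation pass is wrapped like the torch.no_grad() block above, so the slowdown does not seem to come from the autograd graph over the long sequence.)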
-
This is more of a general question about data preparation. The NN should model the transfer behavior of a device that introduces nonlinearity and amplification to the input stream, meaning the relation between the input data and the output data (which the NN should model) is e.g.,
x_in = [0.2, 0.3, 0.5]
x_out = [0.4, 0.57, 0.94]
These are just example values, and the actual range of the numbers can be very large (from very small to very large). Now I was wondering whether I could help the network learn the nonlinear behavior by doing normalization or something similar, but I am not sure about this thought and would be happy for advice. Something like the sketch below is what I have in mind.
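For illustration, this is roughly what I mean by normalization (just a sketch; x_in / x_out stand in for my real data streams, and the statistics would of course be computed on the training split only):

```python
import numpy as np

# Placeholder data; in reality these are long streams whose values
# can span several orders of magnitude.
x_in = np.array([0.2, 0.3, 0.5])
x_out = np.array([0.4, 0.57, 0.94])

# Z-score normalization, fitted on the training data.
in_mean, in_std = x_in.mean(), x_in.std()
out_mean, out_std = x_out.mean(), x_out.std()

x_in_norm = (x_in - in_mean) / in_std
x_out_norm = (x_out - out_mean) / out_std

# After the network predicts in normalized space, map back:
def denormalize(y_norm):
    return y_norm * out_std + out_mean
```

My worry is whether scaling the targets like this hides the amplification behavior the network is supposed to learn, or whether it simply makes training easier.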
Thanks in advance!