Hi Andre, I read the post you referred to.
First, the plot is unclear to me: which operation is performed between the input data x_t and the previous hidden state h_{t-1}? Addition? A dot product?
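To make the question concrete, here is my current understanding of a single LSTM step (just a sketch of the standard formulation, not code from the post; the function name lstm_step and the weight names W_ih, W_hh, b are mine). As far as I understand, x_t and h_{t-1} are each multiplied by their own weight matrix and the two results are added before the gate nonlinearities, so there is no dot product between x_t and h_{t-1} themselves. Is that what the plot is showing?

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W_ih, W_hh, b):
    # x_t: (input_size,), h_prev / c_prev: (hidden_size,)
    # W_ih: (4*hidden_size, input_size), W_hh: (4*hidden_size, hidden_size), b: (4*hidden_size,)
    gates = x_t @ W_ih.T + h_prev @ W_hh.T + b   # the two contributions are summed
    i, f, g, o = gates.chunk(4)                  # input, forget, candidate, output gates
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    c_t = f * c_prev + i * g                     # new cell state
    h_t = o * torch.tanh(c_t)                    # new hidden state
    return h_t, c_t
```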
Second, the author claims:
This property enables LSTMs to process entire sequences of data (e.g. time series) without treating each point in the sequence independently, but rather, retaining useful information about previous data in the sequence to help with the processing of new data points…
According to this, if I have a sequence vector of 128 points, x_i = [x_{i,1}, …, x_{i,128}], it is processed as a whole ("process entire sequences… without treating each point independently"). Is that correct?
I have read many papers saying that an LSTM is able to learn temporal features, so I imagine it actually steps over each point in the sequence.
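If it helps, this is how I am picturing it in PyTorch (the sizes here are hypothetical, just to illustrate the question): each of the 128 points is one time step, and the LSTM produces one hidden state per time step rather than consuming the whole 128-point vector at once.

```python
import torch
import torch.nn as nn

N = 32                                  # hypothetical number of samples
x = torch.randn(N, 128, 1)              # (batch, seq_len=128, features=1)
lstm = nn.LSTM(input_size=1, hidden_size=64, batch_first=True)
out, (h_n, c_n) = lstm(x)
print(out.shape)   # torch.Size([32, 128, 64]): one hidden state per time step
print(h_n.shape)   # torch.Size([1, 32, 64]): hidden state after the last step
```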
Another doubt: what does "previous data in the sequence" mean? Does it refer to other sequences? For instance, suppose I have a dataset of samples x_i, i = 1…N, where each sample x_i contains a sequence of 128 points.
In my test, I changed the dimension of x_i to 256, and the LSTM seems invariant to the sequence length, but I cannot figure out how each sequence is being processed.
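Concretely, the test I ran looked roughly like this (my own toy code with made-up sizes): the same nn.LSTM module accepts both lengths, because its weights depend only on input_size and hidden_size, not on the number of time steps.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=64, batch_first=True)
out_128, _ = lstm(torch.randn(8, 128, 1))    # sequences of 128 points
out_256, _ = lstm(torch.randn(8, 256, 1))    # same module, sequences of 256 points
print(out_128.shape, out_256.shape)          # torch.Size([8, 128, 64]) torch.Size([8, 256, 64])
```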
In addition, the classic PyTorch LSTM example passes zero initial hidden and cell states (h0, c0) to the forward method, so it seems that the LSTM internally traverses the individual points of the given sequence.
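The kind of forward I mean looks roughly like this (paraphrased from the usual PyTorch tutorials, not copied from any specific one): h0 and c0 are zero tensors of shape (num_layers, batch, hidden_size), and the module then steps through the seq_len points of x on its own.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=64, num_layers=1, batch_first=True)
x = torch.randn(8, 128, 1)               # (batch, seq_len, features)
h0 = torch.zeros(1, x.size(0), 64)       # (num_layers, batch, hidden_size)
c0 = torch.zeros(1, x.size(0), 64)
out, (h_n, c_n) = lstm(x, (h0, c0))      # the LSTM iterates over the 128 steps internally
```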