Hi guys, I am working with high-dimensional time series and I have built an autoencoder (LSTM encoder, LSTM decoder). After training the full encoder-decoder, I am now trying to use the encoder's hidden states for another task (clustering). However, it turns out that all the hidden states are nearly identical:
# last hidden state of the encoder
tensor([[[ 0.1086065620, -0.0446619801, -0.0530930459, ...,
-0.0573375113, 0.1083261892, 0.0037083717],
[ 0.1086065620, -0.0446619801, -0.0530930459, ...,
-0.0573375151, 0.1083261892, 0.0037083712],
[ 0.1086065620, -0.0446619801, -0.0530930459, ...,
-0.0573375188, 0.1083262041, 0.0037083719],
...,
[ 0.1086065620, -0.0446619801, -0.0530930422, ...,
-0.0573375151, 0.1083262041, 0.0037083724],
[ 0.1086065620, -0.0446619801, -0.0530930385, ...,
-0.0573375151, 0.1083262041, 0.0037083712],
[ 0.1086065620, -0.0446619801, -0.0530930385, ...,
-0.0573375188, 0.1083261892, 0.0037083707]]],
grad_fn=<StackBackward>)
I have scaled the input (using sklearn.preprocessing.StandardScaler).
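For reference, the scaling step looks roughly like this (a minimal sketch; the shapes and variable names are placeholders for my real data):

import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative shapes only
n_samples, seq_len, n_features = 128, 10, 20
train_data = np.random.randn(n_samples, seq_len, n_features)

# StandardScaler expects a 2-D array, so flatten the time axis, scale each
# feature to zero mean / unit variance, then reshape back for the LSTM.
scaler = StandardScaler()
flat = train_data.reshape(-1, n_features)
train_data_scaled = scaler.fit_transform(flat).reshape(n_samples, seq_len, n_features)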
It might not be a big deal on its own, but I then feed these hidden states into a KMeans algorithm, and that leads to strange behaviour… Does anyone have a suggestion?
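Concretely, the clustering step looks roughly like this (a sketch; the tensor shape, hidden_size=64 and n_clusters=5 are illustrative):

import torch
from sklearn.cluster import KMeans

# Placeholder for the encoder's last hidden state: (num_layers, batch, hidden_size)
h_t = torch.randn(1, 32, 64)

# Drop the layer dimension, detach from the autograd graph and move to CPU
# so sklearn can consume the array.
features = h_t.squeeze(0).detach().cpu().numpy()   # (batch, hidden_size)

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)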
Here is the code for my encoder (which contains an input attention mechanism to select the relevant driving series):
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


def init_hidden(x, hidden_size: int):
    # Standard zero-initialised hidden/cell state: (num_layers, batch, hidden_size)
    return torch.zeros(1, x.size(0), hidden_size, device=device)


class Encoder(nn.Module):
    def __init__(self, config, input_size: int):
        super(Encoder, self).__init__()
        self.input_size = input_size
        self.hidden_size = config['hidden_size_encoder']
        self.seq_len = config['seq_len']
        self.lstm = nn.LSTM(
            input_size=self.input_size,
            hidden_size=self.hidden_size,
            num_layers=1
        )
        # Input attention: scores each of the input_size driving series from
        # (h_t, c_t) and the full history of that series
        self.attn = nn.Linear(
            in_features=2 * self.hidden_size + self.seq_len,
            out_features=1
        )
        self.dropout = nn.Dropout(p=0.5)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, input_data):
        # input_data: (batch, seq_len, input_size)
        h_t = init_hidden(input_data, self.hidden_size)
        c_t = init_hidden(input_data, self.hidden_size)
        input_weighted = torch.zeros(
            input_data.size(0), self.seq_len, self.input_size, device=device)
        for t in range(self.seq_len):
            # Concatenate the repeated hidden/cell states with each driving
            # series: (batch, input_size, 2 * hidden_size + seq_len)
            x = torch.cat((h_t.repeat(self.input_size, 1, 1).permute(1, 0, 2),
                           c_t.repeat(self.input_size, 1, 1).permute(1, 0, 2),
                           input_data.permute(0, 2, 1).to(device)), dim=2)
            # Attention weights over the input_size driving series
            e_t = self.attn(x.view(-1, self.hidden_size * 2 + self.seq_len))
            a_t = self.dropout(self.softmax(e_t.view(-1, self.input_size)))
            # Reweight the inputs at step t and advance the LSTM by one step
            weighted_input = torch.mul(a_t, input_data[:, t, :].to(device))
            self.lstm.flatten_parameters()
            _, (h_t, c_t) = self.lstm(weighted_input.unsqueeze(0), (h_t, c_t))
            input_weighted[:, t, :] = weighted_input
        return input_weighted[:, -1:, :], h_t, c_t
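For completeness, this is roughly how I call the encoder (the config values and batch shape are illustrative):

config = {'hidden_size_encoder': 64, 'seq_len': 10}
encoder = Encoder(config, input_size=20).to(device)

# One batch of 32 series: (batch, seq_len, input_size)
batch = torch.randn(32, config['seq_len'], 20)
input_weighted, h_t, c_t = encoder(batch)
print(h_t.shape)   # torch.Size([1, 32, 64])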
I hope this is clear… Thanks for your help!
EDIT: I also tried to rescale the output of the encoder before feeding it into KMeans, and the results are really different…
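The rescaling I tried looks roughly like this (again a sketch; features stands for the detached hidden states from the snippet above):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

features = np.random.randn(32, 64)  # placeholder for the detached hidden states

# Standardise each hidden dimension across the batch before clustering;
# this changes the relative scale of the dimensions, which can change
# the resulting cluster assignments.
features_scaled = StandardScaler().fit_transform(features)
labels_scaled = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features_scaled)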