Understanding LSTM-based autoencoder input dimensions

Hi, I am currently trying to reconstruct multivariate time series data with an LSTM-based autoencoder.

The problem is that I get confused by the terms in the PyTorch docs.

In this reference, I only care about three terms: batch size, sequence length, and input size.

I need a simple, concrete example to understand them.

| timestamp | sensor1 | sensor2 | sensor3 |
|-----------|---------|---------|---------|
| 0         | 0.1     | 0.8     | 0.2     |
| 1         | 0.2     | 0.5     | 0.7     |
| 2         | 0.3     | 0.1     | 0.4     |
| …         | …       | …       | …       |
| 9         | 1.0     | 0.0     | 0.5     |
| 10        | 0.3     | 0.1     | 0.4     |
| …         | …       | …       | …       |
| 99        | 1.0     | 0.0     | 0.5     |

Let’s say we have N features and M data points. In my example, N is 3 and M is 100.

As far as I know, in the context of PyTorch, input size means the number of variables or features.

Okay, fine. Then what is M (the length in time) here? Is it the batch size or the sequence length?
I thought M was the same as the sequence length.

However, I am not sure it is. It seems more like the batch size.

Furthermore, what if I use a window of size K?

In the above example, I divide the data into 10 pieces with window size 10,
like this: [0~9], [10~19], …, [90~99]. Then what are the batch size and sequence length in this case?

Thx!

BTW, for those who are struggling with the same issue (an LSTM autoencoder in PyTorch), this excellent explanation is really helpful.

When you say you divide the data into M/K pieces with window size K, do you then treat each of those M/K pieces as a separate datapoint? So window 1 is an input that yields some output, separately window 2 is an input that yields some output, etc.

In that case:

• each input data point will be one window, therefore:
• your input_size is 3 (number of features) as you said
• sequence length (L from the pytorch documentation) is the window length, K in your case
• batch size is M/K, assuming you want to pass your entire dataset in at once to calculate the gradient, and assuming you pass non-overlapping windows (so you pass M/K windows total); see the shape sketch below
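A minimal sketch of those shapes, with made-up numbers (the variable names are mine, not from the PyTorch docs):

``````
import torch
import torch.nn as nn

M, N, K = 100, 3, 10               # data points, features, window size
data = torch.rand(M, N)            # the full multivariate series: (M, N)

# M/K non-overlapping windows: (M, N) -> (M/K, K, N)
windows = data.reshape(M // K, K, N)

# nn.LSTM defaults to (seq_len, batch, input_size), so swap the first two dims
lstm_input = windows.transpose(0, 1)          # (K, M/K, N) = (10, 10, 3)
lstm = nn.LSTM(input_size=N, hidden_size=20)  # input_size = number of features
output, (h_n, c_n) = lstm(lstm_input)
print(output.shape)                           # torch.Size([10, 10, 20])
``````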

Let me know if this makes sense. Hope it helps!

First of all, thanks for the reply!

If the window size (K) is 1, I would feed the first layer sequentially from timestamp 0 to 99.

And I am trying to build an LSTM autoencoder (a reconstruction problem), so the output length should be the same as the input length.

In other words, if I choose window size 1, the output length is 1. If I choose window size K, the output length should be K. It’s a many-to-many setup.

Also, I don’t want to pass all the data in at once. (My real data is over a million time series.)

And I’d like to know what happens if I make overlapping windows (like a sliding window with stride=1).

The link shows an LSTM autoencoder with features=1, batch size=1, and sequence length=5.

I only changed the number of features to 3. Here is the code I tested.

``````
import torch
import torch.nn as nn
import torch.optim as optim


class LSTM(nn.Module):
    # input_dim has to be the size after flattening
    # For a 20x20 single input it would be 400
    def __init__(
        self,
        input_dimensionality: int,
        input_dim: int,
        latent_dim: int,
        num_layers: int,
    ):
        super(LSTM, self).__init__()
        self.input_dimensionality: int = input_dimensionality
        self.input_dim: int = input_dim  # It is 1d, remember
        self.latent_dim: int = latent_dim
        self.num_layers: int = num_layers
        self.encoder = torch.nn.LSTM(self.input_dim, self.latent_dim, self.num_layers)
        # You can have any latent dim you want, just the output has to be the
        # exact same size as the input. In this case, with only an encoder and
        # a decoder, the decoder's output size has to be input_dim.
        self.decoder = torch.nn.LSTM(self.latent_dim, self.input_dim, self.num_layers)

    def forward(self, input):
        # Save the original size first:
        original_shape = input.shape
        # Flatten 2d (or 3d, or however many dims you specified in the constructor)
        input = input.reshape(input.shape[: -self.input_dimensionality] + (-1,))

        # Rest goes as in my previous answer
        _, (last_hidden, _) = self.encoder(input)
        print(last_hidden.size())
        encoded = last_hidden.repeat(input.shape[0], 1, 1)
        print(encoded.size())
        y, _ = self.decoder(encoded)
        print(y.size())

        # You have to reshape the output to what the original was
        reshaped_y = y.reshape(original_shape)
        print(torch.squeeze(reshaped_y).size())
        # the snippet I started from had no return; squeezing out the batch
        # dim of 1 makes y_pred match y's (4, 3) shape for the loss below
        return torch.squeeze(reshaped_y)


model = LSTM(input_dimensionality=1, input_dim=3, latent_dim=20, num_layers=1)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters())  # the original snippet used `optimizer` without defining it

y = torch.rand(4, 3)
x = y.view(len(y), 1, -1)  # (seq_len=4, batch=1, input_size=3)
print(y)
print(x)

while True:
    optimizer.zero_grad()
    y_pred = model(x)

    loss = loss_function(y_pred, y)

    loss.backward()
    optimizer.step()
    print(y_pred)
``````
``````
print(y)
tensor([[0.3396, 0.4584, 0.4990],
        [0.3473, 0.6069, 0.3537],
        [0.0018, 0.9779, 0.7858],
        [0.5669, 0.7047, 0.7538]])

print(y_pred)
tensor([[0.3681, 0.4199, 0.5355],
        [0.3092, 0.6824, 0.3382],
        [0.0140, 0.8855, 0.7361],
``````

You can see the results are pretty good after several epochs.

Plus, I changed the batch size to 2 in the above code (input shape = (4, 2, 3)).
The results are also good.

``````
print(y)
tensor([[[0.1876, 0.2843, 0.4372],
         [0.2946, 0.3854, 0.6544]],

        [[0.7341, 0.2434, 0.2316],
         [0.9146, 0.9394, 0.1762]],

        [[0.7286, 0.0147, 0.5048],
         [0.0909, 0.4264, 0.3136]],

        [[0.3353, 0.7569, 0.5305],
         [0.2900, 0.4721, 0.9971]]])

print(y_pred)
tensor([[[0.1760, 0.1595, 0.4767],
         [0.3121, 0.3768, 0.7416]],

        [[0.7191, 0.0896, 0.3143],
         [0.8565, 0.9248, 0.1792]],

        [[0.7516, 0.2602, 0.4180],
         [0.0731, 0.4455, 0.3333]],

        [[0.3654, 0.6303, 0.5079],
``````

The issue I had is that I don’t like that order, so I changed it using the batch_first argument.
This is why I am asking about the input dimensions. If I can somehow get my data into this shape and order, the game is over. However, I don’t fully understand it, so I just changed the order.
Whenever I set the batch size > 1 and change the default order, the results tend to collapse to the average.

``````
import torch
import torch.nn as nn
import torch.optim as optim


class LSTM(nn.Module):
    # input_dim has to be the size after flattening
    # For a 20x20 single input it would be 400
    def __init__(
        self,
        input_dimensionality: int,
        input_dim: int,
        latent_dim: int,
        num_layers: int,
    ):
        super(LSTM, self).__init__()
        self.input_dimensionality: int = input_dimensionality
        self.input_dim: int = input_dim  # It is 1d, remember
        self.latent_dim: int = latent_dim
        self.num_layers: int = num_layers
        self.encoder = torch.nn.LSTM(
            self.input_dim, self.latent_dim, self.num_layers, batch_first=True
        )
        # You can have any latent dim you want, just the output has to be the
        # exact same size as the input. In this case, with only an encoder and
        # a decoder, the decoder's output size has to be input_dim.
        self.decoder = torch.nn.LSTM(
            self.latent_dim, self.input_dim, self.num_layers, batch_first=True
        )

    def forward(self, input):
        # Save the original size first:
        original_shape = input.shape
        # Flatten 2d (or 3d, or however many dims you specified in the constructor)
        input = input.reshape(input.shape[: -self.input_dimensionality] + (-1,))

        _, (last_hidden, _) = self.encoder(input)
        print(last_hidden.size())
        # Note: last_hidden is (num_layers, batch, hidden) even with batch_first=True,
        # so this repeat produces (seq_len, batch, hidden), not (batch, seq_len, hidden)
        encoded = last_hidden.repeat(input.shape[1], 1, 1)
        print(encoded.size())
        y, _ = self.decoder(encoded)
        print(y.size())

        # You have to reshape the output to what the original was
        reshaped_y = y.reshape(original_shape)
        return reshaped_y


model = LSTM(input_dimensionality=1, input_dim=3, latent_dim=20, num_layers=1)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters())  # the original snippet used `optimizer` without defining it

y = torch.rand(2, 4, 3)  # (batch=2, seq_len=4, features=3) with batch_first=True
x = y
print(y)

while True:
    optimizer.zero_grad()
    y_pred = model(x)

    loss = loss_function(y_pred, y)
    loss.backward()
    optimizer.step()
    print(y_pred)
``````

Here is the result.

``````
print(y)
tensor([[[0.5823, 0.5620, 0.2612],
         [0.2795, 0.0694, 0.2277],
         [0.5405, 0.3568, 0.6255],
         [0.3243, 0.2372, 0.3427]],

        [[0.7686, 0.5203, 0.9025],
         [0.7947, 0.8480, 0.8990],
         [0.7740, 0.1220, 0.5202],
         [0.4073, 0.6483, 0.2592]]])

print(y_pred)
tensor([[[0.6665, 0.3905, 0.5777],
         [0.4519, 0.4510, 0.4332],
         [0.6665, 0.3905, 0.5777],
         [0.4519, 0.4510, 0.4332]],

        [[0.6665, 0.3905, 0.5777],
         [0.4519, 0.4510, 0.4332],
         [0.6665, 0.3905, 0.5777],
``````

I really need to figure out how to get my data into the shape (sequence length, batch size, number of features) or (batch size, sequence length, number of features).

Thx again!!

Also, I don’t want to pass all the data in at once. (My real data is over a million time series.)

Absolutely, you don’t need to pass all the data in at once. But however many windows of length K you pass, that is your batch size (they can be overlapping or not). So maybe you have B windows, each of length K. In that case your batch size would be B and sequence length K.
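For example, here is a sketch of drawing B windows of length K from a long series (the numbers and names are mine; the windows may overlap, since the start positions are random):

``````
import torch

B, K, N = 32, 10, 3               # batch size, window length, features (made-up numbers)
data = torch.rand(1_000_000, N)   # a long multivariate series

# Take B windows starting at random positions and stack them into a batch
starts = torch.randint(0, len(data) - K, (B,))
batch = torch.stack([data[int(s) : int(s) + K] for s in starts])  # (B, K, N)

# For PyTorch's default (seq_len, batch, input_size) layout:
batch_seq_first = batch.transpose(0, 1)  # (K, B, N)
print(batch_seq_first.shape)             # torch.Size([10, 32, 3])
``````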

And I’d like to know what happens if I make overlapping windows (like a sliding window with stride=1).

Totally fine. I would recommend you shuffle your inputs so that each batch includes mostly non-overlapping windows, since the information content of overlapping windows is redundant (two heavily overlapping windows are more like one datapoint than two).
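Here is a sketch of the stride=1 case with shuffling, using torch.Tensor.unfold (the numbers are made up):

``````
import torch

M, N, K = 100, 3, 10
data = torch.rand(M, N)

# All stride-1 sliding windows: (M, N) -> (M - K + 1, K, N).
# unfold slides over dim 0 and appends the window dim last, hence the permute.
windows = data.unfold(0, K, 1).permute(0, 2, 1)

# Shuffle so each mini-batch mixes windows from all over the series
windows = windows[torch.randperm(len(windows))]
print(windows.shape)  # torch.Size([91, 10, 3])
``````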

If I can somehow get my data into this shape and order, the game is over.

For manipulating dimensions, permute() is your friend. As an example:

``````
tensor_original = torch.arange(4 * 2 * 3).reshape(4, 2, 3)
tensor_permuted = tensor_original.permute([1, 0, 2])
# this makes the following changes:
# old dimension 1 is moved to new dimension 0
# old dimension 0 is moved to new dimension 1
# old dimension 2 stays as dimension 2

print(tensor_original.shape)
print(tensor_permuted.shape)

torch.Size([4, 2, 3])
torch.Size([2, 4, 3])
``````
``````
print(tensor_original)
print(tensor_permuted)

tensor([[[ 0,  1,  2],
         [ 3,  4,  5]],

        [[ 6,  7,  8],
         [ 9, 10, 11]],

        [[12, 13, 14],
         [15, 16, 17]],

        [[18, 19, 20],
         [21, 22, 23]]])

tensor([[[ 0,  1,  2],
         [ 6,  7,  8],
         [12, 13, 14],
         [18, 19, 20]],

        [[ 3,  4,  5],
         [ 9, 10, 11],
         [15, 16, 17],
         [21, 22, 23]]])
``````
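And if you are only swapping two dimensions, transpose() does the same thing; a quick check (this snippet is mine, not from the post above):

``````
import torch

tensor_original = torch.arange(4 * 2 * 3).reshape(4, 2, 3)
# transpose(0, 1) is equivalent to permute([1, 0, 2]) here
print(torch.equal(tensor_original.transpose(0, 1),
                  tensor_original.permute([1, 0, 2])))  # True
``````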

Thanks for clarifying things!!