LSTM for time-series with Batches

jagoul · January 18, 2020, 11:21pm

I am trying to create an LSTM-based model to deal with time-series data (nearly a million rows). I created my train and test sets and shaped my sequence and label tensors as follows:

seq shape : torch.Size([1024, 1, 1])
labels shape : torch.Size([1024, 1, 1])
train_window =1 (one time step at a time)

My batch size, as the shapes indicate, is 1024. I then built my LSTM class:

import torch
import torch.nn as nn
from torch.autograd import Variable

class LSTM(nn.Module):

    def __init__(self, num_classes, input_size, hidden_size, num_layers):
        super(LSTM, self).__init__()

        self.num_classes = num_classes
        self.num_layers = num_layers
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.seq_length = train_window  # global set during pre-processing

        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)

        # Assumed definition: the posted class called self.dropout without defining it
        self.dropout = nn.Dropout()

        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, input):
        print('input :{}'.format(input.shape))
        batch_size = input.shape[0]

        # Initial hidden and cell states: [num_layers, batch, hidden_size]
        hidden_state = Variable(torch.zeros(
            self.num_layers, batch_size, self.hidden_size))

        cell_state = Variable(torch.zeros(
            self.num_layers, batch_size, self.hidden_size))

        # Propagate input through LSTM
        # ula: hidden states for every time step; output: final hidden state (h_n)
        ula, (output, _) = self.lstm(input, (hidden_state, cell_state))
        output = output.view(-1, self.hidden_size)

        output = self.dropout(output)
        output = self.fc(output)

        return output

But when I started training using the following code:

model = LSTM(num_classes=1, input_size=1, hidden_size=100, num_layers=1)
for epoch in range(1, EPOCHS + 1):
    # Train on the training data in a federated way
    train(model, device, federated_train_loader, optimizer, epoch)

#inside train()
...
with torch.no_grad():
        for batch_idx, (seq, labels) in enumerate(federated_test_loader):
            # Send the model to the right gateway
            model.send(seq.location)
            # Move the data and target labels to the device (cpu/gpu) for computation
            seq, labels = seq.to(device), labels.to(device)
            # Make a prediction
            output = model(seq)

I got an error, probably related to my batch sizes. I tried playing around with the shapes and changing the LSTM class, but I was unable to find the cause. Could you please help?

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<timed exec> in <module>

<ipython-input-35-7e1b9e7fe906> in train(model, device, federated_train_loader, optimizer, epoch)
     13         print('seq shape : {}'.format(seq.shape))
     14         print('labels shape : {}'.format(labels.shape))
---> 15         output = model(seq)
     16         # Calculate huber loss for regression problems
     17         labels =labels.view(-1)

~/anaconda3/envs/ftorch/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    539             result = self._slow_forward(*input, **kwargs)
    540         else:
--> 541             result = self.forward(*input, **kwargs)
    542         for hook in self._forward_hooks.values():
    543             hook_result = hook(self, input, result)

<ipython-input-32-598be3037a3e> in forward(self, input)
     26 
     27         # Propagate input through LSTM
---> 28         ula, (output, _) = self.lstm(input, (hidden_state, cell_state))
     29         output = output.view(-1, self.hidden_size)
     30 

~/anaconda3/envs/ftorch/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    539             result = self._slow_forward(*input, **kwargs)
    540         else:
--> 541             result = self.forward(*input, **kwargs)
    542         for hook in self._forward_hooks.values():
    543             hook_result = hook(self, input, result)

~/anaconda3/envs/ftorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
    562             return self.forward_packed(input, hx)
    563         else:
--> 564             return self.forward_tensor(input, hx)
    565 
    566 

~/anaconda3/envs/ftorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in forward_tensor(self, input, hx)
    541         unsorted_indices = None
    542 
--> 543         output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
    544 
    545         return output, self.permute_hidden(hidden, unsorted_indices)

~/anaconda3/envs/ftorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in forward_impl(self, input, hx, batch_sizes, max_batch_size, sorted_indices)
    521             hx = self.permute_hidden(hx, sorted_indices)
    522 
--> 523         self.check_forward_args(input, hx, batch_sizes)
    524         if batch_sizes is None:
    525             result = _VF.lstm(input, hx, self._get_flat_weights(), self.bias, self.num_layers,

~/anaconda3/envs/ftorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in check_forward_args(self, input, hidden, batch_sizes)
    494     def check_forward_args(self, input, hidden, batch_sizes):
    495         # type: (Tensor, Tuple[Tensor, Tensor], Optional[Tensor]) -> None
--> 496         self.check_input(input, batch_sizes)
    497         expected_hidden_size = self.get_expected_hidden_size(input, batch_sizes)
    498 

~/anaconda3/envs/ftorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in check_input(self, input, batch_sizes)
    147             raise RuntimeError(
    148                 'input.size(-1) must be equal to input_size. Expected {}, got {}'.format(
--> 149                     self.input_size, input.size(-1)))
    150 
    151     def get_expected_hidden_size(self, input, batch_sizes):

RuntimeError: input.size(-1) must be equal to input_size. Expected 1, got 0

ptrblck · January 20, 2020, 1:03am

Could you print the shapes of all tensors you are passing to your self.lstm before this error is raised?

raed · January 20, 2020, 1:21am

I printed the shapes at the beginning of my post for more information. My input is the sequence, whose shape is [1024, 1, 1].

jagoul · January 20, 2020, 3:57pm

The input shape inside the forward pass of my LSTM model looks like this:

input :torch.Size([1024, 1, 1])

I think I have a problem dealing with batches. For more information, here is my training function:

def train(model, device, federated_train_loader, optimizer, epoch):
    model.train()
    # Iterate through each gateway's dataset
    for idx, (seq, labels) in enumerate(federated_train_loader):
        batch_idx = idx+1
        # Send the model to the right gateway
        model.send(seq.location)
        # Move the data and target labels to the device (cpu/gpu) for computation
        seq, labels = seq.to(device), labels.to(device)
        # Clear previous gradients (if they exist)
        optimizer.zero_grad()
        # Make a prediction
        print('seq shape : {}'.format(seq.shape))
        print('labels shape : {}'.format(labels.shape))
        output = model(seq)
        # Calculate huber loss for regression problems
        #labels =labels.view(-1)
        #seq = seq.view(-1)
        #labels = labels.unsqueeze(1)
        #labels = labels.float()
        loss = loss_function(output, labels)
        # Calculate the gradients
        loss.backward()
        # Update the model weights
        optimizer.step()
        # Get the model back from the gateway
        #model.get()
        if batch_idx==len(federated_train_loader) or (batch_idx!=0 and batch_idx % LOG_INTERVAL == 0):
            # get the loss back
            loss = loss.get()
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * BATCH_SIZE, len(federated_train_loader) * BATCH_SIZE,
                100. * batch_idx / len(federated_train_loader), loss.item()))

G.M · January 21, 2020, 1:05pm

Where did you change the shape of “labels”? I would expect “labels” to have only 2 dimensions for most loss functions (perhaps I have misunderstood the code).
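To make the shape concern concrete, here is a minimal sketch. It assumes the model returns a [batch, 1] tensor (as a final Linear layer with one output would) and uses nn.SmoothL1Loss as a stand-in for the Huber loss mentioned in the training code; the actual loss_function is not shown in the thread, so that choice is an assumption, and the tensors are random placeholders.

```python
import torch
import torch.nn as nn

batch_size = 1024

# Random stand-ins; only the shapes mirror the thread.
output = torch.randn(batch_size, 1)      # model output: [batch, num_classes]
labels = torch.randn(batch_size, 1, 1)   # labels as printed in the first post

# Left as-is, the two shapes would broadcast against each other
# instead of being compared element by element.
broadcast_shape = torch.broadcast_shapes(output.shape, labels.shape)
print(broadcast_shape)

# Assumption: SmoothL1Loss stands in for the unspecified loss_function.
loss_function = nn.SmoothL1Loss()

# Collapse the trailing dimension so labels line up with the output.
labels_2d = labels.view(batch_size, -1)  # [1024, 1, 1] -> [1024, 1]

loss = loss_function(output, labels_2d)
print(output.shape, labels_2d.shape, loss.shape)
```

With matching [1024, 1] shapes the loss reduces to a single scalar over 1024 element-wise differences, which is usually what a regression setup intends.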

jagoul · January 21, 2020, 1:34pm

This is how I pre-processed the time-series data:

test_data_size = 150000

X_train = data[:-test_data_size]
X_test = data[-test_data_size:]
print('X_train shape :{}'.format(X_train.shape))
print('X_test shape : {}'.format(X_test.shape))

#Transform and normalize X_train
scaler = MinMaxScaler(feature_range=(-1, 1))
X_train_normalized = scaler.fit_transform(X_train.reshape(-1, 1))
X_train_tensor = torch.FloatTensor(X_train_normalized).view(-1)
print('X_train_tensor shape {}'.format(X_train_tensor.shape))

# In our dataset it is convenient to use a sequence length of 30 min
# since we have data by minute send by IoT devices to the cloud
train_window = 1

#Create the sequence for the training dataset
def create_train_sequences(input_data, tw):
    seq = []
    labels = []
    L = len(input_data)
    for i in range(L-tw):
        train_seq = input_data[i:i+tw]
        train_label = input_data[i+tw:i+tw+1]
        seq.append((train_seq))
        labels.append((train_label))

    return seq, labels

train_seq, train_labels = create_train_sequences(X_train, train_window)
train_seq = torch.FloatTensor(train_seq)
train_labels = torch.FloatTensor(train_labels)
print('train_seq shape: {}'.format(train_seq.shape))
print('train_labels shape : {}'.format(train_labels.shape))
#print('train_seq shape: {}'.format(len(train_seq)))
#print('train_labels shape : {}'.format(len(train_labels)))

def create_test_sequences(input_data, tw):
    seq = []
    labels = []
    L = len(input_data)
    for i in range(L-tw):
        test_seq = input_data[i:i+tw]
        test_label = input_data[i+tw:i+tw+1]
        seq.append((test_seq))
        labels.append((test_label))
    
    return seq, labels

test_seq, test_labels = create_test_sequences(X_test, train_window)
test_seq = torch.FloatTensor(test_seq)
test_labels = torch.FloatTensor(test_labels)
print('test_seq shape: {}'.format(test_seq.shape))
print('test_labels shape: {}'.format(test_labels.shape))

X_train shape :(815943,)
X_test shape : (150000,)
X_train_tensor shape torch.Size([815943])
train_seq shape: torch.Size([815942, 1])
train_labels shape : torch.Size([815942, 1])
test_seq shape: torch.Size([149999, 1])
test_labels shape: torch.Size([149999, 1])
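For comparison, the two loop-based helpers above build (window, next value) pairs, and the same pairs can be produced without a Python loop. The following is only a sketch using Tensor.unfold; make_windows is a hypothetical helper name, and a toy series stands in for the real data.

```python
import torch

def make_windows(series, tw):
    """Build sliding windows and next-step labels from a 1-D tensor.

    windows[i] == series[i : i + tw] and labels[i] == series[i + tw].
    """
    # unfold yields every length-tw window; the last one has no following
    # value to use as a label, so it is dropped.
    windows = series.unfold(0, tw, 1)[:-1]
    labels = series[tw:].unsqueeze(-1)
    return windows, labels

series = torch.arange(10, dtype=torch.float32)  # toy data, not the real dataset
seq, labels = make_windows(series, tw=3)
print(seq.shape, labels.shape)  # 7 windows of length 3, and 7 labels
print(seq[0], labels[0])        # first window is 0, 1, 2 and its label is 3

# Add a trailing feature dimension for nn.LSTM(batch_first=True): [N, tw, 1]
lstm_input = seq.unsqueeze(-1)
print(lstm_input.shape)
```

With tw=1 this gives the same [N-1, 1] shapes as the printed output above, and a larger tw only changes the middle dimension of the LSTM input.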

And this is how I federated the data to feed it to the LSTM network; the rest of the training code was posted above:

BATCH_SIZE = 1024
# Create pytorch tensor from X_train,X_test
train_inputs = train_seq.clone().detach().unsqueeze(-1)
train_labels = train_labels.clone().detach()

#train_inputs = train_seq.clone().detach().requires_grad_(True)
#train_labels = train_labels.clone().detach().requires_grad_(True)

#train_inputs = torch.tensor(train_seq,dtype=torch.float).tag("#seq")
#train_labels = torch.tensor(train_labels, dtype=torch.float).tag("#label")

print('train_inputs shape : {}'.format(train_inputs.shape))
print('train_labels shape : {}'.format(train_labels.shape))

test_inputs = test_seq.clone().detach().unsqueeze(-1)
test_labels = test_labels.clone().detach()

#train_inputs = test_inputs.clone().detach().requires_grad_(True)
#test_labels = test_labels.clone().detach().requires_grad_(True)

#test_inputs = torch.tensor(test_seq,dtype=torch.float).tag("#seq")
#test_labels = torch.tensor(test_labels, dtype=torch.float).tag("#label")
print('test_inputs shape : {}'.format(test_inputs.shape))
print('test_labels shape : {}'.format(test_labels.shape))

# Send the training and test data to the gateways in equal proportion.
train_idx = int(len(X_train)/2)
test_idx = int(len(X_test)/2)
gatway1_train_dataset = sy.BaseDataset(train_inputs[:train_idx], train_inputs[:train_idx]).send(gatway1)
gatway2_train_dataset = sy.BaseDataset(train_inputs[train_idx:], train_inputs[train_idx:]).send(gatway2)
gatway1_test_dataset = sy.BaseDataset(test_inputs[:test_idx], test_inputs[:test_idx]).send(gatway1)
gatway2_test_dataset = sy.BaseDataset(test_inputs[test_idx:], test_inputs[test_idx:]).send(gatway2)
print('gatway1_train_dataset : {}'.format(gatway1_train_dataset))
print('gatway2_train_dataset : {}'.format(gatway2_train_dataset))

# Create federated datasets, an extension of Pytorch TensorDataset class
federated_train_dataset = sy.FederatedDataset([gatway1_train_dataset, gatway2_train_dataset])
federated_test_dataset = sy.FederatedDataset([gatway1_test_dataset, gatway2_test_dataset])
print('federated_train_dataset : {}'.format(federated_train_dataset))
print('federated_test_dataset : {}'.format(federated_test_dataset))

# Create federated dataloaders, an extension of Pytorch DataLoader class
federated_train_loader = sy.FederatedDataLoader(federated_train_dataset, shuffle=True, batch_size=BATCH_SIZE)
federated_test_loader = sy.FederatedDataLoader(federated_test_dataset, shuffle=False, batch_size=BATCH_SIZE)
print('federated_train_loader : {}'.format(federated_train_loader))
print('federated_test_loader : {}'.format(federated_test_loader))

I printed out the final shapes for my network before training, as follows:

train_inputs shape : torch.Size([815942, 1, 1])
train_labels shape : torch.Size([815942, 1])
test_inputs shape : torch.Size([149999, 1, 1])
test_labels shape : torch.Size([149999, 1])
gatway1_train_dataset : <syft.frameworks.torch.federated.dataset.BaseDataset object at 0x159aafd10>
gatway2_train_dataset : <syft.frameworks.torch.federated.dataset.BaseDataset object at 0x15a82e350>
federated_train_dataset : FederatedDataset
    Distributed accross: gatway1, gatway2
    Number of datapoints: 815942

federated_test_dataset : FederatedDataset
    Distributed accross: gatway1, gatway2
    Number of datapoints: 149999

federated_train_loader : <syft.frameworks.torch.federated.dataloader.FederatedDataLoader object at 0x159ab02d0>
federated_test_loader : <syft.frameworks.torch.federated.dataloader.FederatedDataLoader object at 0x15a82e310>

Could anyone please tell me what is going on with this LSTM and what I am doing wrong in shaping the tensors? Much appreciated

jagoul · January 21, 2020, 1:46pm

If I unsqueeze the tensors as follows:

train_inputs = train_seq.clone().detach().unsqueeze(-1)
train_labels = train_labels.clone().detach()

test_inputs = test_seq.clone().detach().unsqueeze(-1)
test_labels = test_labels.clone().detach()

My error changes, and the network now expects a 3-dimensional tensor:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<timed exec> in <module>

<ipython-input-44-ea284297e260> in train(model, device, federated_train_loader, optimizer, epoch)
     13         print('seq shape : {}'.format(seq.shape))
     14         print('labels shape : {}'.format(labels.shape))
---> 15         output = model(seq)
     16         # Calculate huber loss for regression problems
     17         #labels =labels.view(-1)

~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    491             result = self._slow_forward(*input, **kwargs)
    492         else:
--> 493             result = self.forward(*input, **kwargs)
    494         for hook in self._forward_hooks.values():
    495             hook_result = hook(self, input, result)

<ipython-input-38-499b4cb2002e> in forward(self, input)
     26 
     27         # Propagate input through LSTM
---> 28         ula, (output, _) = self.lstm(input, (hidden_state, cell_state))
     29         output = output.view(-1, self.hidden_size)
     30 

~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    491             result = self._slow_forward(*input, **kwargs)
    492         else:
--> 493             result = self.forward(*input, **kwargs)
    494         for hook in self._forward_hooks.values():
    495             hook_result = hook(self, input, result)

~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
    557             return self.forward_packed(input, hx)
    558         else:
--> 559             return self.forward_tensor(input, hx)
    560 
    561 

~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in forward_tensor(self, input, hx)
    537         unsorted_indices = None
    538 
--> 539         output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
    540 
    541         return output, self.permute_hidden(hidden, unsorted_indices)

~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in forward_impl(self, input, hx, batch_sizes, max_batch_size, sorted_indices)
    517             hx = self.permute_hidden(hx, sorted_indices)
    518 
--> 519         self.check_forward_args(input, hx, batch_sizes)
    520         if batch_sizes is None:
    521             result = _VF.lstm(input, hx, self._get_flat_weights(), self.bias, self.num_layers,

~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in check_forward_args(self, input, hidden, batch_sizes)
    488     def check_forward_args(self, input, hidden, batch_sizes):
    489         # type: (Tensor, Tuple[Tensor, Tensor], Optional[Tensor]) -> None
--> 490         self.check_input(input, batch_sizes)
    491         expected_hidden_size = self.get_expected_hidden_size(input, batch_sizes)
    492 

~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in check_input(self, input, batch_sizes)
    147             raise RuntimeError(
    148                 'input must have {} dimensions, got {}'.format(
--> 149                     expected_input_dim, input.dim()))
    150         if self.input_size != input.size(-1):
    151             raise RuntimeError(

RuntimeError: input must have 3 dimensions, got 2

G.M · January 21, 2020, 1:54pm

There’s a lot of code and it’s taking quite some time to understand it all. But according to the trace you posted, some input to the LSTM certainly has only 2 dimensions. Try printing the shapes of all the tensors that you feed into the LSTM.

jagoul · January 21, 2020, 1:59pm

I already printed all the shapes of my dataset. Here is the input inside the LSTM forward pass:

input inside forward pass :torch.Size([1024, 1, 1])

jagoul · January 21, 2020, 2:00pm

It is the same as my seq shape, and I don’t see anything wrong with it.

G.M · January 21, 2020, 2:29pm

Well, I can’t see anything wrong either, sorry. One suggestion, though: both the cell state and the hidden state default to zeros, so there is no need to set them manually. Also, you shouldn’t be using Variable anymore; the states should be plain tensors.
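A quick way to check the point about default states is to run the same nn.LSTM with and without explicit zero states and compare the results. This is a minimal sketch with random input; the sizes mirror the ones in the thread, and everything else is illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=1, hidden_size=100, num_layers=1, batch_first=True)
x = torch.randn(1024, 1, 1)  # [batch, seq_len, input_size]

# Explicit zero states as plain tensors (no Variable wrapper).
h0 = torch.zeros(1, x.size(0), 100)  # [num_layers, batch, hidden_size]
c0 = torch.zeros(1, x.size(0), 100)
out_explicit, (h_explicit, c_explicit) = lstm(x, (h0, c0))

# States omitted: nn.LSTM initializes both to zeros internally.
out_default, (h_default, c_default) = lstm(x)

print(torch.allclose(out_explicit, out_default))
print(torch.allclose(h_explicit, h_default))
```

Dropping the manual states also avoids having to keep their batch dimension and device in sync with the input.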

jagoul · January 21, 2020, 2:51pm

Thank you @G.M for your swift reply, but this doesn’t really solve my problem. Can anyone help me, please?

jagoul · January 22, 2020, 2:20pm

I’m also using PySyft to federate the data before feeding it to the LSTM. Could that be the main reason why the model can’t handle the shapes on different clients?

bfeeny · June 27, 2020, 10:31am

jagoul:

# Propagate input through LSTM
        ula, (output, _) = self.lstm(input, (hidden_state, cell_state))
        output = output.view(-1, self.hidden_size)

        output = self.dropout(output)
        output = self.fc(output)

        return output

I realize this is an old thread, but I stumbled on it and was wondering why you are taking what you call “output” above and sending it through the network. It looks like “output” is really h_n (the hidden state). Don’t you really want to grab what you are calling “ula”?

harsha_g · June 27, 2020, 3:44pm

@bfeeny If you look at the documentation for LSTM, or the commented code in this tutorial, you will notice that ula holds the hidden states for all time steps, whereas output is just the last hidden state. The OP is building a many-to-one model.
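The relationship between the two return values can be checked directly. The sketch below uses toy sizes and a unidirectional nn.LSTM with batch_first=True; all_steps plays the role of ula, and h_n the role of what the OP calls output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=1, hidden_size=8, num_layers=2, batch_first=True)
x = torch.randn(4, 5, 1)  # [batch=4, seq_len=5, input_size=1], toy sizes

all_steps, (h_n, c_n) = lstm(x)

print(all_steps.shape)  # [4, 5, 8]: last layer, every time step
print(h_n.shape)        # [2, 4, 8]: every layer, last time step only

# For a unidirectional LSTM, the last time step of the per-step output
# equals the final hidden state of the last layer.
last_step = all_steps[:, -1, :]
print(torch.allclose(last_step, h_n[-1]))
```

Note that with num_layers greater than 1, h_n.view(-1, hidden_size) would stack the layers along the batch dimension, so indexing h_n[-1] is the safer way to pick the last layer for a many-to-one head.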

bfeeny · June 28, 2020, 12:17am

@harsha_g thanks, I am trying to understand it all. I am using this link as a reference https://towardsdatascience.com/pytorch-basics-how-to-train-your-neural-net-intro-to-rnn-cb6ebc594677

So are you saying that with many-to-one, you would take the hidden output and pass it to your linear layer? That is, the output at the last time step for each sequence in the batch.

What then would ula in the above code be used for? When or what are you passing that to?

harsha_g · June 28, 2020, 2:28am

Yes.

Precisely.

When you are doing many-to-many classification, or seq2seq modeling (aka encoder-decoder architectures), more specifically neural machine translation.

The 4th figure in this tutorial demonstrates how and when all the hidden states are used.

bfeeny · June 28, 2020, 3:27am

Thanks @harsha_g, this is helpful. So if you are going from an LSTM with an input_size of, say, 30,000 into a single Linear layer, that is going to be many-to-one. Even if you later come out of that Linear layer into, say, 30,000 features, from an LSTM perspective it’s considered “many-to-one”, correct?

So if you were to do “many-to-many”, where you would actually use the output and not just h_n, then what you would need to do is iterate over output and set up, for example, a Linear output (or some other output type) for each of the timesteps in the output. Does that sound right?
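As a rough sketch of that many-to-many setup: nn.Linear acts on the last dimension of its input, so a single Linear layer can be applied to the whole [batch, seq_len, hidden_size] output at once, and an explicit loop over timesteps gives the same result. ManyToMany is a hypothetical class name and the sizes are toy values, not the ones from the problem described here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class ManyToMany(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        all_steps, _ = self.lstm(x)   # [batch, seq_len, hidden_size]
        return self.fc(all_steps)     # [batch, seq_len, output_size]

model = ManyToMany(input_size=3, hidden_size=16, output_size=2)
x = torch.randn(4, 7, 3)              # [batch, seq_len, input_size]
y = model(x)
print(y.shape)                        # one prediction per time step

# Equivalent explicit loop: apply the same Linear layer step by step.
all_steps, _ = model.lstm(x)
y_loop = torch.stack([model.fc(step) for step in all_steps.unbind(dim=1)], dim=1)
print(torch.allclose(y, y_loop, atol=1e-6))
```

Sharing one Linear layer across timesteps is the common choice; a separate layer per timestep is possible too, but it ties the model to a fixed sequence length.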

In a problem I am working on now, I am taking 30495 features and trying to predict 30490 features. I thought that was “many-to-many”, but I built my model as LSTM->single FC layer, so I guess that’s really many-to-one. It would seem Seq2Seq is a better model, but I just don’t know much about those and have found very little on regular many-to-many LSTMs (examples, etc.) out there.

harsha_g · June 28, 2020, 3:28am

I just answered your other post here: Need help racking my brain on batch_size. Please refer to that and see if it helps.

harsha_g · June 28, 2020, 4:21am

Please look at figure 1 in this article to correct your understanding of what many-to-many/many-to-one exactly means.

Right on the money.

Although I don’t understand your task fully, I can tell that you want to build a many-to-many LSTM model.