How to pad a sequence?

andy29 · November 27, 2020, 1:22am

I am trying to learn more about pack_padded_sequence and want to test it on this small dataset. I managed to build two tensors with different sequence lengths, but when I try to pad them together I get an error. Does anybody know how to solve this? I am trying to follow an example given in the Stack Overflow comments, but with an actual dataset: https://stackoverflow.com/questions/51030782/why-do-we-pack-the-sequences-in-pytorch

RuntimeError: The expanded size of the tensor (8) must match the existing size (4) at non-singleton dimension 1. Target sizes: [93, 8, 1]. Tensor sizes: [93, 4, 1]
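The shapes in that message suggest what is going on: pad_sequence treats dimension 0 of every tensor in the list as the sequence length and expects all remaining dimensions to match. A minimal sketch that should reproduce the same kind of failure, using random stand-in tensors (the names `a`, `b`, `c`, `d` are made up here, and the shapes are assumed from the window sizes in the post, not taken from the real data):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Random stand-ins with the shapes the post's code produces (assumed, not the airline data).
a = torch.randn(90, 8, 1)  # 90 windows of length 8
b = torch.randn(93, 4, 1)  # 93 windows of length 4

# pad_sequence pads along dim 0 only, so the mismatch in dim 1 (8 vs 4) cannot be padded away.
try:
    pad_sequence([a, b], batch_first=True)
    failed = False
except RuntimeError as err:
    failed = True
    print(err)

# Tensors that differ only in dim 0 pad without trouble.
c = torch.randn(8, 1)
d = torch.randn(4, 1)
padded_small = pad_sequence([c, d], batch_first=True)
print(padded_small.shape)
```

Here dimension 0 holds the number of windows (90 and 93) and the window length sits in dimension 1, which is why the 8 vs 4 mismatch is reported at dimension 1.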

!wget https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import torch
import torch.nn as nn
from torch.autograd import Variable
from sklearn.preprocessing import MinMaxScaler

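training_set = pd.read_csv('airline-passengers.csv')
# Keep only the passenger-count column; the Month column is text and cannot be scaled
training_set = training_set.iloc[:, 1:2].values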

def sliding_windows(data, seq_length):
    x = []
    y = []

    for i in range(len(data)-seq_length-1):
        _x = data[i:(i+seq_length)]
        _y = data[i+seq_length]
        x.append(_x)
        y.append(_y)

    return x,np.array(y)

sc = MinMaxScaler()
training_data = sc.fit_transform(training_set)

seq_length = 8
x, y = sliding_windows(training_data, seq_length)

train_size = int(len(y) * 0.67)
test_size = len(y) - train_size

trainX = Variable(torch.Tensor(np.array(x[0:train_size])))
trainY = Variable(torch.Tensor(np.array(y[0:train_size])))

seq_length = 4
x1, y1 = sliding_windows(training_data, seq_length)

train_size = int(len(y1) * 0.67)
test_size = len(y1) - train_size

trainX1 = Variable(torch.Tensor(np.array(x1[0:train_size])))
trainY1 = Variable(torch.Tensor(np.array(y1[0:train_size])))

seq_batch = [trainX,
             trainX1]

seq_lens = [8, 4]

added_seq_batch = torch.nn.utils.rnn.pad_sequence(seq_batch, batch_first=True)
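One possible way around the error, as a sketch rather than a definitive fix: hand pad_sequence a flat list of individual windows, each of shape (seq_len, 1), so that the only dimension that varies is dimension 0. The tensors below are random stand-ins with the shapes assumed from the code above, and the LSTM with hidden_size=2 is an arbitrary choice just to show the packed batch being consumed.

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Random stand-ins for trainX and trainX1 (shapes assumed from the post, values are not real data).
trainX = torch.randn(90, 8, 1)
trainX1 = torch.randn(93, 4, 1)

# Iterating a tensor walks dim 0, so each list element is one window of shape (seq_len, 1).
sequences = list(trainX) + list(trainX1)
lengths = torch.tensor([s.size(0) for s in sequences])

# Every window is zero-padded up to the longest length (8).
padded = pad_sequence(sequences, batch_first=True)
print(padded.shape)  # torch.Size([183, 8, 1])

# Packing records the true lengths so the RNN skips the padded steps.
packed = pack_padded_sequence(
    padded, lengths, batch_first=True, enforce_sorted=False
)

# Arbitrary small LSTM, only to show that the packed batch runs through an RNN.
lstm = torch.nn.LSTM(input_size=1, hidden_size=2, batch_first=True)
_, (h_n, _) = lstm(packed)
print(h_n.shape)  # torch.Size([1, 183, 2])
```

If the targets are needed in the same order, something like torch.cat([trainY, trainY1]) should line up with the 183 windows, assuming both keep their (N, 1) shape.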