How to define a Dataset for a 1D-CNN with 2D input

chadson · July 31, 2020, 2:12am

I am working on a time-series classification task. The dataset has 14 features with float values in the range [0, 1] and an integer label, so it is multivariate time-series data.
Now I am developing a simple 1D-CNN model followed by fully-connected layers for the classification.

To load the time-series data for training, I defined MTSDataSet, which inherits from the Dataset class, and overrode the necessary methods as below:

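import torch
from torch.utils.data import Dataset, DataLoader

class MTSDataSet(Dataset):
    def __init__(self, data_df):
        self.data = data_df  # pandas DataFrame: 14 feature columns, then the label column

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # sig_length is a module-level variable (60 in this post)
        # Slice sig_length rows of features and transpose to (features, signal length)
        x = torch.from_numpy(self.data.iloc[index:index+sig_length, :-1].values).t()
        y = self.data.iloc[index, -1]

        return x, y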

In the __getitem__ method, I use iloc[index:index+sig_length], where sig_length = 60, so that each input has size 14 (number of features) x 60 (signal length) for the 1D-CNN model. Below is how I defined the DataLoader for the training data:

train_loader = DataLoader(train_dataset, batch_size = batch_size, shuffle=True)

And the code below is the definition of the 1D-CNN model followed by fully-connected layers:

class Net0(nn.Module):
    def __init__(self):
        super(Net0, self).__init__()
        self.conv1 = nn.Conv1d(in_channels=14, out_channels=7, kernel_size=10, stride=1) # IN: 14 x 60, OUT: 7 x 51
        self.conv2 = nn.Conv1d(in_channels=7, out_channels=4, kernel_size=10, stride=1) # IN: 7 x 51, OUT: 4 x 42

        self.fc1 = nn.Linear(4*42, 32)
        self.fc2 = nn.Linear(32,len(act_map.keys()))
        
    def forward(self, x):
        out = F.relu(self.conv1(x.float()))
        out = F.relu(self.conv2(out))
        
        out = out.view(-1, 4*42) 
        out = F.relu(self.fc1(out))
        out = self.fc2(out)
        return out
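
The sizes in the comments above (60 -> 51 -> 42, and 4*42 inputs to fc1) can be checked by hand with the output-length formula given in the PyTorch nn.Conv1d documentation. The snippet below is only a small sketch of that arithmetic; the helper name conv1d_out_length is made up for illustration and is not part of PyTorch.

```python
def conv1d_out_length(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution, following the nn.Conv1d formula.

    Hypothetical helper for illustration only (not a PyTorch function).
    """
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


l1 = conv1d_out_length(60, kernel_size=10)   # after conv1
l2 = conv1d_out_length(l1, kernel_size=10)   # after conv2
flat = 4 * l2                                # 4 output channels, flattened for fc1

print(l1, l2, flat)  # 51 42 168
```

So the shapes in the model are consistent as long as every input really has signal length 60.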

A single input passes through the model without a problem, but I get a RuntimeError whenever I run the training loop:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-132-d7fbe87b310d> in <module>()
      5 optimizer = optim.Adam(model.parameters(), lr=lr)
      6 loss_fn = nn.CrossEntropyLoss()
----> 7 training_loop(n_epochs = n_epochs, optimizer = optimizer, model = model, loss_fn = loss_fn, train_loader = train_loader)

6 frames
/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
     53             storage = elem.storage()._new_shared(numel)
     54             out = elem.new(storage)
---> 55         return torch.stack(batch, 0, out=out)
     56     elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
     57             and elem_type.__name__ != 'string_':

RuntimeError: stack expects each tensor to be equal size, but got [14, 60] at entry 0 and [14, 26] at entry 15

I think this error is raised because of self.data.iloc[index:index+sig_length, :-1] in the __getitem__ method of my MTSDataSet class. Whenever the range [index:index+sig_length] contains fewer than sig_length rows, I get the error. But I’m not sure how to refine my Dataset or DataLoader definitions to guarantee that each loaded item contains the required number of data points.
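
The guess above can be reproduced with pandas alone: iloc slicing silently truncates at the end of the frame instead of raising. This is a minimal sketch with a made-up 100-row toy frame (the real data is not shown in the post); the start index 74 is chosen so that only 26 rows remain, matching the [14, 26] in the traceback.

```python
import numpy as np
import pandas as pd

sig_length = 60

# Toy stand-in for the real data: 100 rows, 14 feature columns + 1 label column
df = pd.DataFrame(np.zeros((100, 15)))

full = df.iloc[0:0 + sig_length, :-1]      # start far from the end of the frame
short = df.iloc[74:74 + sig_length, :-1]   # start only 26 rows before the end

print(full.shape)   # (60, 14)
print(short.shape)  # (26, 14)
```

With shuffle=True, such a short item can land in any batch, and the default collate function then fails when it tries to stack tensors of different sizes.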

Is there any effective or standard way to define a Dataset and DataLoader for 1D-CNN with 2D input data, which may solve the RuntimeError that I’ve got?

Blizzard_boi2020 · July 31, 2020, 5:14am

Can you just explain the problem you’re facing?

chadson · July 31, 2020, 1:40pm

The problem is:

My 1D-CNN model is supposed to deal with multivariate time-series data with signal length 60.
The __getitem__ method in my Dataset class returns the rows in the range [index, index+60).
The __getitem__ method returns data with a signal length shorter than 60 when the index is close to the tail of the data (i.e., index+60 > dataset length) <-- this is my guess at the cause of the RuntimeError.
So this raises the RuntimeError that I gave above in the question.

My question is:

Is there any effective or standard way to define a Dataset and DataLoader for 1D-CNN with 2D input data, which may solve the RuntimeError that I’ve got?

chadson · July 31, 2020, 2:06pm

One option is to build each item from exactly 60 rows, so that every returned item has a signal length of 60. But I’m not sure if this is the right way, because I would need to abandon some data samples in the tail. So I wonder if there is any standard-ish way to define the Dataset for a 1D CNN with 2D input data.
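
A minimal sketch of that option, not necessarily the standard way: limit __len__ to the number of start positions that still leave a full window. The class name WindowedMTSDataset and the toy DataFrame are assumptions made for illustration. The label is taken from the first row of the window, as in the original code; whether the first or the last row is the right target depends on the task. Note that every row still appears as features in at least one window; only the last sig_length - 1 start positions (and their labels as targets) are dropped. Padding the short tail windows would be an alternative.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

SIG_LENGTH = 60  # window length from the question


class WindowedMTSDataset(Dataset):
    """Hypothetical variant of MTSDataSet that only exposes full windows."""

    def __init__(self, data_df, sig_length=SIG_LENGTH):
        if len(data_df) < sig_length:
            raise ValueError("need at least sig_length rows")
        # Features as float32, shape (num_rows, num_features)
        self.features = torch.from_numpy(
            data_df.iloc[:, :-1].to_numpy(dtype=np.float32))
        # Integer class labels, shape (num_rows,)
        self.labels = torch.from_numpy(
            data_df.iloc[:, -1].to_numpy(dtype=np.int64))
        self.sig_length = sig_length

    def __len__(self):
        # Count only start positions that leave a full window
        return len(self.features) - self.sig_length + 1

    def __getitem__(self, index):
        window = self.features[index:index + self.sig_length]  # (60, 14)
        x = window.t()          # (14, 60): channels first, as nn.Conv1d expects
        y = self.labels[index]  # label of the first row, as in the original code
        return x, y


# Toy data: 200 rows, 14 random features in [0, 1) plus an integer label column
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((200, 14)))
df["label"] = rng.integers(0, 3, size=200)

dataset = WindowedMTSDataset(df)
loader = DataLoader(dataset, batch_size=16, shuffle=True)
xb, yb = next(iter(loader))
print(len(dataset), tuple(xb.shape), tuple(yb.shape))
```

Because x is already float32 here, the x.float() call in the model becomes a no-op, and the int64 labels match what nn.CrossEntropyLoss expects for class indices.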

Blizzard_boi2020 · July 31, 2020, 7:42pm

I appreciate your insights, but why are you using a 1D CNN? We have 2D CNNs, which will work better with 2D images, right? Otherwise you need to reshape the data to make it 1D, e.g. with the squeeze function in PyTorch.

chadson · July 31, 2020, 7:47pm

As I mentioned in the beginning, my data is not 2D images; it’s multivariate time-series data. After reading many research papers, I found that the 1D CNN has been a successful reference model for time-series classification tasks. That’s why I’m trying a 1D CNN for my task. FYI, I’ve tried a 2D CNN as well, by reshaping my data into a shape similar to 2D images, but it didn’t go well. Before designing a more complicated structure such as hierarchical or multi-scale models, I wanted to see if a 1D CNN can be a better option.