Changing Labels of a Dataset in PyTorch for a Continual Learning Setting

Akash_Tadwai · March 10, 2021, 6:00pm

We have the CICIDS17 dataset, which consists of 15 classes (1 Normal and 14 Attack labels). We are working in a continual learning setup in which we need to divide the data into a sequence of tasks (each with train and validation splits). Since the labels are strings, we used pd.factorize from pandas to convert the object labels into integer labels. To split the dataset task-wise, we looked at these integer labels and split the data accordingly, with the task name added to each split. The flow of our preprocessing steps is as follows:
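To make the label-encoding step concrete, here is a minimal sketch of factorizing string labels and grouping the resulting class ids into tasks. The label strings and the names `classes_per_task` and `tasks` are placeholders for illustration, not taken from our actual code or from the dataset.

```python
import pandas as pd

# Hypothetical label column; the real CICIDS17 label strings may differ.
labels = pd.Series(["BENIGN", "DDoS", "PortScan", "BENIGN", "Bot", "DDoS"])

# pd.factorize assigns integer codes in order of first appearance.
codes, uniques = pd.factorize(labels)

# Group consecutive class ids into tasks of a fixed size
# (3 classes per task, as described in this post).
classes_per_task = 3
n_classes = len(uniques)
tasks = [
    list(range(start, min(start + classes_per_task, n_classes)))
    for start in range(0, n_classes, classes_per_task)
]

print(list(codes))  # integer label per row
print(tasks)        # class ids assigned to each task
```

With 15 classes and 3 classes per task, the same grouping would give 5 tasks.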

Original Dataset → converting to torch.utils.data.Dataset with 15 labels → Caching Dataset → Splitting Dataset based on Tasks.

Splitting Dataset based on Tasks consists of 2 steps:

Subclass (a dataset wrapper that returns the task name and removes the offset of the labels, so that the labels start from 0)
Appendname (Appends the name of task with the dataset)
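The two wrappers could look roughly like the sketch below. This is a simplified illustration and not our exact code: the class names `SubclassSketch` and `AppendNameSketch`, the `labels` argument, and the toy tensors are all assumptions, and in this sketch the task name is attached only by the second wrapper.

```python
import torch
from torch.utils.data import Dataset, TensorDataset


class SubclassSketch(Dataset):
    """Keeps samples whose label is in class_list and remaps labels to 0..k-1."""

    def __init__(self, dataset, labels, class_list):
        self.dataset = dataset
        # Original class id -> new id starting from 0 (removes the offset).
        self.class_mapping = {c: i for i, c in enumerate(class_list)}
        self.indices = [
            i for i, y in enumerate(labels) if int(y) in self.class_mapping
        ]

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, index):
        x, y = self.dataset[self.indices[index]]
        return x, self.class_mapping[int(y)]


class AppendNameSketch(Dataset):
    """Returns (x, y, task_name) for every sample of the wrapped dataset."""

    def __init__(self, dataset, name):
        self.dataset = dataset
        self.name = name

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        x, y = self.dataset[index]
        return x, y, self.name


# Toy data standing in for the cached 15-class dataset.
features = torch.arange(12, dtype=torch.float32).reshape(6, 2)
labels = torch.tensor([0, 3, 4, 5, 3, 1])
base = TensorDataset(features, labels)

# Task "2" holds original classes 3, 4, 5, remapped to 0, 1, 2.
task2 = AppendNameSketch(SubclassSketch(base, labels, class_list=[3, 4, 5]), name="2")
```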

Now, for our experiment, we want the dataset to have only the labels 0 and 1 (signifying whether a sample belongs to the normal class or to an attack class), but the order should be maintained while creating tasks, i.e., each task should still contain exactly 3 different classes, as determined by their original labels.

How to achieve this?

The code we have tried on 15 labels for 5 tasks is in this. Following the SplitGen function in the code would give a better idea of what we are trying to do.

We want to change the target labels for each task to only 0 or 1. How can we achieve this?
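For illustration, the kind of relabeling we have in mind might be sketched as a wrapper that collapses the targets to 0/1 while the task split is still decided from the original class ids. This is only a sketch under assumptions: the class name `BinaryLabelSketch` is hypothetical, and it assumes the normal class was encoded as 0, which depends on the order in which pd.factorize first sees the labels.

```python
import torch
from torch.utils.data import Dataset, TensorDataset


class BinaryLabelSketch(Dataset):
    """Maps the normal class to 0 and every other (attack) class to 1."""

    def __init__(self, dataset, normal_class=0):
        self.dataset = dataset
        # Assumption: the id of the normal class in the ORIGINAL label space.
        self.normal_class = normal_class

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        x, y = self.dataset[index]
        return x, int(int(y) != self.normal_class)


# Toy data with original multi-class labels.
features = torch.arange(12, dtype=torch.float32).reshape(6, 2)
labels = torch.tensor([0, 3, 4, 5, 3, 1])
binary = BinaryLabelSketch(TensorDataset(features, labels), normal_class=0)

binary_targets = [binary[i][1] for i in range(len(binary))]
print(binary_targets)
```

The wrapper has to see the original class ids, so it would need to be applied before any per-task remapping of labels to start from 0. One consequence is that a task made up only of attack classes would contain only the label 1.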

Akash_Tadwai · March 10, 2021, 6:01pm

As I am a new user, I was not allowed to put the links for the dataset and the continual learning related info in the original post, so here they are.
CICIDS17 Dataset: CICIDS2017 | Kaggle
Continual Learning Review: https://arxiv.org/pdf/1802.07569.pdf