Dumping image data and loading it back with DataLoader

I want to dump the data so that I can load it back for training my model.

My code snippet for dumping the data:

# Collect the perturbed samples that the classifier misclassifies.
# Note: the comparison `pred_label != label` assumes batch_size == 1;
# with larger batches it compares whole tensors, not single samples.
for batch_idx, (image, label) in enumerate(dataloader):
    image, label = image.to(device), label.to(device)
    perturbed_image = attack.perturb(image, label)
    # ---------- Classifier ----------
    predict_A = classifier(perturbed_image)
    pred_label = torch.max(predict_A.data, 1)[1]

    if pred_label != label:
        adv_data.append((perturbed_image.to("cpu"), label.to("cpu")))
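For completeness, here is a minimal sketch of how the collected list could be written to disk with `pickle`, matching the `pickle.load` call used later. The `adv_data` contents and the filename are stand-ins for illustration:

```python
import pickle
import torch

# Stand-in for the list built in the loop above: each entry is a
# (1, C, H, W) image tensor and its (1,)-shaped label tensor.
adv_data = [(torch.randn(1, 3, 32, 32), torch.tensor([0]))]

# Dump the whole list in one go; pickle serializes tensors directly.
with open("adv_data.pkl", "wb") as f:
    pickle.dump(adv_data, f)

# Loading it back yields the same structure load_data expects.
with open("adv_data.pkl", "rb") as f:
    restored = pickle.load(f)
```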

def load_data(list_):
    # Each entry is expected to be an (image, label) pair.
    if len(list_[0]) == 2:
        img, lab = [], []
        for i in list_:
            img.append(i[0])
            lab.append(i[1])
        xs = torch.stack(img)
        xs = xs.squeeze(1)   # drop the per-sample batch dimension
        ys = torch.cat(lab)  # each label is shape (1,), so cat gives (N,)
        dataset = TensorDataset(xs, ys)
        del img, lab
        return dataset
    raise ValueError("expected a list of (image, label) pairs")

with open(dir, "rb") as file:
    train_list = pickle.load(file)
train_set_1 = load_data(train_list)

train_loader = torch.utils.data.DataLoader(train_set_1)

Is there any other way I can dump the data correctly so that I can load it with torch.utils.data.DataLoader?

This works well on its own, but I guess this is not exactly how a usual torchvision.datasets.CIFAR10 is stored and loaded. I want to concatenate it with torchvision.datasets.CIFAR10 in my train_loader,

i.e., concatenating train_set_2 = torchvision.datasets.CIFAR10(root="data", train=True) with train_set_1 above.

To concatenate two Datasets, you could use ConcatDataset. If you want to use shuffling, you would have to make sure both datasets return tensors in the same shape. Otherwise, you should define the lengths and batch_size such that batches do not contain samples from both datasets.

I’m not sure, how you would like to concatenate the datasets, so let me know, if I misunderstood the question.
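A minimal sketch of the ConcatDataset approach, using two small TensorDatasets as stand-ins for train_set_1 and train_set_2 (the shapes and sizes here are made up for illustration):

```python
import torch
from torch.utils.data import TensorDataset, ConcatDataset, DataLoader

# Two stand-in datasets that return tensors of the same shape,
# which is the requirement for shuffling across the concatenation.
ds_a = TensorDataset(torch.randn(10, 3, 32, 32), torch.zeros(10, dtype=torch.long))
ds_b = TensorDataset(torch.randn(6, 3, 32, 32), torch.ones(6, dtype=torch.long))

combined = ConcatDataset([ds_a, ds_b])  # length is 10 + 6 = 16
loader = DataLoader(combined, batch_size=4, shuffle=True)

images, labels = next(iter(loader))
```

Note that a plain torchvision.datasets.CIFAR10 returns PIL images, so you would need to pass transform=transforms.ToTensor() for its samples to match the tensor shape of the other dataset.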

@ptrblck You understood it correctly.
I actually found a way: I dump both datasets (dataset1: CIFAR10, dataset2: CIFAR10 variant) using torch.save and load them back like this:

img = torch.load(filename_images)
lab = torch.load(filename_labels)

xs = torch.stack(img)
xs = xs.squeeze(1)  # drop the per-sample batch dimension
# Note: torch.Tensor(lab) produces float labels; for losses like
# CrossEntropyLoss use torch.tensor(lab, dtype=torch.long) instead.
ys = torch.Tensor(lab)
dataset = TensorDataset(xs, ys)

This dataset works perfectly well with torch.utils.data.DataLoader. The concatenated data was already shuffled before dumping, but I shuffle it again when loading it for training.
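The described round trip can be sketched end to end as below, assuming per-sample (1, C, H, W) image tensors and integer labels; the filenames and data are placeholders:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-ins for the dumped lists: one (1, 3, 32, 32) tensor per sample,
# plus a plain int label per sample.
img = [torch.randn(1, 3, 32, 32) for _ in range(8)]
lab = [i % 10 for i in range(8)]

torch.save(img, "images.pt")  # placeholder filenames
torch.save(lab, "labels.pt")

img = torch.load("images.pt")
lab = torch.load("labels.pt")

xs = torch.stack(img).squeeze(1)          # (N, C, H, W)
ys = torch.tensor(lab, dtype=torch.long)  # long labels for CrossEntropyLoss
dataset = TensorDataset(xs, ys)
loader = DataLoader(dataset, batch_size=4, shuffle=True)
```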