Add data to DataLoader

I have an unusual situation here: I need to add a tensor to the trainloader (a DataLoader).

Suppose I have the trainloader from the following code

import torch
import torchvision

# transform is assumed to be defined elsewhere (e.g., a torchvision transforms pipeline)
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

And the only thing I can work with is the trainloader.

Now I want to append/add another tensor to the trainloader (e.g., weights for training data), so that I can do

for step, (data, target, weight) in enumerate(new_trainloader):
    ...

Is it possible to do so?

You add it to the dataset. You should look into the torchvision CIFAR10 dataset. There are dataset concatenators, but I’m not sure those will work for a specific set such as CIFAR10.
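For example, here is a minimal sketch of how that could look (the WeightedDataset name and the per-sample weight tensor are assumptions, not anything built into torchvision): wrap the existing dataset and return the weight as a third item, so the transforms of the wrapped dataset still apply.

import torch
from torch.utils.data import Dataset, DataLoader

class WeightedDataset(Dataset):
    """Wraps an existing dataset and returns a per-sample weight as a third item."""
    def __init__(self, base_dataset, weights):
        assert len(base_dataset) == len(weights)
        self.base_dataset = base_dataset
        self.weights = weights

    def __len__(self):
        return len(self.base_dataset)

    def __getitem__(self, index):
        data, target = self.base_dataset[index]  # transforms are applied here
        return data, target, self.weights[index]

# usage with the trainset from the question
weights = torch.ones(len(trainset))  # hypothetical per-sample weights
new_trainloader = DataLoader(WeightedDataset(trainset, weights),
                             batch_size=4, shuffle=True, num_workers=2)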

In addition to what @dhpollack said: depending on your use case you might have to re-instantiate the dataloader, because it might hold an old value for your dataset length, and this could cause wrong behavior in the sampler.

Thanks. But torchvision.datasets applies some transform operations; how to work around those is another problem.

I have tried the following:

  • extract trainloader.dataset.train_data and trainloader.dataset.train_label

  • rebuild the dataset with my own data (weight tensor):
    Data.TensorDataset(train_data, train_label, weight_tensor)

  • re-define trainloader.

But it won’t keep the transform operations, which causes a dimension error during training.

I agree, but the transform operations are another problem: how to keep those when re-defining the dataloader?
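One possible way to keep them, sketched here under the assumption that the raw images are stored as uint8 HWC numpy arrays the way CIFAR10 stores them (the WeightedArrayDataset name is made up for illustration), is to convert each array back to a PIL image and apply the transform inside __getitem__, which is what torchvision's CIFAR10 does internally:

from PIL import Image
from torch.utils.data import Dataset

class WeightedArrayDataset(Dataset):
    """Rebuilds a CIFAR10-like dataset from raw arrays while keeping the transform."""
    def __init__(self, data, targets, weights, transform=None):
        self.data = data          # uint8 numpy array of shape (N, 32, 32, 3)
        self.targets = targets    # list/array of int labels
        self.weights = weights    # per-sample weight tensor (assumed)
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        img = Image.fromarray(self.data[index])  # same conversion torchvision uses
        if self.transform is not None:
            img = self.transform(img)
        return img, self.targets[index], self.weights[index]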

First off, what do you mean by “I want to append/add another tensor to the trainloader (e.g., weights for training data)”? That sounds like you want to load previously trained weights into your model, which has nothing to do with the dataloader or dataset. Is this what you are trying to do?

You can add the transformations to any dataset. I think appending items to this dataset will be quite difficult, because the dataset itself seems to be a bunch of pickle files, so you’d have to add items to the pickle to make this process repeatable. Going through the process of creating your own dataset based on the official CIFAR10 one would probably be helpful, since you are doing more than just using CIFAR.

Having said that…

you could try to directly access the dataset and append your images as numpy arrays and your labels as ints:

import numpy as np
import torch
import torchvision

# the new image must match CIFAR10's storage format: uint8, shape (1, 32, 32, 3)
newimg = np.random.randint(0, 256, size=(1, 32, 32, 3), dtype=np.uint8)
newlabel = 1
ds = torchvision.datasets.CIFAR10(root='./data', train=True,
                                  download=True, transform=transform)
dl = torch.utils.data.DataLoader(ds, batch_size=4,
                                 shuffle=True, num_workers=2)
ds.data = np.r_[ds.data, newimg]  # append along the sample axis
ds.targets.append(newlabel)       # targets is a plain Python list

That seems more difficult. Also, it is sometimes the case that the sampler doesn’t pick up these changes. I think you could access that directly, but again, you should just rewrite the Dataset. You can start with the official code and work from there.

https://pytorch.org/docs/stable/_modules/torchvision/datasets/cifar.html#CIFAR10
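For instance, a minimal sketch of such a subclass (the per-sample weight tensor is an assumption) only needs to extend __getitem__, so the transform handling of the parent class is kept:

import torch
import torchvision

class WeightedCIFAR10(torchvision.datasets.CIFAR10):
    """CIFAR10 that additionally returns a per-sample weight."""
    def __init__(self, weights, **kwargs):
        super().__init__(**kwargs)
        self.weights = weights  # one weight per sample (hypothetical)

    def __getitem__(self, index):
        img, target = super().__getitem__(index)  # transform is applied by the parent
        return img, target, self.weights[index]

# usage
weights = torch.ones(50000)  # the CIFAR10 training set has 50,000 samples
ds = WeightedCIFAR10(weights, root='./data', train=True, download=True,
                     transform=torchvision.transforms.ToTensor())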

I’m interested in doing exactly what the OP asked about.
In addition to the input and targets I would like the DataLoader to return metadata on the target that is used by a custom loss function.
Basically, I would like to have something like that:

for step, (data, target, target_metadata) in enumerate(new_trainloader):
    loss_weights = compute_loss_weights(target_metadata)
    ...
    loss = weighted_loss(model_output, target, loss_weights)

Any way I can do that with a DataLoader? Or should I implement my own class?