I would like to modify dataset values during training. The code below shows what I want: modifying a dataset's label inside the optimization loop (the dataset here is MNIST only for illustration; my actual dataset is research-specific). Of course, the following code won't work, since reassigning `labels` does not change the value stored in the dataset itself.
If anybody in the community has tried something similar, it would be really nice to hear how to do this.
```python
import torch
from torchvision import datasets
from torchvision import transforms

torch.manual_seed(0)

mnist_dataset = datasets.MNIST("/tmp", download=True, transform=transforms.ToTensor())  # whatever
mnist_small_dataset, _ = torch.utils.data.random_split(mnist_dataset, [4, len(mnist_dataset) - 4])
loader = torch.utils.data.DataLoader(mnist_small_dataset, batch_size=2, shuffle=False)

for samples in loader:
    _, labels = samples
    # compute loss and backward() and optimize...
    labels = 100000  # whatever -- this only rebinds the local variable
```
Also, a workaround that occurred to me is to define a dataset class that wraps the original dataset and returns the dataset index along with each sample, as below:
```python
class WorkaroundDataset(torch.utils.data.Dataset):
    def __init__(self, dataset):
        self._dataset = dataset

    def __len__(self):
        return len(self._dataset)

    def __getitem__(self, idx):
        return (*self._dataset[idx], idx)
```
Then I tried to modify the dataset values by indexing, but that is not possible because each sample is returned as a tuple, which is immutable. I could modify the PyTorch source so that samples are returned as lists instead, but that seems dirty and I don't want to do that.
```python
mnist_dataset = WorkaroundDataset(mnist_dataset)
mnist_small_dataset, _ = torch.utils.data.random_split(mnist_dataset, [4, len(mnist_dataset) - 4])
loader_workaround = torch.utils.data.DataLoader(mnist_small_dataset, batch_size=2, shuffle=False)

for samples in loader_workaround:
    _, labels, idxes = samples
    # compute loss and backward() and optimize...
    for idx in idxes:
        mnist_dataset._dataset[idx] = 100000  # whatever -- fails: the dataset has no __setitem__
```
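One variation of this workaround I have been considering (names like `MutableLabelDataset` are my own, not from any library): instead of trying to write back into the wrapped dataset, the wrapper could keep its own mutable copy of the labels and serve those, so the training loop can overwrite them through the returned indices. A minimal sketch with a toy `TensorDataset` standing in for the real data:

```python
import torch


class MutableLabelDataset(torch.utils.data.Dataset):
    """Wraps a dataset but keeps a private, mutable tensor of labels.

    __getitem__ returns (data, label, index); the label comes from the
    wrapper's own tensor, so writing to self.labels changes what the
    loader yields on the next epoch.
    """

    def __init__(self, dataset):
        self._dataset = dataset
        # Snapshot the original labels into a mutable tensor.
        self.labels = torch.as_tensor([int(label) for _, label in dataset])

    def __len__(self):
        return len(self._dataset)

    def __getitem__(self, idx):
        data, _ = self._dataset[idx]
        # Serve the (possibly updated) label plus the index.
        return data, self.labels[idx], idx


# Tiny toy dataset standing in for the research-specific one.
xs = torch.arange(8, dtype=torch.float32).reshape(4, 2)
ys = torch.tensor([0, 1, 2, 3])
wrapped = MutableLabelDataset(torch.utils.data.TensorDataset(xs, ys))
loader = torch.utils.data.DataLoader(wrapped, batch_size=2, shuffle=False)

for data, labels, idxes in loader:
    # compute loss and backward() and optimize...
    # Overwrite the stored labels in place via the index tensor.
    wrapped.labels[idxes] = 9  # whatever
```

For MNIST specifically, I believe the labels also live in a plain `mnist_dataset.targets` tensor that can be assigned to directly, but I am not sure whether my research-specific dataset exposes anything similar, which is why the wrapper route seems more general.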