Hi, I constructed my own dataset following the data loading tutorial, and I am using the standard DataLoader provided by PyTorch. The code looks like this:
train_set = MyDataset(some_parameter...)
train_loader = DataLoader(dataset=train_set, other_setting...)
for batch_idx, (data, target) in enumerate(train_loader):
    # training step
After several training epochs I would like to regenerate my training dataset, so I have written a custom member function regenerate_sample() for MyDataset. That way I can just call train_set.regenerate_sample() to change the samples in train_set.
But what I am not sure about is whether this change will be reflected in train_loader, i.e., will train_loader now generate batches from the new samples instead of the old ones? Or do I have to manually construct a new DataLoader object in order to use the updated dataset, like the following?

train_loader = DataLoader(dataset=train_set, other_setting...)
The DataLoader seems to keep a reference to the Dataset object, so your regenerate approach should work.
Here is a small sample code, which works fine for me:
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)

    def regenerate(self):
        # replace every sample in-place; the existing DataLoader will see this
        self.data = torch.ones(3, 1).float()
data = torch.zeros(3, 1).float()
dataset = MyDataset(data)
print(dataset[0])  # should output 0

data_loader = DataLoader(dataset)
for d in data_loader:
    print(d)

dataset.regenerate()
print(dataset[0])  # should output 1

for d in data_loader:
    print(d)
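If you want to refresh the data every few epochs, a minimal sketch on top of the snippet above might look like this (the epoch count and the regeneration interval are arbitrary placeholders):

num_epochs = 9
for epoch in range(num_epochs):
    # hypothetical schedule: rebuild the samples every 3 epochs
    if epoch > 0 and epoch % 3 == 0:
        dataset.regenerate()  # the existing data_loader keeps working
    for d in data_loader:
        pass  # your training step goes here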
In case this helps anyone, I can confirm the switch also happens successfully for num_workers > 0. By default the worker processes are re-created each time you start iterating over the DataLoader, so any modifications made to the Dataset object between epochs traverse to the workers as well.
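For reference, a small check along those lines, assuming MyDataset from the code above (the __main__ guard is needed when the workers use the spawn start method):

import torch
from torch.utils.data import DataLoader

if __name__ == "__main__":
    dataset = MyDataset(torch.zeros(3, 1).float())
    loader = DataLoader(dataset, num_workers=2)

    for d in loader:      # the workers spawned here still see the zeros
        print(d)

    dataset.regenerate()  # modify the dataset in the main process

    for d in loader:      # fresh workers copy the updated dataset
        print(d)          # prints ones now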