Hi all, I want to create a customized dataloader that does the following:
For each epoch, select a position in a given list without replacement. So, if the list has length 10, then after 10 epochs all positions in the list have been selected.
import numpy as np
import torch.utils.data as data

class My_Dataset(data.Dataset):
    def __init__(self, data_list):
        self.data_list = data_list

    def __getitem__(self, index):
        # This picks a random position WITH replacement -- not what I want
        position = np.random.randint(0, len(self.data_list))
        return self.data_list[position]

    def __len__(self):
        return len(self.data_list)
For example, given data_list = [1, 4, 5, 7, 8]: if the first epoch selects position = 4, then the second epoch should select any position except 4 (because it was already selected in the first epoch), and so on. Once the epoch count exceeds the data_list size, we re-permute the list.
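A minimal sketch of this epoch-wise, without-replacement selection. The `set_epoch` method is an assumption of mine (modeled on how some PyTorch samplers are told the current epoch from the training loop); the rest is just a fixed permutation indexed by the epoch number:

```python
import numpy as np
import torch.utils.data as data

class EpochPositionDataset(data.Dataset):
    """Uses one fixed position per epoch, cycling through a permutation
    so no position repeats until all have been used (hypothetical sketch)."""

    def __init__(self, data_list, seed=0):
        self.data_list = data_list
        rng = np.random.default_rng(seed)
        # Fixed shuffled order of positions; re-cycled after len(data_list) epochs.
        self.order = rng.permutation(len(data_list))
        self.epoch = 0

    def set_epoch(self, epoch):
        # Call this at the start of every epoch from the training loop.
        self.epoch = epoch

    def __getitem__(self, index):
        # Same position for the whole epoch, taken from the permutation.
        position = self.order[self.epoch % len(self.data_list)]
        return self.data_list[position]

    def __len__(self):
        return len(self.data_list)
```

Over the first `len(data_list)` epochs, every position is used exactly once; after that the permutation repeats (you could also re-shuffle `self.order` inside `set_epoch` at each cycle).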
Sorry, but you may have misunderstood my question. I want to select one position in the data_list per epoch, such that the position is not repeated in the following epochs.
Ok, one more question! Is the size of the data list the same as the number of samples? It has to be that way, because you want sampling without replacement. Is that right?
No, the length of the data list is often bigger than the number of samples. It may be 1000, while the number of samples is 100. It stores the positions of ROIs in the images. Based on an ROI, I can crop the image into a smaller image.
Then, in the __getitem__ function you can take a value from this array. But that will only use the first 100 elements and ignore the rest, since the input index is always 0 <= index < 100. To fix this, we can change __len__ to return 1000 instead of 100, and then in __getitem__ do the following:
def __getitem__(self, index):
    indx_data = index % 100          # maps 0..999 onto the 100 samples
    position = self.position_array[index]
    data = ...  # use indx_data to retrieve the correct sample
    return data
So, we use index to retrieve the position and indx_data to retrieve the sample. Also, note that the effective number of epochs changes as well: one epoch like this corresponds to 10 epochs before.
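Putting the pieces together, a self-contained sketch of this idea. The names `samples` and `position_array` are my placeholders (the thread doesn't fix them), and the "crop" step is left as a comment since it depends on your image format:

```python
import torch.utils.data as data

class ROIPositionDataset(data.Dataset):
    """Sketch: many ROI positions (e.g. 1000) over fewer samples (e.g. 100).
    One epoch walks over every ROI position, not every sample."""

    def __init__(self, samples, position_array):
        self.samples = samples                # e.g. 100 images
        self.position_array = position_array  # e.g. 1000 ROI positions

    def __len__(self):
        # Length is the number of positions, so the DataLoader
        # produces indices 0..len(position_array)-1.
        return len(self.position_array)

    def __getitem__(self, index):
        indx_data = index % len(self.samples)   # which sample to load
        position = self.position_array[index]   # which ROI within it
        data_item = self.samples[indx_data]
        # Here you would crop data_item around `position` and return the crop.
        return data_item, position
```

With 1000 positions over 100 samples, each sample is visited 10 times per epoch, once for each of its positions, which is why one such epoch corresponds to 10 of the old ones.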