Hi all, I want to create a customized dataloader that does the following:
For each epoch, select a position in a given list without replacement. So, if the list has length 10, then after 10 epochs all positions in the list have been selected.
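import numpy as np
import torch.utils.data as data

class MyDataset(data.Dataset):
    def __init__(self, data_list):
        self.data_list = data_list

    def __getitem__(self, index):
        # Draws a new random position on every call, so positions can repeat
        position = np.random.randint(0, len(self.data_list))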
For example, with data_list=[1,4,5,7,8], if the first epoch selects position=4, then the second epoch should select any position except 4 (because it was already selected in the first epoch), and so on. We will permute the list when the epoch is bigger than
I suppose you want to shuffle your input data, but you should not do that in __getitem__. There are two ways to do it:
self.data_list in the
- Use the DataLoader and set the parameter shuffle=True. That will take care of the shuffling part.
from torch.utils.data import DataLoader
dataset = MyDataset(...)
# shuffle=True draws a fresh random order of indices every epoch
data_loader = DataLoader(dataset, shuffle=True)
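To see that shuffle=True samples without replacement within an epoch, here is a minimal sketch. ToyDataset is a hypothetical stand-in for MyDataset (it is not from the original post), and each sample is just its own index so the visiting order is easy to inspect.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical stand-in for MyDataset: sample i is simply the integer i."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        return index


dataset = ToyDataset(5)
data_loader = DataLoader(dataset, batch_size=1, shuffle=True)

# Record the order in which indices are visited in each epoch
epochs = []
for epoch in range(3):
    seen = [int(batch.item()) for batch in data_loader]
    epochs.append(seen)
    print(epoch, seen)
```

Every epoch visits each index exactly once, in a new random order.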
Thanks. But how do I select the position without replacement? Your code is simple, I can do it.
Both methods I have suggested will result in random selection without replacement.
Sorry, but you may have misunderstood my question. I want to select a position in the data_list in each epoch, such that the position is not repeated in later epochs.
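To make that requirement concrete, here is a rough sketch of one possible reading: one position is handed out per epoch, and the list is re-permuted only after every position has been used. EpochPositionPicker and next_position are hypothetical names made up for illustration, and the seed is an assumption added for reproducibility.

```python
import numpy as np


class EpochPositionPicker:
    """Hypothetical helper: returns one position per epoch, without repeats."""

    def __init__(self, data_list, seed=0):
        self.data_list = data_list
        self.rng = np.random.default_rng(seed)
        self.order = []  # positions not yet used in the current pass

    def next_position(self):
        # Draw a new permutation once all positions have been used
        if len(self.order) == 0:
            self.order = list(self.rng.permutation(len(self.data_list)))
        return int(self.order.pop())


picker = EpochPositionPicker([1, 4, 5, 7, 8])
first_pass = [picker.next_position() for _ in range(5)]   # epochs 0..4
second_pass = [picker.next_position() for _ in range(5)]  # epochs 5..9
print(first_pass, second_pass)
```

Within each pass of 5 epochs, every position from 0 to 4 appears exactly once.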
One epoch goes through all the data samples, right? Is the position you are talking about the same as the index of the samples, or something different?
Yes, each epoch goes through all the data samples, but the length of the data list is not the number of samples. The data list is like an arbitrary array.
OK, one more question: is the size of the data list the same as the number of samples? It has to be that way, because you want to have sampling without replacement. Is that right?
No, the data list is often longer than the number of samples. It may have 1000 entries, while the number of samples is 100. It stores the positions of ROIs in the image. Based on the ROI, I can crop the image into a smaller image.
I see. I was considering the case where it is smaller, but if it is larger, then it works.
So, what you can do is create a random array of these positions in the __init__ function, like below:
self.position_arry = np.random.choice(1000, 1000, replace=False)
Then, in the __getitem__ function, you can take a value from this array. But that will only use the first 100 elements and ignore the rest, since the input index always satisfies 0 <= index < 100. To fix this, we can change the __len__ function to return 1000 instead of 100, and then in __getitem__ we do the following:
# Wrap the index back into the sample range [0, 100)
if index >= 100:
    indx_data = index % 100
else:
    indx_data = index
position = self.position_arry[index]
data = ..  # use indx_data to retrieve the correct sample
So, we use index to retrieve the position and indx_data to retrieve the sample. Also, note that the actual number of epochs changes as well: one epoch like this corresponds to 10 epochs before.
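Putting the pieces together, a minimal sketch could look like the following. RoiDataset, samples, num_positions, and seed are hypothetical names for illustration, the samples are plain integers rather than images, and rng.permutation is used here as an equivalent of np.random.choice(1000, 1000, replace=False). It also assumes the number of positions is a multiple of the number of samples, so every sample is visited equally often.

```python
from collections import Counter

import numpy as np
from torch.utils.data import Dataset


class RoiDataset(Dataset):
    """Hypothetical dataset with fewer samples than ROI positions."""

    def __init__(self, samples, num_positions, seed=0):
        self.samples = samples
        rng = np.random.default_rng(seed)
        # Random order of all positions, each appearing exactly once
        self.position_arry = rng.permutation(num_positions)

    def __len__(self):
        # Report one item per position, not one per sample
        return len(self.position_arry)

    def __getitem__(self, index):
        # Wrap the index back into the sample range
        indx_data = index % len(self.samples)
        position = int(self.position_arry[index])
        return self.samples[indx_data], position


samples = list(range(100))  # stand-in for 100 images
dataset = RoiDataset(samples, num_positions=1000)

pairs = [dataset[i] for i in range(len(dataset))]
visits = Counter(sample for sample, _ in pairs)
print(len(pairs), visits[0])
```

One pass over this dataset uses every position exactly once and visits each sample 10 times, which matches the "one epoch corresponds to 10 epochs before" remark.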