New batch in each epoch

Hello. I am trying to use an augmented and a non-augmented dataset in alternating epochs (for example: augmented in one epoch, non-augmented in the next), but I couldn't figure out how to do it. My approach was re-creating the DataLoader in each epoch, but I think it's wrong, because when I print the indices in __getitem__ in my Dataset, there are a lot of duplicate indices.
Here is my code for training: CodePile | Easily Share Piles of Code
Here is my code for the dataset: CodePile | Easily Share Piles of Code
How can I achieve what I want?
Thanks in advance.

Hi, you're not being clear about what you want; at least I don't get it.
Also, re-creating the DataLoader cannot cause duplicate indices:

import torch

class Dataset1(torch.utils.data.Dataset):
    def __init__(self, length):
        # Store the squares of 0..length-1 as the "data".
        self.idx = torch.arange(length) ** 2

    def __len__(self):
        return len(self.idx)

    def __getitem__(self, i):
        # Return the index alongside the sample so we can track
        # which indices the sampler actually draws.
        return i, self.idx[i]


train_sampler = torch.utils.data.SubsetRandomSampler(torch.arange(100))
Training_Data = Dataset1(100)

train_loader = torch.utils.data.DataLoader(Training_Data, batch_size=4, sampler=train_sampler,
                                           num_workers=2, pin_memory=False)

# Collect every index the loader yields over one epoch.
idx = []
for x, y in train_loader:
    print(x, y)
    idx.append(x)
all_index, _ = torch.sort(torch.cat(idx))
print(all_index)  # every index 0..99 appears exactly once

Output:
tensor([91, 55, 82, 31]) tensor([8281, 3025, 6724, 961])
tensor([71, 23, 29, 25]) tensor([5041, 529, 841, 625])
tensor([70, 10, 26, 93]) tensor([4900, 100, 676, 8649])
tensor([75, 96, 36, 85]) tensor([5625, 9216, 1296, 7225])
tensor([48, 86, 38, 22]) tensor([2304, 7396, 1444, 484])
tensor([44, 46, 98, 3]) tensor([1936, 2116, 9604, 9])
tensor([12, 33, 27, 99]) tensor([ 144, 1089, 729, 9801])
tensor([43, 65, 16, 2]) tensor([1849, 4225, 256, 4])
tensor([63, 97, 51, 47]) tensor([3969, 9409, 2601, 2209])
tensor([69, 45, 76, 56]) tensor([4761, 2025, 5776, 3136])
tensor([32, 11, 68, 64]) tensor([1024, 121, 4624, 4096])
tensor([52, 39, 18, 5]) tensor([2704, 1521, 324, 25])
tensor([ 0, 79, 92, 35]) tensor([ 0, 6241, 8464, 1225])
tensor([77, 54, 21, 60]) tensor([5929, 2916, 441, 3600])
tensor([81, 1, 57, 58]) tensor([6561, 1, 3249, 3364])
tensor([ 7, 13, 84, 94]) tensor([ 49, 169, 7056, 8836])
tensor([67, 14, 80, 89]) tensor([4489, 196, 6400, 7921])
tensor([83, 20, 53, 37]) tensor([6889, 400, 2809, 1369])
tensor([62, 66, 15, 78]) tensor([3844, 4356, 225, 6084])
tensor([59, 90, 17, 42]) tensor([3481, 8100, 289, 1764])
tensor([72, 41, 95, 50]) tensor([5184, 1681, 9025, 2500])
tensor([73, 28, 74, 49]) tensor([5329, 784, 5476, 2401])
tensor([19, 34, 61, 30]) tensor([ 361, 1156, 3721, 900])
tensor([ 8, 87, 6, 9]) tensor([ 64, 7569, 36, 81])
tensor([40, 88, 24, 4]) tensor([1600, 7744, 576, 16])
tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71,
72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

Thank you so much for your answer. I just want to understand how sampling works. In one iteration I want my program to select a batch that has augmented images, and in another iteration I want it to select a batch that doesn't have augmented images.

Also, I did some research and couldn't understand anything about sampling. What is the random sampler actually doing? What is its purpose? If you could answer this too, it would be amazing.

In your code, it's for creating the train and validation sets; this is one way of splitting your dataset into a train set and a val set.
If you set sampler to None, the DataLoader chooses samples from all samples in your dataset.
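
For example, a minimal sketch of such a split (the 80/20 boundary is made up; Training_Data is the dataset from the code above):

import torch
from torch.utils.data import DataLoader, SubsetRandomSampler

# Shuffle all 100 indices once, then hand disjoint slices to two samplers.
indices = torch.randperm(100)
train_sampler = SubsetRandomSampler(indices[:80].tolist())
val_sampler = SubsetRandomSampler(indices[80:].tolist())

# Each loader only ever sees the indices its sampler owns.
train_loader = DataLoader(Training_Data, batch_size=4, sampler=train_sampler)
val_loader = DataLoader(Training_Data, batch_size=4, sampler=val_sampler)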

Transforms don't work this way by default. In most cases, the dataset's __getitem__ applies the transform to a single sample.
If you want, you can return both the transformed and the original image from __getitem__ and use whichever one you want:

def __getitem__(self, i):
    img, t = ...  # load image and label
    return img, self.transform(img), t
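
Building on that, here is a rough sketch of the alternating behavior from the original question, using a flag on the dataset that the training loop flips each epoch (the class name, attributes, and transform are assumptions, not from the original code):

import torch
from torch.utils.data import Dataset

class ToggleDataset(Dataset):
    # Hypothetical dataset that can switch augmentation on and off.

    def __init__(self, images, labels, transform):
        self.images = images
        self.labels = labels
        self.transform = transform
        self.augment = True  # the training loop flips this flag per epoch

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        img = self.images[i]
        if self.augment:
            img = self.transform(img)
        return img, self.labels[i]

# In the training loop: augmented on even epochs, raw on odd epochs.
# dataset.augment = (epoch % 2 == 0)

Note that with num_workers > 0 the flag must be set before creating the loader iterator each epoch, since the worker processes receive a copy of the dataset at that point.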


Thank you so much. So the sampler is fetching the data I want, either with random indices or in sequential order.

torch.utils.data.SubsetRandomSampler by default samples randomly, with equal probability for each sample and without replacement.
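
A quick toy check (not from the thread) shows this: one pass over the sampler yields every index exactly once, in random order.

from torch.utils.data import SubsetRandomSampler

sampler = SubsetRandomSampler(list(range(10)))
print(list(sampler))    # e.g. [3, 7, 0, 9, 1, 5, 8, 2, 6, 4]
print(sorted(sampler))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] -- no duplicates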


What do you mean by "random with equal probability for each sample"?
I thought if we have a dataset like this: x,y,x,x,y,y,x,x,y,x,x,y,x,x,x,x,y,y,y,x,x,y,y,y,x,y
we are sampling it randomly for each batch, and in the for loop each batch will look like this: x,x,x,y (if the batch size is 4, with randomly chosen indices).

Yes.
But you can use torch.utils.data.WeightedRandomSampler
https://pytorch.org/docs/stable/data.html#torch.utils.data.WeightedRandomSampler
to set a different weight for each sample:
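
For instance, a minimal sketch (the labels and weights are made up; a common use is oversampling a rare class):

from torch.utils.data import WeightedRandomSampler

# Hypothetical labels where class 1 is rare, so its samples get a higher weight.
labels = [0, 0, 0, 0, 1, 0, 0, 1]
weights = [4.0 if y == 1 else 1.0 for y in labels]

# Draws num_samples indices; replacement=True lets the rare samples repeat.
sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
print(list(sampler))  # indices 4 and 7 show up more often than the others

You can pass this sampler to a DataLoader exactly like the SubsetRandomSampler above.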


Man, you helped this guy a lot. Thank you so much.
