I am working on a regression model, and the distribution of the target is multimodal. I wanted to write a sampler that ensures examples with target values greater than some threshold x (x = 8 in the code below) make up at least 1% of the samples drawn from the dataset. Basically, oversampling.
But with this sampler, my models overfit badly.
```python
import numpy as np
from torch.utils.data import DataLoader, Sampler


class My_sampler(Sampler):
    def __init__(self, dataset, pct=0.1):  # pct=0.1 means 10% of each epoch
        self.df = dataset.df.Target
        self.pct = pct

    def __len__(self):
        return len(self.df)

    def __iter__(self):
        # np.where returns a tuple of arrays; [0] takes the 1-D index array
        greater_idx = np.where(self.df > 8)[0]
        rest_idx = np.where(self.df <= 8)[0]
        # Rare examples are drawn with replacement (oversampled)
        greater = np.random.choice(greater_idx, int(self.pct * len(self.df)))
        # The remaining examples are drawn without replacement
        rest = np.random.choice(rest_idx, int((1 - self.pct) * len(self.df)) + 1, replace=False)
        idxs = np.hstack([greater, rest])
        np.random.shuffle(idxs)
        idxs = idxs[:len(self.df)]
        return iter(idxs)


our_sampler = My_sampler(dataset)
loader = DataLoader(dataset, sampler=our_sampler, batch_size=8, drop_last=True)
```
I don’t know why it’s overfitting; maybe I am not implementing the Sampler class properly.
Also, is it possible that with the sampler declared like this, the same examples are used in every batch, and that not every image is used as input during an epoch?
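One way to check this is to count which indices a single epoch actually draws. The snippet below is only a rough sketch: it mirrors the sampling scheme of the sampler above using the standard library, and the targets (1000 rows, 5 of them above the threshold) and the helper name `epoch_coverage` are made up for illustration, not taken from the real dataset.

```python
import random
from collections import Counter


def epoch_coverage(indices, targets, threshold=8):
    """Summarise which dataset rows one epoch of sampled indices touches."""
    counts = Counter(indices)
    rare = [i for i in indices if targets[i] > threshold]
    return {
        "drawn": len(indices),                    # indices yielded in the epoch
        "unique": len(counts),                    # distinct rows seen
        "never_seen": len(targets) - len(counts), # rows skipped this epoch
        "max_repeats": max(counts.values()),      # most-repeated row
        "rare_fraction": len(rare) / len(indices),
    }


# Hypothetical targets: 995 common rows and 5 rare rows above the threshold.
random.seed(0)
targets = [random.uniform(0, 8) for _ in range(995)] + [9.0] * 5
n, pct = len(targets), 0.1

greater_idx = [i for i, t in enumerate(targets) if t > 8]
rest_idx = [i for i, t in enumerate(targets) if t <= 8]

# Same scheme as the sampler: rare rows with replacement, the rest without.
greater = random.choices(greater_idx, k=int(pct * n))
rest = random.sample(rest_idx, k=int((1 - pct) * n) + 1)
indices = greater + rest
random.shuffle(indices)
indices = indices[:n]

stats = epoch_coverage(indices, targets)
print(stats)
```

Under these assumed numbers, the 5 rare rows share about 100 draws per epoch, so each one is repeated roughly 20 times, while at least 94 of the common rows are not drawn at all in that epoch. Since `__iter__` resamples on every epoch, the skipped rows change from epoch to epoch, but the rare rows are repeated heavily every time.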