Can I use a weighted-sampling DataLoader for regression tasks?

sparshgarg23 · March 23, 2022, 1:01pm

I have used a weighted-sampling DataLoader for a classification task, where the objective of the model is to determine which class an image belongs to.

I also have another model which, given an image, predicts its age, weight, and body tone. The DataLoader for this model uses random sampling. Its performance isn't as good as the classification model's, so I was wondering whether I can use a weighted-sampling DataLoader for a regression task and, if so, what changes I would have to make to the code shown below.

Classification task
Weighted DataLoader sampling for my dataset, which
contains approximately 5000 images belonging to 8 classes

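import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def obtain_class_weights(img_dataset):
    # Class index of every sample, in dataset order
    target_list = torch.tensor(img_dataset.targets)
    # get_class_distribution is a helper not shown here; its values must be
    # ordered by class index for the lookup below to be correct
    class_count = list(get_class_distribution(img_dataset).values())
    # Inverse-frequency weight for each class
    class_weights = 1. / torch.tensor(class_count, dtype=torch.float)
    print(class_weights)
    # Per-sample weight, looked up from each sample's class
    class_weights_all = class_weights[target_list]
    return class_weights_all

train_class_weights = obtain_class_weights(train_dataset)
train_weighted_sampler = WeightedRandomSampler(
    weights=train_class_weights,
    num_samples=len(train_class_weights),
    replacement=True
)
# shuffle must stay False when a sampler is passed
trainloader = DataLoader(train_dataset, batch_size=16, shuffle=False, sampler=train_weighted_sampler, drop_last=True)
Each batch consists of images and their labels, each label being one of the eight classes.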

DataLoader used for the regression task

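from torch.utils.data import DataLoader, SubsetRandomSampler
# The first `split` entries of `indices` are held out for validation
train_indices, val_indices = indices[split:], indices[:split]
train_sampler = SubsetRandomSampler(train_indices)
train_loader = DataLoader(dataset, batch_size=16, sampler=train_sampler, num_workers=1)
Here each batch consists of images and, for each image, a label of dimension 3x1.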

suraj.pt · March 23, 2022, 4:56pm

Hi @sparshgarg23 that’s an interesting question. In classification, weighted sampling helps normalize overrepresentation of discrete classes; not sure how that strategy works for regression where the targets are continuous.

Maybe you can try using the inverse of the target density instead of counts?
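One rough way to approximate this, sketched here for a single continuous target only, is to bin the targets into a histogram and weight each sample by the inverse of its bin count. The function name `inverse_density_weights`, the choice of 20 bins, and the synthetic `ages` array are all assumptions made for illustration; none of them come from the code above.

```python
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

def inverse_density_weights(targets, num_bins=20):
    """Weight each sample by 1 / (number of samples in its histogram bin)."""
    targets = np.asarray(targets, dtype=np.float64)
    edges = np.histogram_bin_edges(targets, bins=num_bins)
    # Digitizing against the interior edges gives a bin index in [0, num_bins - 1]
    bin_idx = np.digitize(targets, edges[1:-1])
    counts = np.bincount(bin_idx, minlength=num_bins)
    # Every sample sits in a bin with count >= 1, so this never divides by zero
    weights = 1.0 / counts[bin_idx]
    return torch.as_tensor(weights, dtype=torch.double)

# Synthetic, skewed stand-in for one target column (hypothetical data)
rng = np.random.default_rng(0)
ages = rng.gamma(shape=2.0, scale=10.0, size=1000)

weights = inverse_density_weights(ages)
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
```

With these weights every non-empty bin is drawn with roughly equal probability, which is the regression analogue of the inverse class counts. The bin count controls how aggressive the rebalancing is, so it probably needs tuning per target.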

sparshgarg23 · March 23, 2022, 5:57pm

Thanks. Any tips on how to obtain the target density?
I found this article about transformed regressors in sklearn, but I am not sure how to integrate it with PyTorch. Any tips or suggestions would be welcome.

suraj.pt · March 24, 2022, 3:16pm

I am not an expert, but you could try kernel density estimation, e.g. the KernelDensity class in sklearn.neighbors. AFAIK you can fit it to your targets, get densities for your target domain and use their inverse to weight your samples. All this logic can be added in your obtain_class_weights(). Hope this helps!
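A minimal sketch of that idea, under several assumptions: the targets are standardized before fitting, the kernel is Gaussian, and the bandwidth of 0.5 is an arbitrary starting value that would need tuning. The function name `obtain_density_weights` and the random tensors standing in for images and 3-value labels are hypothetical. Note that WeightedRandomSampler yields positions in `range(len(weights))`, so the training split is wrapped in a `Subset` to keep those positions aligned with the weights.

```python
import numpy as np
import torch
from sklearn.neighbors import KernelDensity
from sklearn.preprocessing import StandardScaler
from torch.utils.data import DataLoader, Subset, TensorDataset, WeightedRandomSampler

def obtain_density_weights(targets, bandwidth=0.5):
    """Per-sample weights proportional to 1 / (estimated target density)."""
    targets = np.asarray(targets, dtype=np.float64).reshape(len(targets), -1)
    # Standardize so one bandwidth is reasonable across all target columns
    scaled = StandardScaler().fit_transform(targets)
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(scaled)
    log_density = kde.score_samples(scaled)  # log p(y_i) for each sample
    # 1 / density, rescaled so the rarest sample gets weight 1 (avoids overflow)
    weights = np.exp(-(log_density - log_density.min()))
    return torch.as_tensor(weights, dtype=torch.double)

# Hypothetical stand-in data: 200 small "images", each with 3 regression targets
torch.manual_seed(0)
images = torch.randn(200, 3, 8, 8)
labels = torch.randn(200, 3)
dataset = TensorDataset(images, labels)

train_indices = list(range(40, 200))
train_subset = Subset(dataset, train_indices)

weights = obtain_density_weights(labels[train_indices].numpy())
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
train_loader = DataLoader(train_subset, batch_size=16, sampler=sampler)
```

Only the relative size of the weights matters to the sampler, so the rescaling is purely for numerical safety. Fitting one joint density over all three targets is one possible choice; fitting a separate density per target and combining the weights would be another.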