Feeding a network with images of one label only in each batch


I have data in which all the images in each folder belong to one label (the label is a number).
For example, folder X holds, let’s say, 500 images, and the label of all 500 images is, let’s say, 250 (the label corresponds to time elapsed).
Folder Y holds, let’s say, 600 images, and the label of all 600 images is, let’s say, 320.

It is valid to assume that a group of, let’s say, 50 images from Y has a label of 320, or close to it. But assuming that each individual image has a label of 320 is totally wrong.
This means that each pass over the training or validation data should include 50 images (otherwise the loss calculation would be incorrect). 40 could also be OK, for instance.

My goal is to get images (let’s say of Z) and predict their label.

For the training phase:

  1. I created a custom dataset (pretty straightforward: the path and the label that the “image” belongs to).
  2. I modified the network so that when it gets N images in a batch (assuming all N images belong to one label), the forward pass calculates the average of their embeddings and then continues as usual: flattening, FC, one output. The weights are shared.
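
As a rough illustration of step 2, the idea might look like the sketch below. The module name `MeanEmbeddingRegressor`, the small convolutional stack, and all layer sizes are placeholders assumed for the example, not the actual network.

```python
import torch
import torch.nn as nn


class MeanEmbeddingRegressor(nn.Module):
    """Hypothetical sketch: embed N images with shared weights,
    average the embeddings, then flatten -> FC -> one output."""

    def __init__(self):
        super().__init__()
        # Placeholder feature extractor; any backbone could take its place.
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.fc = nn.Linear(8 * 4 * 4, 1)

    def forward(self, images):
        # images: (N, 3, H, W), all N assumed to share one label
        embeddings = self.features(images)             # (N, 8, 4, 4)
        pooled = embeddings.mean(dim=0, keepdim=True)  # (1, 8, 4, 4)
        flat = torch.flatten(pooled, 1)                # (1, 128)
        return self.fc(flat).squeeze()                 # one scalar prediction


model = MeanEmbeddingRegressor()
batch = torch.randn(50, 3, 64, 64)  # e.g. 50 images from one folder
prediction = model(batch)
```

Because the average is taken over the batch dimension, the same module accepts 40 images as readily as 50, and the single output can be compared against the single folder label in the loss.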

The problem is: how do I create a dataloader that feeds the network with images of one label only in each batch?
For example, I want the first and second batches to include 50 images each of label 100, the third and fourth batches to include 50 images each of label 125, and so on (with more than just 100 images of each label, of course).
For the validation, I want let’s say the first batch to have 50 images of label 100, second batch - 50 images of label 125, and so on.
A train-val split is easy, but I need more than that.

My custom dataset can find the indexes of each label, but I don’t know if there’s a way to make the dataloader choose them somehow. As far as I know, the dataloader’s sampler cannot deal with that.
Any idea how to do that?

My custom dataset is shown below

import numpy as np
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset


class CustomDatasetFromCSV(Dataset):
    def __init__(self, csv_path, height, width, transform=None):
        """
        csv_path (string): path to csv file
        height (int): image height
        width (int): image width
        transform: pytorch transforms for transforms and tensor conversion
        """
        self.data = pd.read_csv(csv_path)
        self.height = height
        self.width = width
        self.transform = transform

    def __getitem__(self, index):
        # Column 0 is read as the image path, column 2 as the numeric label.
        path = self.data.iloc[index, 0]
        img = Image.open(path).convert("RGB")
        label = torch.tensor(float(self.data.iloc[index, 2]))
        if self.transform:
            img = self.transform(img)
        return path, img, label

    def __len__(self):
        return len(self.data.index)

    def get_all_label_names(self):
        # returns all labels
        return np.unique(self.data['label'].values)

    def get_labels_indexes_dict(self):
        # returns a dictionary with all labels and their indexes
        uniques = np.unique(self.data['label'].values)
        labels_dict = {}
        for unique in uniques:
            labels_dict[unique] = np.array(self.data[self.data['label'] == unique].index)
        return labels_dict

A custom sampler sounds like the right idea.
Once you have the labels, you could create a custom sampler (e.g. as a BatchSampler) and let it pass the indices for a single target only. Here is an example of the usage of a BatchSampler, and for your use case you could derive from this class to implement custom sampling logic.
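
For a single target, a minimal sketch could combine the built-in `SubsetRandomSampler` and `BatchSampler`. The index list below is made up for illustration; in practice it would come from something like the dataset’s `get_labels_indexes_dict()`.

```python
from torch.utils.data import BatchSampler, SubsetRandomSampler

# Hypothetical indices of all samples that share one target.
indices_for_one_label = [20, 21, 22, 23, 24, 25]

# SubsetRandomSampler shuffles the given indices (without replacement);
# BatchSampler groups what it yields into lists of batch_size indices.
batch_sampler = BatchSampler(
    SubsetRandomSampler(indices_for_one_label),
    batch_size=3,
    drop_last=True,
)

batches = list(batch_sampler)  # two batches of three indices each
```

Passing such an object as `batch_sampler=` to a `DataLoader` built on the dataset would then yield batches drawn from that one target only.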


I see that the BatchSampler wraps another sampler to yield mini-batches.
My training includes going over all the possible targets, not one only. Currently I don’t fully understand how to do that with one dataloader only, rather than creating new ones when I switch to a new label.

Assuming I have a custom dataset, “train_dataset”, and I know the indexes of all the targets, such as:
indx_dict = {'label1': [20, 21, 22, 23, 24, 25],
             'label2': [25, 26, 27, 28, 29, 30]}

How can I create one dataloader that samples twice for example from “label1” indexes (bs=3), then samples twice from “label2” indexes (bs=3)? Preferably do that with replacement.
Can you help with a code snippet?
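
One possible way to get this behaviour with a single dataloader is a small batch sampler that walks over the labels and draws each batch from one label’s indexes only. The sketch below is an assumption-laden illustration, not a confirmed solution from this thread: the class name `PerLabelBatchSampler` and its parameters are hypothetical, and it is written in plain Python so it can be inspected without a dataset.

```python
import random


class PerLabelBatchSampler:
    """Hypothetical sketch: every yielded batch holds indexes of one label only."""

    def __init__(self, indices_by_label, batch_size, batches_per_label, seed=None):
        self.indices_by_label = {k: list(v) for k, v in indices_by_label.items()}
        self.batch_size = batch_size
        self.batches_per_label = batches_per_label
        self.rng = random.Random(seed)

    def __iter__(self):
        # Labels are visited in dictionary order; each one contributes
        # batches_per_label consecutive batches.
        for indices in self.indices_by_label.values():
            for _ in range(self.batches_per_label):
                # random.choices draws with replacement
                yield self.rng.choices(indices, k=self.batch_size)

    def __len__(self):
        return len(self.indices_by_label) * self.batches_per_label


indx_dict = {"label1": [20, 21, 22, 23, 24, 25],
             "label2": [25, 26, 27, 28, 29, 30]}
sampler = PerLabelBatchSampler(indx_dict, batch_size=3, batches_per_label=2, seed=0)
batches = list(sampler)  # 2 batches from label1, then 2 batches from label2
```

Since the `batch_sampler` argument of `DataLoader` only needs an iterable that yields lists of indexes, `DataLoader(train_dataset, batch_sampler=sampler)` should work with this object, provided `batch_size`, `shuffle`, `sampler`, and `drop_last` are left at their defaults. To sample without replacement, `self.rng.sample(indices, self.batch_size)` could be used in place of `choices`.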