Image labeling pipeline: labels don't match the correct image and load randomly instead

Hi there,

I am working on an image labeling pipeline in PyTorch. I have it mostly working except for the __getitem__ function. The function is supposed to load each image together with its matching caption. Instead, it loads images and captions randomly. The captions come from a dataframe / CSV file in which each row holds an image_id and its matching caption. To find the caption for an image, one has to match the image_id against the image's file name. To load all the images, one has to iterate through folders (as shown in the code below). Every time an image is opened, it should have a matching caption.

I have tried iterating through the csv at the same time as the image files, but that doesn’t work. I know this is probably a simple answer, and would appreciate the input.

### to read all images in all directories, DEBUG CODE

class CaptioningData(Dataset):
    def __init__(self, root, df, vocab):
        self.df = df.reset_index(drop=True)
        self.root = root
        self.vocab = vocab
        self.transform = transforms.Compose([ 
            transforms.Resize(224),
            transforms.RandomCrop(224),
            transforms.RandomHorizontalFlip(), 
            transforms.ToTensor(), 
            transforms.Normalize((0.485, 0.456, 0.406), 
                                 (0.229, 0.224, 0.225))]
        )
    def __getitem__(self, index):
        """Returns one data pair (image and caption)."""
        #row = self.df.iloc[index].squeeze()
        #id = row.image_id
        #image_path = f'{self.root}/{id}.png' # original code, for debug
        for file in files:
            row = self.df.iloc[index].squeeze()
            id = row.image_id
            #print(os.path.join(root, file))
            image = os.path.join(root, file)
            image = Image.open(image).convert('RGB')    
            caption = row.InChI # here need caption to match image_id
            tokens = str(caption).lower().split()
            target = []
            target.append(vocab.stoi['<start>'])
            target.extend([vocab.stoi[token] for token in tokens])
            target.append(vocab.stoi['<end>'])
            target = torch.Tensor(target).long()
        return image, target, caption
    
    #debug
    def choose(self):
        return self[np.random.randint(len(self))]
    
    #code
    #def choose(self):
    #    return self[(len(self))]
    
    def __len__(self):
        return len(self.df)
    def collate_fn(self, data):
        data.sort(key=lambda x: len(x[1]), reverse=True)
        images, targets, captions = zip(*data)
        images = torch.stack([self.transform(image) for image in images], 0)
        lengths = [len(tar) for tar in targets]
        _targets = torch.zeros(len(captions), max(lengths)).long()
        for i, tar in enumerate(targets):
            end = lengths[i]
            _targets[i, :end] = tar[:end] 
        return images.to(device), _targets.to(device), torch.tensor(lengths).long().to(device)

I’m not sure how the matching is done here. With the current setup, it looks like target will be whatever index specifies, but image will simply be the last image assigned (in this case, the last filename in files). I think there needs to be some correspondence between index and the file. If you know the filename from the row’s image_id, can you just use that to open the correct image rather than iterating through all of them? It would be easier to understand with an example filename and the full contents of a single row of the dataframe.

Finally, I don’t even see where files is defined; is it a global variable somewhere? I would guess that unless it is being mutated somewhere the exact same image would be returned every time with the current code.
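To illustrate the point about the loop, here is a minimal sketch with made-up filenames and captions. The names `files`, `captions`, and `broken_getitem` are hypothetical stand-ins, not your actual data. The loop body never uses index to pick a file, so the image is whatever the last iteration left behind:

```python
# Hypothetical stand-ins for the global `files` list and the dataframe captions.
files = ["b.png", "c.png", "a.png"]
captions = {0: "caption for a", 1: "caption for b", 2: "caption for c"}


def broken_getitem(index):
    """Mimics the loop in the question: the caption follows index, the image does not."""
    for file in files:
        image = file               # overwritten on every iteration
        caption = captions[index]  # depends only on index
    return image, caption


# Every index returns the same image: the last entry of `files`.
results = [broken_getitem(i) for i in range(3)]
print(results)
```

If `files` is refilled in a different order between calls, the returned image changes with it, which would look like random matching.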

Hi Eqy,

Thanks for the response. All the files are in subdirectories like this:
root/first/second/third/image.png
So I wrote the loop:

Which loops through all directories and opens the images. It was also supposed to assign a caption to each based on the dataframe’s ‘image_id’. The problem is exactly as you described. There’s an index for each image, but the loop just opens the images randomly until it loops through all files. It doesn’t match on the index as I would like. The index has the appropriate caption.

Hope this makes sense.

The above line works fine if there are no subdirectories, but of course it doesn’t loop through all the directories.

OK, in that case it might be more efficient if you only loop through all the images once and build a mapping from id to the precise filepath for each id, then just look this up when you want to do __getitem__.

For example, you can add something like

self.id_to_path = dict()
for dirpath, _, filenames in os.walk(root):
    for filename in filenames:
        id, ext = os.path.splitext(os.path.basename(filename))
        if ext == '.png':
            # TODO: might want to convert to integer depending on id type in dataframe
            self.id_to_path[id] = os.path.join(dirpath, filename)
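If you want to try the idea outside the Dataset class first, here is a self-contained sketch. `build_id_to_path` is a hypothetical helper name, the temporary directory tree is made up, and it assumes the file stem (the name without extension) is the image id:

```python
import os
import tempfile


def build_id_to_path(root, ext=".png"):
    """Walk root once and map each file stem (assumed image id) to its full path."""
    id_to_path = {}
    for dirpath, _, filenames in os.walk(root):
        for filename in filenames:
            stem, file_ext = os.path.splitext(filename)
            if file_ext.lower() == ext:
                id_to_path[stem] = os.path.join(dirpath, filename)
    return id_to_path


# Build a throwaway nested tree similar to root/first/second/third/image.png.
with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "a", "b", "c")
    os.makedirs(nested)
    for name in ("abc123.png", "notes.txt"):
        open(os.path.join(nested, name), "w").close()

    mapping = build_id_to_path(root)
    found_ids = sorted(mapping)
    relative = os.path.relpath(mapping["abc123"], root)

print(found_ids, relative)
```

The walk happens once, so each later lookup is a plain dictionary access instead of another pass over the directory tree.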

Then in your __getitem__ just look up the proper file:

    def __getitem__(self, index):
        """Returns one data pair (image and caption)."""
        row = self.df.iloc[index].squeeze()
        image_id = row.image_id
        # look up the path that was recorded for this id in __init__
        image_path = self.id_to_path[image_id]
        image = Image.open(image_path).convert('RGB')
        caption = row.InChI  # caption comes from the same row as image_id
        tokens = str(caption).lower().split()
        target = []
        target.append(self.vocab.stoi['<start>'])
        target.extend([self.vocab.stoi[token] for token in tokens])
        target.append(self.vocab.stoi['<end>'])
        target = torch.tensor(target, dtype=torch.long)
        return image, target, caption
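One thing to watch with this lookup: keys built from filenames are always strings, so if image_id is stored as an integer in the dataframe, the lookup raises a KeyError. A rough sketch of a check you could run once before training follows. `find_missing_ids` is a hypothetical helper and the ids and paths are made up:

```python
def find_missing_ids(image_ids, id_to_path):
    """Return the ids (normalized to str) that have no entry in id_to_path."""
    return [str(i) for i in image_ids if str(i) not in id_to_path]


# Made-up example: filenames give string keys, the dataframe column holds ints.
id_to_path = {"101": "root/a/101.png", "102": "root/b/102.png"}
image_ids = [101, 102, 103]  # e.g. df["image_id"].tolist()

missing = find_missing_ids(image_ids, id_to_path)
print(missing)  # ['103']
```

An empty result means every row has an image on disk. Anything else points to a missing file or an id type mismatch.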

Thank you so much! I will try this right away.

It works perfectly. I made one small change:

class CaptioningData(Dataset):
    def __init__(self, root, df, vocab):
        self.df = df.reset_index(drop=True)
        self.root = root
        self.vocab = vocab
        self.id_to_path = dict()
        for dirpath, _, filenames in os.walk(self.root):
            for filename in filenames:
                id, ext = os.path.splitext(filename)
                if ext == '.png':
                    self.id_to_path[id] = os.path.join(dirpath, filename)
        self.transform = transforms.Compose([ 
            transforms.Resize(224),
            transforms.RandomCrop(224),
            transforms.RandomHorizontalFlip(), 
            transforms.ToTensor(), 
            transforms.Normalize((0.485, 0.456, 0.406), 
                                 (0.229, 0.224, 0.225))]
        )

Thank you again!

Nice, good change with removing basename.
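For anyone reading later, the reason basename is not needed: os.walk already yields bare file names in its third element, without any directory part. A small sketch on a made-up temporary tree:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    sub = os.path.join(root, "first", "second")
    os.makedirs(sub)
    open(os.path.join(sub, "image.png"), "w").close()

    # Collect every file name that os.walk reports under root.
    names = [name for _, _, filenames in os.walk(root) for name in filenames]

# basename leaves each name unchanged because there is no directory part to strip.
unchanged = all(os.path.basename(name) == name for name in names)
print(names, unchanged)
```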