I am working on an image labeling pipeline in PyTorch. It mostly works, except for the __getitem__ function. The function is supposed to load each image together with its matching caption; instead, images and captions are paired randomly. The captions come from a dataframe loaded from a csv file, which holds an image_id and its matching caption. To find the right caption, the image_id in the dataframe has to be matched against the image_id in the file name. To load all the images, one has to iterate through folders (as shown in the code below). Every time an image is opened it should come with its matching caption.
I have tried iterating through the csv at the same time as the image files, but that doesn’t work. I suspect the answer is simple, and would appreciate any input.
### to read all images in all directories, DEBUG CODE
class CaptioningData(Dataset):
    def __init__(self, root, df, vocab):
        self.df = df.reset_index(drop=True)
        self.root = root
        self.vocab = vocab
        self.transform = transforms.Compose([
            transforms.Resize(224),
            transforms.RandomCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize((0.485, 0.456, 0.406),
                                 (0.229, 0.224, 0.225))]
        )

    def __getitem__(self, index):
        """Returns one data pair (image and caption)."""
        #row = self.df.iloc[index].squeeze()
        #id = row.image_id
        #image_path = f'{self.root}/{id}.png' # original code, for debug
        for file in files:
            row = self.df.iloc[index].squeeze()
            id = row.image_id
            #print(os.path.join(root, file))
            image = os.path.join(root, file)
            image = Image.open(image).convert('RGB')
            caption = row.InChI # here need caption to match image_id
        tokens = str(caption).lower().split()
        target = []
        target.append(vocab.stoi['<start>'])
        target.extend([vocab.stoi[token] for token in tokens])
        target.append(vocab.stoi['<end>'])
        target = torch.Tensor(target).long()
        return image, target, caption

    #debug
    def choose(self):
        return self[np.random.randint(len(self))]

    #code
    #def choose(self):
    #    return self[(len(self))]

    def __len__(self):
        return len(self.df)

    def collate_fn(self, data):
        data.sort(key=lambda x: len(x[1]), reverse=True)
        images, targets, captions = zip(*data)
        images = torch.stack([self.transform(image) for image in images], 0)
        lengths = [len(tar) for tar in targets]
        _targets = torch.zeros(len(captions), max(lengths)).long()
        for i, tar in enumerate(targets):
            end = lengths[i]
            _targets[i, :end] = tar[:end]
        return images.to(device), _targets.to(device), torch.tensor(lengths).long().to(device)
I’m not sure how the matching is done here. It looks like, with the current setup, the target will be whatever index specifies, but image will simply be the last image assigned (which in this case is the last filename in files). I think there needs to be some correspondence between index and the file. If you know the filename from the row’s image_id, can you just use that to open the correct image rather than iterating through all of them? It might be easier to understand if we could see an example filename from files as well as all of the data contained in a single row of the dataframe.
Finally, I don’t even see where files is defined; is it a global variable somewhere? I would guess that unless it is being mutated somewhere the exact same image would be returned every time with the current code.
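To illustrate the point about the loop, here is a tiny stand-alone sketch. The file names, captions, and the get_item function are made up for the example, and plain strings stand in for images. Because the for loop rebinds image on every pass and nothing in it depends on index, every call ends up with the last entry of files, while the caption still follows index:

```python
# Hypothetical stand-ins: plain strings instead of real image files.
files = ["a.png", "b.png", "c.png"]
captions = ["caption a", "caption b", "caption c"]


def get_item(index):
    """Mimics the structure of the posted __getitem__."""
    for file in files:
        # Rebound on every pass; only the last value survives the loop.
        image = file
    caption = captions[index]  # depends on index, unlike image
    return image, caption


results = [get_item(i) for i in range(len(captions))]
for image, caption in results:
    print(image, "<->", caption)
```

Every printed pair shows c.png, whichever caption it is paired with.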
Thanks for the response. All the files are in subdirectories like this:
root/first/second/third/image.png
So I wrote the loop:
Which loops through all directories and opens the images. It was also supposed to assign a caption to each based on the dataframe’s ‘image_id’. The problem is exactly as you described. There’s an index for each image, but the loop just opens the images randomly until it loops through all files. It doesn’t match on the index as I would like. The index has the appropriate caption.
OK, in that case it might be more efficient if you only loop through all the images once and build a mapping from id to the precise filepath for each id, then just look this up when you want to do __getitem__.
For example, you can add something like
# e.g. in __init__, so the directory tree is walked only once
self.id_to_path = dict()
for dirpath, _, filenames in os.walk(root):
    for filename in filenames:
        # split 'name.png' into ('name', '.png')
        id, ext = os.path.splitext(filename)
        if ext == '.png':
            # TODO: might want to convert to integer depending on id type in dataframe
            self.id_to_path[id] = os.path.join(dirpath, filename)
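If you want to try the mapping idea outside the Dataset first, here is a minimal runnable sketch. It assumes the id is the file name without its extension; the helper name build_id_to_path, the ids abc123 and def456, and the temporary directory layout are all invented for the demo:

```python
import os
import tempfile


def build_id_to_path(root, ext=".png"):
    """Walk root once and map each file stem (the id) to its full path."""
    id_to_path = {}
    for dirpath, _, filenames in os.walk(root):
        for filename in filenames:
            stem, file_ext = os.path.splitext(filename)
            if file_ext.lower() == ext:
                id_to_path[stem] = os.path.join(dirpath, filename)
    return id_to_path


# Build a throwaway tree shaped like root/first/second/third/image.png.
with tempfile.TemporaryDirectory() as root:
    for image_id in ("abc123", "def456"):
        subdir = os.path.join(root, image_id[0], image_id[1], image_id[2])
        os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, image_id + ".png"), "wb"):
            pass  # empty placeholder file, not a real image
    # A non-image file that should be ignored by the walk.
    with open(os.path.join(root, "notes.txt"), "w"):
        pass

    mapping = build_id_to_path(root)
    # Store paths relative to root so they stay readable after cleanup.
    relative = {k: os.path.relpath(v, root) for k, v in mapping.items()}

print(sorted(relative))
```

One thing to watch for: if two files in different folders share the same name, the later one silently overwrites the earlier one in the dict.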
Then in your __getitem__ just lookup the proper file:
def __getitem__(self, index):
    """Returns one data pair (image and caption)."""
    row = self.df.iloc[index].squeeze()
    id = row.image_id
    # look up the file that belongs to this row's image_id
    image = self.id_to_path[id]
    image = Image.open(image).convert('RGB')
    caption = row.InChI  # taken from the same row, so it matches image_id
    tokens = str(caption).lower().split()
    target = []
    # use the vocab stored on the dataset in __init__
    target.append(self.vocab.stoi['<start>'])
    target.extend([self.vocab.stoi[token] for token in tokens])
    target.append(self.vocab.stoi['<end>'])
    target = torch.Tensor(target).long()
    return image, target, caption
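Since the lookup raises a KeyError when an id has no file (or when the dataframe holds integers while the dict keys are strings), it may be worth checking the ids once before training. A minimal sketch, assuming it is acceptable to compare ids as strings; find_missing_ids and the sample data are hypothetical:

```python
def find_missing_ids(image_ids, id_to_path):
    """Return the ids that have no entry in id_to_path, compared as strings."""
    known = {str(key) for key in id_to_path}
    return [image_id for image_id in image_ids if str(image_id) not in known]


# Hypothetical data: ids read from the csv as ints,
# keys parsed from file names as strings.
image_ids = [101, 102, 103]
id_to_path = {"101": "root/a/101.png", "103": "root/c/103.png"}

missing = find_missing_ids(image_ids, id_to_path)
print(missing)  # [102]
```

With the real data you could pass df.image_id.tolist() as image_ids and then either drop the missing rows from the dataframe or track down the missing files.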