How to extract the target from a dataset stored as a dictionary

I am using a dictionary for my dataset. Here are two of the items:

metaitem {'imname': 'voc_dataset\\VOCdevkit\\VOC2007\\JPEGImages\\000003.jpg', 'anno_id': 2, 'impath': 'voc_dataset\\VOCdevkit\\VOC2007\\JPEGImages\\000003.jpg', 'xml_parsed': {'annotation': {'folder': 'VOC2007', 'filename': '000003.jpg', 'source': {'database': 'The VOC2007 Database', 'annotation': 'PASCAL VOC2007', 'image': 'flickr', 'flickrid': '138563409'}, 'owner': {'flickrid': 'RandomEvent101', 'name': '?'}, 'size': {'width': '500', 'height': '375', 'depth': '3'}, 'segmented': '0', 'object': [{'name': 'sofa', 'pose': 'Unspecified', 'truncated': '0', 'difficult': '0', 'bndbox': {'xmin': '123', 'ymin': '155', 'xmax': '215', 'ymax': '195'}}, {'name': 'chair', 'pose': 'Left', 'truncated': '0', 'difficult': '0', 'bndbox': {'xmin': '239', 'ymin': '156', 'xmax': '307', 'ymax': '205'}}]}}}
(375, 500, 3)
metaitem {'imname': 'voc_dataset\\VOCdevkit\\VOC2007\\JPEGImages\\000002.jpg', 'anno_id': 1, 'impath': 'voc_dataset\\VOCdevkit\\VOC2007\\JPEGImages\\000002.jpg', 'xml_parsed': {'annotation': {'folder': 'VOC2007', 'filename': '000002.jpg', 'source': {'database': 'The VOC2007 Database', 'annotation': 'PASCAL VOC2007', 'image': 'flickr', 'flickrid': '329145082'}, 'owner': {'flickrid': 'hiromori2', 'name': 'Hiroyuki Mori'}, 'size': {'width': '335', 'height': '500', 'depth': '3'}, 'segmented': '0', 'object': [{'name': 'train', 'pose': 'Unspecified', 'truncated': '0', 'difficult': '0', 'bndbox': {'xmin': '139', 'ymin': '200', 'xmax': '207', 'ymax': '301'}}]}}}
(500, 335, 3)

I am looping through the dataset items like this:

for imname, metaitem in metadata_test.items():

but this loop doesn’t pull out the ‘name’ key (the label), which is what I am trying to predict for training/eval. I believe I would also need to convert these names to indices.

Any idea how to extract the label from this dictionary so it can be used in the for loop above? Sorry to ask; it’s my first time dealing with dictionaries.

Based on just what you have posted, these are nested dictionaries of metadata; they don’t contain the actual images or ready-to-use labels, which is what you need for training.

Judging by the names (metadata_test, metaitem), it seems you are meant to use these entries to load the actual images and build the labels for training.
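As a rough sketch of what that could look like for a single item: the class names and bounding boxes in your printout sit under xml_parsed -> annotation -> object. The helper name extract_objects is made up for illustration, and the single-dict check is an assumption about how your XML parser might behave when an image has only one object (in your printout it is a list in both cases).

```python
def extract_objects(metaitem):
    """Return a list of (class_name, (xmin, ymin, xmax, ymax)) for one item."""
    objects = metaitem["xml_parsed"]["annotation"]["object"]
    # Assumption: some XML parsers return a bare dict instead of a list
    # when there is only one <object>; normalise to a list either way.
    if isinstance(objects, dict):
        objects = [objects]
    result = []
    for obj in objects:
        box = obj["bndbox"]
        # Coordinates are stored as strings in the parsed XML, so cast to int.
        coords = tuple(int(box[k]) for k in ("xmin", "ymin", "xmax", "ymax"))
        result.append((obj["name"], coords))
    return result


# Trimmed copy of the first item printed in the question.
metaitem = {
    "impath": "voc_dataset\\VOCdevkit\\VOC2007\\JPEGImages\\000003.jpg",
    "xml_parsed": {"annotation": {"object": [
        {"name": "sofa",
         "bndbox": {"xmin": "123", "ymin": "155", "xmax": "215", "ymax": "195"}},
        {"name": "chair",
         "bndbox": {"xmin": "239", "ymin": "156", "xmax": "307", "ymax": "205"}},
    ]}},
}

objects = extract_objects(metaitem)
print(objects)
# [('sofa', (123, 155, 215, 195)), ('chair', (239, 156, 307, 205))]
```

Note that one image can contain several objects, so each item gives you a list of labels rather than a single one.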

This looks to be part of some larger code base. Can you share the rest of it, so it’s easier to understand what exactly you are trying to do?
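In the meantime, here is a minimal sketch of turning the ‘name’ values into integer indices inside the same kind of loop. It assumes every item has the structure shown in the question. The dictionary keys used below and the names object_names, class_to_idx and targets are placeholders of mine, not something from your code.

```python
# Hypothetical stand-in for metadata_test, trimmed to the fields used here.
metadata_test = {
    "000003.jpg": {"xml_parsed": {"annotation": {"object": [
        {"name": "sofa"}, {"name": "chair"}]}}},
    "000002.jpg": {"xml_parsed": {"annotation": {"object": [
        {"name": "train"}]}}},
}


def object_names(metaitem):
    """Return the class name of every annotated object in one item."""
    objects = metaitem["xml_parsed"]["annotation"]["object"]
    # Assumption: a parser may give a bare dict for a single object.
    if isinstance(objects, dict):
        objects = [objects]
    return [obj["name"] for obj in objects]


# First pass: collect every class name; sorting keeps the indices stable.
class_names = sorted({name
                      for item in metadata_test.values()
                      for name in object_names(item)})
class_to_idx = {name: idx for idx, name in enumerate(class_names)}

# Second pass: the loop from the question, now producing integer labels.
targets = {}
for imname, metaitem in metadata_test.items():
    targets[imname] = [class_to_idx[name] for name in object_names(metaitem)]

print(class_to_idx)  # {'chair': 0, 'sofa': 1, 'train': 2}
print(targets)       # {'000003.jpg': [1, 0], '000002.jpg': [2]}
```

Since PASCAL VOC has a fixed set of 20 object classes, it is probably safer to hard-code that list for class_names rather than derive it from the data, so that the train and test splits end up with the same indices.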