VOCDetection __getitem__() problem

import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
import torchvision
import torchvision.transforms as transforms

data2 = torchvision.datasets.VOCDetection("./", download=True, transform=transforms.ToTensor(), target_transform=transforms.ToTensor())
img, tar = data2[0]
I get the following error:
----> 1 img,tar=data1[0]

/usr/local/lib/python3.6/dist-packages/torchvision/transforms/functional.py in to_tensor(pic)
48 """
49 if not(_is_pil_image(pic) or _is_numpy_image(pic)):
---> 50 raise TypeError('pic should be PIL Image or ndarray. Got {}'.format(type(pic)))
51
52 if isinstance(pic, np.ndarray):

TypeError: pic should be PIL Image or ndarray. Got <class 'dict'>

The VOCDetection target is a dictionary of the XML tree as stated in the docs.
Since ToTensor only accepts PIL images or NumPy arrays, you cannot use it as a target_transform for this dataset.

What should I do in order to get an (image, target) pair?

You would have to remove the target_transform and use the XML tree for your detection task.
Have a look at this tutorial.
Although it uses a different dataset, it might be a good starting point for your VOCDetection use case.
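As a rough sketch of what "use the XML tree" could look like: the helper below walks the target dict and pulls out the bounding boxes and class labels. The names parse_voc_target, target_to_tensors and CLASS_TO_IDX are made up for this example, and reserving index 0 for background is an assumption (a common convention for detection models), not something VOCDetection requires. Depending on the torchvision version, the "object" entry may be a single dict or a list of dicts, so both are handled.

```python
# The 20 Pascal VOC object categories. Index 0 is left free for "background";
# that offset is an assumption made for this sketch.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
CLASS_TO_IDX = {name: i + 1 for i, name in enumerate(VOC_CLASSES)}


def parse_voc_target(target, class_to_idx=CLASS_TO_IDX):
    """Turn the VOCDetection target dict into plain lists of boxes and labels."""
    objects = target["annotation"].get("object", [])
    # Some torchvision versions return a single dict when there is one object.
    if isinstance(objects, dict):
        objects = [objects]

    boxes, labels = [], []
    for obj in objects:
        bb = obj["bndbox"]
        # Coordinates are stored as strings in the XML, so convert them.
        boxes.append([float(bb[k]) for k in ("xmin", "ymin", "xmax", "ymax")])
        labels.append(class_to_idx[obj["name"]])
    return boxes, labels


def target_to_tensors(target, class_to_idx=CLASS_TO_IDX):
    """Same as parse_voc_target, but returns a dict of tensors."""
    import torch  # imported here so the parsing helper works without torch

    boxes, labels = parse_voc_target(target, class_to_idx)
    return {
        "boxes": torch.tensor(boxes, dtype=torch.float32).reshape(-1, 4),
        "labels": torch.tensor(labels, dtype=torch.int64),
    }
```

You could then pass target_transform=target_to_tensors (instead of transforms.ToTensor()) when creating VOCDetection, so that data2[0] returns the image tensor together with a dict of box and label tensors. Since each image has a different number of objects, batching with a DataLoader would most likely need a custom collate_fn as well.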
