VOCDetection DataLoader error

My code is below, followed by the error it produces.

transform = transforms.Compose([
    transforms.Resize((416, 416), 2),  # 416x416 as in the paper; 2 means bilinear interpolation
    transforms.ToTensor(),
    transforms.Normalize((0, 0, 0), (255, 255, 255))  # as 1 / scalefactor in OpenCV
])

root = os.path.join("./yolo-9000", "VOCdevkit", "VOC2012", "JPEGImages")
data_train = torchvision.datasets.VOCDetection(root, year='2012', image_set='train', transform=transform, download=False)  # need to add 2007 data

train_loader = torch.utils.data.DataLoader(data_train,
                                          batch_size=2,
                                          shuffle=True
                                          )

name_classes = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
                'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
                'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']

def train_net():
    net = net_structure.Yolov2Voc() # define yolo net
    net.train() # train mode
    criterion = Yolov2Loss() # yolo loss setting
    epoch_size = 135
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9) # optimizer setting as SGD

    print('train start!')

    for epoch in range(epoch_size):
        running_loss = 0.0
        image, target = next(iter(train_loader))
        ground_truth = prepare_ground_truth(target)
        for i, (image, target) in enumerate(train_loader, 0):
            ground_truth = prepare_ground_truth(target) # prepare for the GT info   ===> [cx, cy, w, h, id]
            optimizer.zero_grad() # initialize the gradient of the optimizer
            output = net(image) # put image batch into the net

When I iterate over a DataLoader built on VOCDetection, I get the error below.

File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.2\helpers\pydev\pydevd.py", line 1758, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.2\helpers\pydev\pydevd.py", line 1752, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.2\helpers\pydev\pydevd.py", line 1147, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/LeeYoungJo/Documents/school/graduate/yolov2/train_mod.py", line 230, in <module>
    train_net() # If run the code in this module, then it will run the train_net()
  File "C:/Users/LeeYoungJo/Documents/school/graduate/yolov2/train_mod.py", line 40, in train_net
    image, target = next(iter(train_loader))
  File "C:\Users\LeeYoungJo\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 560, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "C:\Users\LeeYoungJo\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 68, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "C:\Users\LeeYoungJo\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 68, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "C:\Users\LeeYoungJo\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 63, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
  File "C:\Users\LeeYoungJo\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 63, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
  File "C:\Users\LeeYoungJo\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 63, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
  File "C:\Users\LeeYoungJo\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 63, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
  File "C:\Users\LeeYoungJo\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 68, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "C:\Users\LeeYoungJo\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 68, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "C:\Users\LeeYoungJo\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 63, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
  File "C:\Users\LeeYoungJo\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 63, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
  File "C:\Users\LeeYoungJo\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 63, in <listcomp>
    return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
TypeError: string indices must be integers

I remember running into a similar error.
I believe that a DataLoader built directly on torchvision.datasets.VOCDetection does not work once batch_size is larger than 1. The labels are nested dictionaries parsed from the XML annotations, and PyTorch's default collate function does not know how to combine them into a batch: it assumes every sample has the same structure, but the number of annotated objects varies from image to image. The code will probably run fine with batch_size=1. I agree, however, that the error message is somewhat misleading.
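
A different workaround from the wrapper dataset suggested below is to keep the stock dataset and give the DataLoader its own collate_fn, so that the annotation dictionaries are never merged. The following is only a minimal sketch: the name detection_collate is made up for illustration, and it assumes each sample is an (image, target) pair as in the question.

```python
def detection_collate(batch):
    """Collate (image, target) pairs without recursing into the targets.

    `batch` is the list of samples the DataLoader pulled from the dataset.
    Images and targets are only regrouped into two lists; the nested
    annotation dicts are left untouched, so it no longer matters that
    their structure differs from image to image.
    """
    images, targets = zip(*batch)
    return list(images), list(targets)


# Intended use (not executed here; it needs data_train from the question):
#   train_loader = torch.utils.data.DataLoader(
#       data_train, batch_size=2, shuffle=True, collate_fn=detection_collate)
#   for images, targets in train_loader:
#       image_batch = torch.stack(images)  # every image is 416x416 after Resize
```

With this approach the targets arrive as a plain list of dicts, one per image, so any conversion to [cx, cy, w, h, id] still has to loop over that list.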

I suggest writing your own dataset class that wraps torchvision.datasets.VOCDetection and calls your prepare_ground_truth function to convert the labels into a tensor with the same shape for every sample. Then pass this dataset to the DataLoader.
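
As a rough sketch of what such a wrapper could look like: your prepare_ground_truth is not shown, so the hypothetical voc_target_to_boxes below stands in for it. The names, the normalization of coordinates to [0, 1], the padding rows with id -1, and the max_boxes value of 50 are all my assumptions, not anything from torchvision. The sketch is kept in plain Python; it relies only on the target being a dict with string values under "annotation", and it handles the case where a single object shows up as a dict rather than a one-element list, which depends on the torchvision version.

```python
# Class names from the question; the position in the list is the class id.
VOC_CLASSES = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car',
               'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse',
               'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train',
               'tvmonitor']


def voc_target_to_boxes(target, class_names=VOC_CLASSES, max_boxes=50):
    """Turn one annotation dict into max_boxes rows of [cx, cy, w, h, id]."""
    annotation = target["annotation"]
    img_w = float(annotation["size"]["width"])
    img_h = float(annotation["size"]["height"])

    objects = annotation["object"]
    if isinstance(objects, dict):  # a single object may not be wrapped in a list
        objects = [objects]

    rows = []
    for obj in objects[:max_boxes]:  # extra objects are dropped (assumption)
        box = obj["bndbox"]
        xmin, ymin = float(box["xmin"]), float(box["ymin"])
        xmax, ymax = float(box["xmax"]), float(box["ymax"])
        rows.append([
            (xmin + xmax) / 2.0 / img_w,   # box center x, relative to width
            (ymin + ymax) / 2.0 / img_h,   # box center y, relative to height
            (xmax - xmin) / img_w,         # box width, relative
            (ymax - ymin) / img_h,         # box height, relative
            float(class_names.index(obj["name"])),
        ])

    # Pad so that every sample has the same number of rows.
    while len(rows) < max_boxes:
        rows.append([0.0, 0.0, 0.0, 0.0, -1.0])
    return rows


class VOCBoxDataset:
    """Map-style wrapper around an (image, target) dataset such as VOCDetection."""

    def __init__(self, base_dataset, max_boxes=50):
        self.base_dataset = base_dataset
        self.max_boxes = max_boxes

    def __len__(self):
        return len(self.base_dataset)

    def __getitem__(self, index):
        image, target = self.base_dataset[index]
        return image, voc_target_to_boxes(target, max_boxes=self.max_boxes)
```

In real use you would subclass torch.utils.data.Dataset and return torch.tensor(rows) from __getitem__, so that the default collate function can stack the fixed-size targets into one tensor per batch; the loss then has to skip the padding rows.
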
Hope this helps!