A runtime error occurred in creating and using custom dataset

minhoha · February 27, 2018, 10:57am

I created a custom dataset based on the examples on GitHub
(GitHub - utkuozbulak/pytorch-custom-dataset-examples: Some custom dataset examples for PyTorch).
The dataset is a variant of CIFAR-100. It consists of .png images plus a .csv file that lists each image's path and label.

The dataset code is as follows.

from torch.utils.data.dataset import Dataset
from torchvision import transforms
import pandas as pd
import numpy as np
from PIL import Image

class CIFAR100DIRTY_TEST(Dataset):

    def __init__(self, csv_path):
        self.transformations = transforms.Compose([transforms.CenterCrop(32),
                                                   transforms.ToTensor(),
                                                   transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
        self.data_info = pd.read_csv(csv_path, header=None)
        self.image_arr = np.asarray(self.data_info.iloc[:, 0])
        self.label_arr = np.asarray(self.data_info.iloc[:, 1])
        self.data_len = len(self.data_info.index)

    def __getitem__(self, index):  # returns one (image, label) pair; the DataLoader calls this for each sampled index
        single_image_name = self.image_arr[index]
        img_as_img = Image.open(single_image_name)
        img_as_tensor = self.transformations(img_as_img)
        single_image_label = self.label_arr[index]

        return (img_as_tensor, single_image_label)

    def __len__(self):
        return self.data_len
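To see what `__getitem__` actually hands back as the label, the sketch below repeats the `__init__` reading steps on a small in-memory CSV. The file contents are hypothetical (paths in column 0, integer labels in column 1, no header). Indexing the array built from pandas yields a NumPy integer scalar rather than a built-in Python `int`.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV contents: image path in column 0, integer label in column 1.
csv_text = "img/0.png,19\nimg/1.png,29\nimg/2.png,0\n"

# Same reading steps as in __init__ above, but from an in-memory string.
data_info = pd.read_csv(io.StringIO(csv_text), header=None)
image_arr = np.asarray(data_info.iloc[:, 0])
label_arr = np.asarray(data_info.iloc[:, 1])

label = label_arr[1]    # what __getitem__ would return as the label
print(label_arr.dtype)  # an integer dtype chosen by pandas, typically int64
print(type(label))      # a NumPy scalar type, not the built-in int
```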

In the main file, I create the datasets and loaders:

cifar_train_dirty = cifar_dirty_train.CIFAR100DIRTY_TRAIN("/home/mhha/A2S/cifar100_train_targets.csv")
cifar_test_dirty = cifar_dirty_test.CIFAR100DIRTY_TEST("/home/mhha/A2S/cifar100_test_targets.csv")

train_loader = torch.utils.data.DataLoader(cifar_train_dirty, batch_size=args.bs, shuffle=True, num_workers=2, drop_last=False)
test_loader = torch.utils.data.DataLoader(cifar_test_dirty, batch_size=10000, shuffle=False, num_workers=2, drop_last=False)

Iterating over train_loader in the training loop then fails with:

Traceback (most recent call last):
  File "main_20180227_thin_v1.py", line 144, in <module>
    train(epoch)
  File "main_20180227_thin_v1.py", line 88, in train
    for batch_idx, (inputs, targets) in enumerate(train_loader):
  File "/home/mhha/.conda/envs/pytorchmh2/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 200, in __next__
    return self._process_next_batch(batch)
  File "/home/mhha/.conda/envs/pytorchmh2/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 220, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/home/mhha/.conda/envs/pytorchmh2/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 40, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/mhha/.conda/envs/pytorchmh2/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 109, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/home/mhha/.conda/envs/pytorchmh2/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 109, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/home/mhha/.conda/envs/pytorchmh2/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 100, in default_collate
    return torch.LongTensor(batch)
RuntimeError: tried to construct a tensor from a int sequence, but found an item of type numpy.int64 at index (1)
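The inner traceback shows default_collate transposing the list of (image, label) samples and then building a LongTensor from the label field. The pure-Python sketch below reproduces only the transposition step, with stand-in data rather than PyTorch's code. It shows that the label field is a sequence of NumPy scalars, which that version's LongTensor constructor did not accept as an "int sequence". Converting each label to a built-in `int` produces a plain int sequence.

```python
import numpy as np

# Hypothetical batch of two samples as returned by __getitem__:
# (image, label) pairs, with short lists standing in for image tensors.
batch = [([0.0, 0.1], np.int64(19)), ([0.2, 0.3], np.int64(29))]

# default_collate transposes the batch into one sequence per field.
images, labels = zip(*batch)
print(all(isinstance(x, int) for x in labels))  # False: numpy.int64 is not an int subclass on Python 3

# Converting each label to a built-in int gives a plain int sequence.
fixed = [int(x) for x in labels]
print(all(isinstance(x, int) for x in fixed))   # True
```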

Which part is wrong, and how should I modify it?

ptrblck · February 27, 2018, 3:42pm

It seems single_image_label is a numpy array. Try to return the labels as Tensors:

single_image_label = torch.from_numpy(self.label_arr[index])
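A likely reason this suggestion still fails: torch.from_numpy takes a numpy.ndarray, but indexing a 1-D array with a single integer returns a NumPy scalar, not an array. The NumPy-only sketch below (torch is not imported, and the label values are hypothetical) shows the difference. It also shows that np.asarray would wrap the scalar into a 0-dimensional array.

```python
import numpy as np

label_arr = np.array([19, 29, 0])  # hypothetical labels

label = label_arr[1]                   # integer indexing returns a NumPy scalar
print(isinstance(label, np.ndarray))   # False, so it is not the ndarray from_numpy expects

wrapped = np.asarray(label)            # a 0-dimensional array holding the same value
print(isinstance(wrapped, np.ndarray), wrapped.shape)  # True ()
```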

minhoha · February 28, 2018, 6:26am

Unfortunately, it doesn’t work…

jpeg729 · February 28, 2018, 7:22am

Does it still produce the same error?

minhoha · February 28, 2018, 7:24am

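I fixed it by converting the label to a built-in Python int:

single_image_label = int(self.label_arr[index])  # np.int was just an alias for int and was removed in NumPy 1.24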
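The sketch below compares three equivalent ways to turn a NumPy integer into a built-in `int`. The label values are hypothetical. Using `int(...)` matches the fix in this reply, since the old `np.int` alias simply pointed to `int`. `.item()` and `.tolist()` are standard NumPy methods that return Python scalars.

```python
import numpy as np

label_arr = np.array([19, 29, 0], dtype=np.int64)  # hypothetical labels

# Option 1: convert per item in __getitem__, as in the fix above.
a = int(label_arr[1])
# Option 2: .item() returns the equivalent built-in Python scalar.
b = label_arr[1].item()
# Option 3: convert the whole column once, e.g. in __init__.
labels = label_arr.tolist()
c = labels[1]

print(type(a), type(b), type(c))  # <class 'int'> three times
```

Converting the column once in `__init__` avoids repeating the conversion on every `__getitem__` call. The per-item `int(...)` is equally correct.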