"index 1 is out of bounds for dimension 0 with size 1" error in pytorch custom dataset generator

I’m trying to iterate through a DataLoader created from a custom PyTorch Dataset. When I run the loop it gives me "index 1 is out of bounds for dimension 0 with size 1". Below is my dataset class:
class traindataset(Dataset):
    def __init__(self,data,train_end_idx,augmentation=None):
        '''
        data: a pandas DataFrame generated from a CSV file with
        columns -> [name, labels, col1, col2, ..., col784].
        shape of data -> (10000, 786)
        '''
        self.data=data
        self.augmentation=augmentation
        self.train_end=train_end_idx
        self.target=self.data.iloc[:self.train_end,1].values
        
    def __len__(self):
        return len(self.target)
    def __getitem__(self,idx):
        target=self.target
        image=self.data.iloc[:self.train_end,2:].values
        if self.augmentation is not None:
            image = self.augmentation(image)
        return torch.tensor(target[idx]),image[idx]

Below is my augmentation and DataLoader generator:

torchvision_transform = transforms.Compose([
    np.uint8,
    transforms.ToPILImage(),
    transforms.Resize((28,28)),
    transforms.RandomRotation([45,135]),
    transforms.ToTensor()
    ])

Below is the loop I’m running, where I get the mentioned error:

for label,image in trainloader:
    print(label,train)

Below is the complete error I received:

IndexError                                Traceback (most recent call last)
/tmp/ipykernel_41/1540740000.py in <module>
----> 1 for label,image in trainloader:
      2     print(label,train)

/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
    519             if self._sampler_iter is None:
    520                 self._reset()
--> 521             data = self._next_data()
    522             self._num_yielded += 1
    523             if self._dataset_kind == _DatasetKind.Iterable and \

/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    559     def _next_data(self):
    560         index = self._next_index()  # may raise StopIteration
--> 561         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    562         if self._pin_memory:
    563             data = _utils.pin_memory.pin_memory(data)

/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py in <listcomp>(.0)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

/tmp/ipykernel_41/1814544681.py in __getitem__(self, idx)
     18         if self.augmentation is not None:
     19             image = self.augmentation(image)
---> 20         return torch.tensor(target[idx]),image[idx]

IndexError: index 1 is out of bounds for dimension 0 with size 1

Note: the code works fine without augmentation.
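The error can be reproduced in isolation (a sketch, assuming torch is installed): because the augmentation runs on the whole array inside `__getitem__`, `ToPILImage` treats the entire (2000, 784) array as a single grayscale image, and `ToTensor` yields a (1, 28, 28) tensor. Any index >= 1 along dim 0 of that tensor then fails with exactly this message:

```python
import torch

# After the augmentation pipeline runs on the WHOLE (2000, 784) array,
# ToPILImage treats it as one grayscale image, so ToTensor produces a
# (1, 28, 28) tensor. Indexing dim 0 with idx >= 1 then fails:
image = torch.zeros(1, 28, 28)
try:
    image[1]
except IndexError as e:
    print(e)  # index 1 is out of bounds for dimension 0 with size 1
```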

Check the shape of target as it seems to have a size of 1 in dim0, while you are trying to index it with idx = 1.

Since I’m taking the .values of the DataFrame, it’s a NumPy array. And in __getitem__, if I return only torch.tensor(target[idx]) it works fine. I have also printed the shape of target and it’s tensor([2000]).

In that case I might have misinterpreted the error message and image is the culprit, which seems to have the aforementioned shape.

Yes, the problem is in the image after augmentation.
Image shape before augmentation -> (2000, 784): 2000 rows of data coming from the CSV.
Image shape after augmentation -> (1, 28, 28): the (28, 28) part is because the augmentation uses Resize((28, 28)).
The post-augmentation shape is what __getitem__ returns. I don’t know why the augmentation returns (1, 28, 28); since my batch size is 2, I think it should be (2, 28, 28). I’m very new to PyTorch, can you please help me fix this?

The Dataset.__getitem__ method will get a single index by default and will thus load and process a single sample.
The DataLoader will pass batch_size indices one-by-one to the __getitem__ method and create the batch afterwards using its collate_fn.
So inside __getitem__ treat idx as a single scalar which should load a single sample and target.
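A minimal sketch of that per-sample pattern (assuming the same CSV layout as above, with 28x28 images flattened into 784 pixel columns; `TrainDataset` is an illustrative name, not the poster’s final code):

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class TrainDataset(Dataset):
    """Loads ONE sample per __getitem__ call, as the DataLoader expects."""
    def __init__(self, data, train_end_idx, augmentation=None):
        self.augmentation = augmentation
        self.target = data.iloc[:train_end_idx, 1].values
        self.image = data.iloc[:train_end_idx, 2:].values  # (N, 784)

    def __len__(self):
        return len(self.target)

    def __getitem__(self, idx):
        # take ONE row and restore its 2-D image shape before augmenting
        image = self.image[idx].reshape(28, 28).astype(np.uint8)
        if self.augmentation is not None:
            image = self.augmentation(image)  # e.g. ToTensor -> (1, 28, 28)
        else:
            image = torch.from_numpy(image).unsqueeze(0)  # add channel dim
        return torch.tensor(self.target[idx]), image
```

The DataLoader then stacks these (1, 28, 28) samples into a (batch_size, 1, 28, 28) batch via its default collate_fn.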

@ptrblck thank you for the patient and quick reply. I have made the changes below, but it still has the same issue.

class traindataset(Dataset):
    def __init__(self,data,train_end_idx,augmentation = None):
        '''
        data: pandas dataframe
        
        '''
        self.data=data
        self.augmentation=augmentation
        self.train_end=train_end_idx
        self.target=self.data.iloc[:self.train_end,1].values
        self.image=self.data.iloc[:self.train_end,2:].values #contains full data
        
    def __len__(self):
        return len(self.target)
    def __getitem__(self,idx):
        image=self.image[idx].reshape(1,784) #only takes the selected index
        if self.augmentation is not None:
            image = self.augmentation(image)
        
        return torch.tensor(self.target[idx]),torch.tensor(image)

Changes I have made:

  1. In __init__ I now load the whole dataset, rather than inside __getitem__.
  2. In __getitem__ I pass only one image by selecting its index:
     image = self.image[idx].reshape(1,784)

So __getitem__ is dealing with just one image[idx].
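A quick sanity check on that reshape (a sketch, no torchvision needed): reshape(1, 784) hands ToPILImage a 1-pixel-tall, 784-pixel-wide strip, while reshape(28, 28) restores the square layout the Resize and RandomRotation steps expect.

```python
import numpy as np

row = np.arange(784, dtype=np.uint8)  # one flattened 28x28 sample

# reshape(1, 784): ToPILImage would see a 1 x 784 grayscale "image"
stripe = row.reshape(1, 784)
print(stripe.shape)  # (1, 784)

# reshape(28, 28): a proper square image for Resize/RandomRotation
square = row.reshape(28, 28)
print(square.shape)  # (28, 28)
```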

It is better to move augmentation to __init__ and transform all images at once, not one by one in __getitem__. Also check that your augmentation doesn’t change the shape to (1, 28, 28), where 1 is the number of channels.

Check the shape of self.image in the __getitem__ and make sure you can properly index it in dim0.

@my3bikaht NIT: if your data augmentation applies random transformations, applying it in the __init__ would transform each sample only once and no random augmentation would be used.
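A toy illustration of that point (`augment` here is a hypothetical stand-in for a random transform such as RandomRotation, not a torchvision call): running the augmentation once in __init__ freezes a single random draw, while calling it per __getitem__ resamples on every access.

```python
import random

def augment(x):
    # stand-in for a random transform such as RandomRotation
    return x + random.uniform(-1.0, 1.0)

random.seed(0)

# per-sample calls: each access yields a freshly augmented sample
per_call = [augment(0.0) for _ in range(3)]
print(len(set(per_call)))  # 3 distinct augmented values

# one call in __init__: the same augmented copy is reused every epoch
frozen = augment(0.0)
reused = [frozen for _ in range(3)]
print(len(set(reused)))  # 1
```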
