"index 1 is out of bounds for dimension 0 with size 1" error in pytorch custom dataset generator

I’m trying to iterate through a DataLoader created from a custom PyTorch Dataset. When I run the loop it gives me "index 1 is out of bounds for dimension 0 with size 1". Below is my dataset class:
class traindataset(Dataset):
    def __init__(self,data,train_end_idx,augmentation=None):
        '''
        data: a pandas DataFrame generated from a CSV file with
        columns -> [name, labels, col1, col2, ..., col784].
        shape of data -> (10000, 786)
        '''
        self.data=data
        self.augmentation=augmentation
        self.train_end=train_end_idx
        self.target=self.data.iloc[:self.train_end,1].values
        
    def __len__(self):
        return len(self.target)
    def __getitem__(self,idx):
        target=self.target
        image=self.data.iloc[:self.train_end,2:].values
        if self.augmentation is not None:
            image = self.augmentation(image)
        return torch.tensor(target[idx]),image[idx]

Below is my augmentation and DataLoader generator:

torchvision_transform = transforms.Compose([
    np.uint8,
    transforms.ToPILImage(),
    transforms.Resize((28,28)),
    transforms.RandomRotation([45,135]),
    transforms.ToTensor()
    ])

Below is the loop I’m running, where I get the mentioned error:

for label,image in trainloader:
    print(label,train)

Below is the complete error I received:

IndexError                                Traceback (most recent call last)
/tmp/ipykernel_41/1540740000.py in <module>
----> 1 for label,image in trainloader:
      2     print(label,train)

/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
    519             if self._sampler_iter is None:
    520                 self._reset()
--> 521             data = self._next_data()
    522             self._num_yielded += 1
    523             if self._dataset_kind == _DatasetKind.Iterable and \

/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    559     def _next_data(self):
    560         index = self._next_index()  # may raise StopIteration
--> 561         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    562         if self._pin_memory:
    563             data = _utils.pin_memory.pin_memory(data)

/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py in <listcomp>(.0)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

/tmp/ipykernel_41/1814544681.py in __getitem__(self, idx)
     18         if self.augmentation is not None:
     19             image = self.augmentation(image)
---> 20         return torch.tensor(target[idx]),image[idx]

IndexError: index 1 is out of bounds for dimension 0 with size 1

Note: the code works fine without augmentation.
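The error can be reproduced in isolation (a sketch, assuming torch is installed): because the augmentation runs on the whole array inside `__getitem__`, `ToPILImage` treats the entire (2000, 784) array as a single grayscale image, and `ToTensor` yields a (1, 28, 28) tensor. Any index >= 1 along dim 0 of that tensor then fails with exactly this message:

```python
import torch

# After the augmentation pipeline runs on the WHOLE (2000, 784) array,
# ToPILImage treats it as one grayscale image, so ToTensor produces a
# (1, 28, 28) tensor. Indexing dim 0 with idx >= 1 then fails:
image = torch.zeros(1, 28, 28)
try:
    image[1]
except IndexError as e:
    print(e)  # index 1 is out of bounds for dimension 0 with size 1
```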

Check the shape of target as it seems to have a size of 1 in dim0, while you are trying to index it with idx = 1.

Since I’m taking the .values of the DataFrame, it’s a NumPy array. And in __getitem__, if I return only torch.tensor(target[idx]) it works fine. I have also printed the shape of target and it’s tensor([2000]).

In that case I might have misinterpreted the error message and image is the culprit, which seems to have the aforementioned shape.

Yes, the problem is in the image after augmentation.
Image shape before augmentation -> (2000, 784): 2000 rows of data coming from the CSV.
Image shape after augmentation -> (1, 28, 28): the (28, 28) part is because the augmentation uses Resize((28, 28)).
The post-augmentation shape is what __getitem__ returns. I don’t know why the augmentation returns (1, 28, 28); since my batch size is 2, I think it should be (2, 28, 28). I’m very new to PyTorch, can you please help me fix this?

The Dataset.__getitem__ method will get a single index by default and will thus load and process a single sample.
The DataLoader will pass batch_size indices one-by-one to the __getitem__ method and create the batch afterwards using its collate_fn.
So inside __getitem__ treat idx as a single scalar which should load a single sample and target.
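A minimal sketch of that per-sample pattern (assuming the same CSV layout as above, with 28x28 images flattened into 784 pixel columns; `TrainDataset` is an illustrative name, not the poster’s final code):

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class TrainDataset(Dataset):
    """Loads ONE sample per __getitem__ call, as the DataLoader expects."""
    def __init__(self, data, train_end_idx, augmentation=None):
        self.augmentation = augmentation
        self.target = data.iloc[:train_end_idx, 1].values
        self.image = data.iloc[:train_end_idx, 2:].values  # (N, 784)

    def __len__(self):
        return len(self.target)

    def __getitem__(self, idx):
        # take ONE row and restore its 2-D image shape before augmenting
        image = self.image[idx].reshape(28, 28).astype(np.uint8)
        if self.augmentation is not None:
            image = self.augmentation(image)  # e.g. ToTensor -> (1, 28, 28)
        else:
            image = torch.from_numpy(image).unsqueeze(0)  # add channel dim
        return torch.tensor(self.target[idx]), image
```

The DataLoader then stacks these (1, 28, 28) samples into a (batch_size, 1, 28, 28) batch via its default collate_fn.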

@ptrblck thank you for the patient and quick reply. I have made the changes below, but it still has the same issue.

class traindataset(Dataset):
    def __init__(self,data,train_end_idx,augmentation = None):
        '''
        data: pandas dataframe
        
        '''
        self.data=data
        self.augmentation=augmentation
        self.train_end=train_end_idx
        self.target=self.data.iloc[:self.train_end,1].values
        self.image=self.data.iloc[:self.train_end,2:].values #contains full data
        
    def __len__(self):
        return len(self.target)
    def __getitem__(self,idx):
        image=self.image[idx].reshape(1,784) #only takes the selected index
        if self.augmentation is not None:
            image = self.augmentation(image)
        
        return torch.tensor(self.target[idx]),torch.tensor(image)

Changes I have made:

  1. In __init__ I now load the whole dataset, rather than inside __getitem__.
  2. In __getitem__ I pass only one image by selecting its index:
     image = self.image[idx].reshape(1,784)

So __getitem__ is dealing with just one image[idx].
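A quick sanity check on that reshape (a sketch, no torchvision needed): reshape(1, 784) hands ToPILImage a 1-pixel-tall, 784-pixel-wide strip, while reshape(28, 28) restores the square layout the Resize and RandomRotation steps expect.

```python
import numpy as np

row = np.arange(784, dtype=np.uint8)  # one flattened 28x28 sample

# reshape(1, 784): ToPILImage would see a 1 x 784 grayscale "image"
stripe = row.reshape(1, 784)
print(stripe.shape)  # (1, 784)

# reshape(28, 28): a proper square image for Resize/RandomRotation
square = row.reshape(28, 28)
print(square.shape)  # (28, 28)
```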

It is better to move augmentation to __init__ and transform all images at once, not one by one in __getitem__. Also check that your augmentation doesn’t change the shape to (1, 28, 28), where 1 is the number of channels.

Check the shape of self.image in the __getitem__ and make sure you can properly index it in dim0.

@my3bikaht NIT: if your data augmentation applies random transformations, applying it in the __init__ would transform each sample only once and no random augmentation would be used.
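A toy illustration of that point (`augment` here is a hypothetical stand-in for a random transform such as RandomRotation, not a torchvision call): running the augmentation once in __init__ freezes a single random draw, while calling it per __getitem__ resamples on every access.

```python
import random

def augment(x):
    # stand-in for a random transform such as RandomRotation
    return x + random.uniform(-1.0, 1.0)

random.seed(0)

# per-sample calls: each access yields a freshly augmented sample
per_call = [augment(0.0) for _ in range(3)]
print(len(set(per_call)))  # 3 distinct augmented values

# one call in __init__: the same augmented copy is reused every epoch
frozen = augment(0.0)
reused = [frozen for _ in range(3)]
print(len(set(reused)))  # 1
```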
