AttributeError: 'CustomDataSet' object has no attribute 'size'

I have images in a folder. So, I made a custom dataset to load the images.

class CustomDataSet(Dataset):
    def __init__(self, main_dir, transform=None):
        self.main_dir = main_dir
        self.transform = transform
        self.all_imgs = os.listdir(main_dir)

    def __len__(self):
        return len(self.all_imgs)

    def __getitem__(self, idx):
        img_loc = os.path.join(self.main_dir, self.all_imgs[idx])
        image = Image.open(img_loc).convert("RGB")
        tensor_image = self.transform(image)
        return tensor_image

Then, I am trying to make TensorDataset from image_tensor and labels_tensor. Finally, I want to make a DataLoader using this TensorDataset

def load_train_data():
    # Some code to the directory path etc.

    # doing some preprocessing
    transform = transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.ToTensor(),  # convert the PIL image to a tensor before Normalize
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
    #loading images using CustomDataSet
    train_data_tensor = CustomDataSet(train_data_dir, transform=transform)
    # Making TensorDataset using both train_data_tensor and train_label_price_tensor
    train_tensor = TensorDataset(train_data_tensor, train_label_price_tensor)
    # Making Train Dataloader
    train_loader = DataLoader(train_tensor, batch_size= 1, num_workers= 2, shuffle= True)

But, I am getting an error

Traceback (most recent call last):
  File "", line 86, in <module>
  File "", line 75, in load_train_data
    train_tensor = TensorDataset(train_data_tensor, train_label_price_tensor)
  File "/home/akib/.local/lib/python3.8/site-packages/torch/utils/data/", line 158, in __init__
    assert all(tensors[0].size(0) == tensor.size(0) for tensor in tensors)
  File "/home/akib/.local/lib/python3.8/site-packages/torch/utils/data/", line 158, in <genexpr>
    assert all(tensors[0].size(0) == tensor.size(0) for tensor in tensors)
AttributeError: 'CustomDataSet' object has no attribute 'size'

What is wrong with my code?

TensorDataset expects tensors as inputs, not another Dataset.
You could wrap the train_data_tensor (your CustomDataSet) directly in a DataLoader.


@ptrblck, here I am applying the transform. Does that mean the images are converted into tensors, rather than the CustomDataSet itself?

The passed transform will be applied in CustomDataSet.__getitem__ in this line of code:

tensor_image = self.transform(image)

and will thus be applied to the PIL.Image; tensor_image should be a tensor in this case.

Hello @ptrblck ,

I have 100 images in a folder, but I am getting only one image location from __getitem__.

    def __init__(self, main_dir, label_full, transform):
        # Reading path and doing some operations
        # sorting images based on the name
        self.all_imgs = sorted(os.listdir(main_dir), key=lambda x: int(x.split("_")[0]))

    def __getitem__(self, idx):
        img_loc = os.path.join(self.main_dir, self.all_imgs[idx])
        print(img_loc)  # printing only 1 image location
        image = Image.open(img_loc).convert("RGB")

Could you tell me why this is happening, or am I making a mistake somewhere?

Could you check the length of dataset.all_imgs via print(len(dataset.all_imgs)) and make sure more than one image is found?

@ptrblck, yes, it is showing the length is 100!

    def __len__(self):
        return len(self.all_imgs)

I also checked it from __getitem__ and the result is the same, 100.

Are you using a custom sampler or collate_fn?
If the length of the dataset is given as 100, this code should work:

for idx in range(100):
    batch = dataset[idx]

while using a DataLoader only returns a single sample?

loader = DataLoader(dataset, batch_size=1)
for idx, batch in enumerate(loader):
    print(idx, batch.shape)


I am using a custom sampler. I am getting this error (both before and after applying your code):

ValueError: 1 is not in range

Full error post is here.

I am not sure what is happening! I am trying to create my custom Dataset, but it is not working at all!

In that case, make sure it’s passing len(dataset) indices to the Dataset.__getitem__ method.

How can I do it? Would you mind giving me any hints?

It depends on your custom sampler and I don’t know how you’ve implemented it.
You could compare your implementation to SequentialSampler, which returns iter(range(len(self.data_source))) and thus yields indices in the range of the dataset.
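A minimal sampler following that pattern (a sketch mirroring SequentialSampler, not your implementation; the class name is made up) would be:

```python
from torch.utils.data import Sampler

class MySequentialSampler(Sampler):
    # hypothetical sampler that mirrors SequentialSampler:
    # it yields every index from 0 to len(data_source) - 1
    def __init__(self, data_source):
        self.data_source = data_source

    def __iter__(self):
        return iter(range(len(self.data_source)))

    def __len__(self):
        return len(self.data_source)

# usage sketch: DataLoader(dataset, sampler=MySequentialSampler(dataset))
```

If your sampler yields indices outside range(len(dataset)), __getitem__ will be called with an invalid index, which would explain the "not in range" error.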
