How to know the size of the data after loading it into a data loader?

I loaded the dataset using a data loader as follows:

import torchvision

data_loader = torchvision.datasets.ImageFolder(
    '/content/drive/My Drive/Dataset/malimg_paper_dataset_imgs',
    transform=torchvision.transforms.Compose([
        torchvision.transforms.Resize((224, 224)),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                         std=[0.229, 0.224, 0.225])]))

The original dataset's size is 1.1 GB. In the data loader I have applied resizing and normalization, and now I want to know what the size of the loaded data will be. I can't find anything related in the documentation. Thanks

You are not creating a DataLoader, but an ImageFolder, which is a Dataset.
The ImageFolder dataset will lazily load and process each sample. If you wrap it into a DataLoader via:

from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

dataset = ImageFolder(...)
loader = DataLoader(dataset, batch_size=..., num_workers=...)

each worker of the loader will load a complete batch and add it to a queue.
By default, prefetch_factor in the DataLoader is set to 2, which means 2 * num_workers batches will be preloaded.
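
For illustration, a minimal sketch (the batch_size=32 and num_workers=4 values here are arbitrary assumptions, not taken from your setup):

import torchvision
from torch.utils.data import DataLoader

# The dataset lazily loads and transforms each image on access;
# nothing is held in RAM until a batch is requested.
dataset = torchvision.datasets.ImageFolder(
    '/content/drive/My Drive/Dataset/malimg_paper_dataset_imgs',
    transform=torchvision.transforms.ToTensor())

# With num_workers=4 and the default prefetch_factor=2,
# up to 2 * 4 = 8 batches are preloaded at any time.
loader = DataLoader(dataset, batch_size=32, num_workers=4)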


I did that. After that, how can I know the size of the loader where the complete data is stored in the form of batches?

You can get the number of samples via len(dataset) and the number of batches via len(loader).
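
For example, reusing the dataset and loader from above:

print(len(dataset))  # number of samples, i.e. images found by ImageFolder
print(len(loader))   # number of batches, ceil(len(dataset) / batch_size) with the default drop_last=False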

Yes, but I don't want to know the number of samples or batches. I want to know how much memory those samples are holding.

If you are lazily loading the data, only 2 * num_workers * batch_size samples will be loaded into RAM at a time. The rest will stay on the drive. You can check the shape of a batch and calculate the needed RAM manually from the data type of your tensors.
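
A minimal sketch of that calculation, assuming the loader yields (images, labels) tuples as an ImageFolder-backed loader does, and reusing the assumed batch_size=32, num_workers=4 from above:

images, labels = next(iter(loader))  # fetch a single batch

# Memory held by one batch: number of elements * bytes per element.
batch_bytes = images.nelement() * images.element_size()
print(f"one batch: {batch_bytes / 1024**2:.2f} MiB")

# Rough upper bound for prefetched batches: 2 * num_workers batches.
print(f"prefetched at most: {2 * 4 * batch_bytes / 1024**2:.2f} MiB")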
