I loaded the dataset using a data loader as follows:

data_loader = torchvision.datasets.ImageFolder(
    '/content/drive/My Drive/Dataset/malimg_paper_dataset_imgs',
    transform=torchvision.transforms.Compose([
        torchvision.transforms.ToTensor(),  # Normalize expects a tensor, not a PIL image
        torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                         std=[0.229, 0.224, 0.225])]))

The original dataset's size is 1.1GB, but I have applied resizing and normalization in the transforms. Now I want to know what the size of the data loaded by the data loader will be. I can't find anything related in the documentation. Thanks
You are not creating a DataLoader, but an ImageFolder, which is a dataset. The ImageFolder dataset will lazily load and process each sample. If you wrap it into a DataLoader:

dataset = ImageFolder(...)
loader = DataLoader(dataset, batch_size=..., num_workers=...)

each worker of the loader will load a complete batch and add it to a queue. By default, prefetch_factor in the DataLoader is set to 2, which will load 2 * num_workers batches ahead of time.
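As a small sketch of that behavior, using a random TensorDataset as a stand-in for the ImageFolder above (the shapes and sizes here are assumptions, not your actual data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 100 fake "images" of shape 3x224x224
dataset = TensorDataset(torch.randn(100, 3, 224, 224))

# With num_workers=2, each worker prefetches whole batches into a queue.
loader = DataLoader(dataset, batch_size=8, num_workers=2)

# Default prefetching: prefetch_factor * num_workers batches held in RAM
print(loader.prefetch_factor * loader.num_workers)  # 2 * 2 = 4 batches
```

So at any moment only a handful of batches live in memory, not the whole 1.1GB dataset.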
I did that. After that, how can I know the size of the loader where the complete data is stored in the form of batches?
You can get the number of samples via len(dataset) and the number of batches via len(loader).
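For example, with a hypothetical stand-in dataset of 100 samples:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the ImageFolder dataset
dataset = TensorDataset(torch.randn(100, 3, 32, 32))
loader = DataLoader(dataset, batch_size=8)

print(len(dataset))  # 100 samples
print(len(loader))   # 13 batches: ceil(100 / 8), since drop_last=False
```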
Yes, but I don’t want the number of samples or batches. I want to know how much memory those samples are holding.
If you are lazily loading the data, only 2 * num_workers * batch_size samples will be loaded into RAM; the rest will stay on the drive. You can check the shape of a batch and calculate the needed RAM manually using the data type of your tensors.
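A minimal sketch of that manual calculation, assuming float32 batches of shape [8, 3, 224, 224] (the real shape and dtype depend on your transforms):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; the real shape/dtype comes from your transforms
dataset = TensorDataset(torch.randn(100, 3, 224, 224))
loader = DataLoader(dataset, batch_size=8)  # num_workers=0 keeps the sketch simple

batch, = next(iter(loader))
# Bytes held by one batch: number of elements * bytes per element
bytes_per_batch = batch.nelement() * batch.element_size()
print(bytes_per_batch)  # 8 * 3 * 224 * 224 elements * 4 bytes each

# Rough upper bound on RAM used by prefetched batches, e.g. with
# num_workers=2 and the default prefetch_factor=2:
print(2 * 2 * bytes_per_batch)
```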