How does one check how large a Tensor is in megabytes?

I want to predict how large a data set will be before I create it.

Is it possible to get those estimates from PyTorch internally?

e.g. if I had a model or a CIFAR-10 image, how can I check in bytes how big it is?


Basically, the vital parameter here is the size of your dataset. For instance, say you have 20 GB of images and about 16 GB of RAM, or a little more. Even if your model is small, it is still impossible to load the whole dataset into RAM. But for small datasets like CIFAR-10, you can do some calculation.

You can do something like this: batch_size × channels × height × width = 28 × 3 × 32 × 32 = 86016 elements. If each element is stored as a 64-bit number (8 bytes), every input batch takes about 86016 × 8 = 688128 bytes ≈ 688 KB. You can count your model's parameters the same way, because they are all tensors just like the batches. Finally, you need extra memory for operations that allocate new tensors (all non-in-place operations), and autograd itself needs memory, which depends entirely on your model and its parameters.
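PyTorch can do this arithmetic for you: every tensor exposes `element_size()` (bytes per element) and `nelement()` (number of elements). A minimal sketch, using the 28 × 3 × 32 × 32 batch from above and a tiny made-up `Linear` model just for illustration:

```python
import torch

# Bytes in one batch of 28 CIFAR-10-sized images stored as 64-bit floats
batch = torch.zeros(28, 3, 32, 32, dtype=torch.float64)
batch_bytes = batch.element_size() * batch.nelement()
print(batch_bytes)  # 86016 elements * 8 bytes = 688128 bytes ~= 688 KB

# The same idea works for a model's parameters
# (hypothetical tiny model, just to show the pattern):
model = torch.nn.Linear(3 * 32 * 32, 10)
param_bytes = sum(p.element_size() * p.nelement() for p in model.parameters())
print(param_bytes)  # weight (10*3072) + bias (10) params, 4 bytes each
```

Note that PyTorch tensors default to `float32` (4 bytes per element), so in practice a batch like this is half the size of the 64-bit estimate.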

Finally, if loading the whole dataset into memory is a problem, you can use the Dataset and DataLoader classes for lazy loading: the DataLoader reads one batch at a time, and while the model is processing the current batch (say, on the GPU), the DataLoader prepares the next one.
You may see some I/O overhead here (which only happens if loading data takes longer than the model takes to process it), but the whole idea is great, and in fact researchers working with large datasets like ImageNet (14 million images) or Places365 (2 million) use lazy loading.
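A minimal sketch of the lazy-loading pattern described above. The `LazyImageDataset` class and its contents are placeholders: a real `__getitem__` would read one image file from disk instead of creating a tensor.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class LazyImageDataset(Dataset):
    """Hypothetical dataset: items are created only when requested."""
    def __init__(self, n_items):
        self.n_items = n_items

    def __len__(self):
        return self.n_items

    def __getitem__(self, idx):
        # In a real dataset you would load and transform a file here,
        # e.g. with PIL; only this one item is ever held in memory.
        image = torch.zeros(3, 32, 32)
        label = idx % 10
        return image, label

# The DataLoader pulls one batch at a time from the dataset
loader = DataLoader(LazyImageDataset(100), batch_size=28)
images, labels = next(iter(loader))
print(images.shape)  # batches of 28 images, 3 x 32 x 32 each
```

Passing `num_workers > 0` to the DataLoader makes it prepare upcoming batches in background processes, which is what hides the I/O time behind the model's compute.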

By the way, I used DataLoader and it works great. Fast and reliable.

Good luck