Training a network on a very large dataset

What is the best way to train a network on a large dataset containing billions of samples?
When the same network is trained on a small dataset on the GPU, it takes little time.
But when the number of samples is increased, training takes disproportionately long, even with the same batch size, the same num_workers, and the same DataLoader.
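For reference, here is a simplified sketch of my setup; the synthetic dataset, feature dimension (32), batch size (256), and num_workers (4) below are placeholders, not my exact values:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the real dataset (the actual one has billions of samples).
dataset = TensorDataset(torch.randn(100_000, 32),
                        torch.randint(0, 10, (100_000,)))

# These settings are identical between the small-dataset and large-dataset runs.
loader = DataLoader(dataset,
                    batch_size=256,   # same in both runs
                    num_workers=4,    # same in both runs
                    shuffle=True)
```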
I am not able to figure out why the per-batch execution time, summed over all batches, does not match the epoch time.
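This is roughly how I am timing things (simplified; the model, loss, and optimizer here are placeholders following the tutorial's training-loop structure, and `loader` is the one sketched above):

```python
import time
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder model/loss/optimizer, matching the 32-feature inputs above.
net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

batch_total = 0.0
epoch_start = time.perf_counter()
for inputs, targets in loader:
    batch_start = time.perf_counter()
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad()
    loss = criterion(net(inputs), targets)
    loss.backward()
    optimizer.step()
    if device.type == "cuda":
        torch.cuda.synchronize()  # finish GPU work before reading the clock
    batch_total += time.perf_counter() - batch_start
epoch_time = time.perf_counter() - epoch_start

# The gap between epoch_time and batch_total is time spent outside the
# training step itself, e.g. the DataLoader fetching the next batch.
print(f"sum of batch times: {batch_total:.1f} s, epoch time: {epoch_time:.1f} s")
```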
I am following this tutorial: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html.
My network is a simple DNN.