What should I do if the GPU is starving while loading NumPy arrays?

My data loader loads a large number of NumPy arrays (about 150 GB in total) from disk as input data; each .npy file contains a [3 × 257 × 1025] NumPy array.
The batch_size is 8 and num_workers is 8 (I can't use a larger batch because it raises a CUDA out-of-memory error). The training speed is the same as with a batch_size of 4, so I checked nvidia-smi and found that GPU utilization is always low. Is there any solution to speed up my training?
I think loading the large .npy files slows down the process, because when I load small NumPy arrays (e.g. 2 × 16384) everything is fine.
Is there any way to load NumPy-array-like data faster?
Many thanks!

Sorry, each .npy file contains a (3, 257, 1025)-sized NumPy array.

Are you loading images? Both storing and loading images in uncompressed form (where every value of every pixel is encoded) are very inefficient…

You can preprocess your images and save them in .jpg or .png format, then load them using PIL.Image.open(filename).
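A minimal sketch of that idea (the file names are placeholders, and I'm assuming your arrays can tolerate 8-bit quantization after min-max scaling, which is lossy for float data):

```python
import numpy as np
from PIL import Image

arr = np.load("sample.npy")               # e.g. shape (3, 257, 1025), float32

# Min-max scale to 0..255 and reorder to (H, W, C) as PIL expects.
lo, hi = arr.min(), arr.max()
img8 = ((arr - lo) / (hi - lo) * 255).astype(np.uint8).transpose(1, 2, 0)
Image.fromarray(img8).save("sample.png")  # PNG stores the uint8 data losslessly

# At training time, loading becomes a cheap image decode:
restored = np.asarray(Image.open("sample.png"))  # uint8, shape (257, 1025, 3)
```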

Note that other parts of the training can also bottleneck execution, but since you are using raw NumPy arrays, I would start there and see whether training is still slow after these modifications.
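One quick way to confirm where the time goes (assuming `loader` is your existing DataLoader) is to iterate over it without touching the GPU at all:

```python
import time

# If this loop alone takes roughly as long per batch as a training step,
# data loading is indeed the bottleneck.
n_batches = 50
start = time.time()
for i, batch in enumerate(loader):
    if i + 1 == n_batches:
        break
print(f"loading only: {(time.time() - start) / n_batches:.3f} s/batch")
```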

Many thanks! My data are not images; they are spectrograms obtained from audio signals by applying a Fourier transform.

Interesting! I think you can still apply the approach of saving the spectrograms as compressed image files, then loading them with PIL. I don't know how much accuracy you would lose by compressing that data, but it's always a tradeoff…

That's the approach taken at the beginning of this blog post. The second part of the post shows on-the-fly FFT generation during training, which might also be a valid approach if you have the original files available?

I don't know how fast the on-the-fly FFT generation is, but the compressed-image method is guaranteed to be faster than loading raw NumPy arrays.
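If you still have the raw audio, an on-the-fly version could look roughly like this; it is only a sketch, and the class name, file list, and STFT parameters are all assumptions (n_fft=512 happens to give the 257 frequency bins you mentioned):

```python
import torch
import torchaudio
from torch.utils.data import Dataset

class OnTheFlySpectrogram(Dataset):
    """Computes the spectrogram in __getitem__ instead of loading .npy files."""

    def __init__(self, paths, n_fft=512, hop_length=128):
        self.paths = paths
        self.n_fft = n_fft
        self.hop_length = hop_length
        self.window = torch.hann_window(n_fft)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        waveform, _sr = torchaudio.load(self.paths[idx])  # (channels, samples)
        spec = torch.stft(waveform, self.n_fft, hop_length=self.hop_length,
                          window=self.window, return_complex=True)
        return spec.abs()  # magnitude spectrogram, (channels, freq, frames)
```

With num_workers > 0, the FFTs run in the DataLoader worker processes, so they can overlap with the GPU work.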

Thank you very much! I tried your solution and it works perfectly! And I have another question: which would be faster for loading, saving the .wav file as a NumPy array or just loading the .wav file directly?

Unfortunately I can’t tell you for sure… You would have to try both and see!
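A rough timing comparison could look like this (the paths are placeholders for the same clip stored both ways; soundfile is just one common .wav reader):

```python
import timeit
import numpy as np
import soundfile as sf

npy_path, wav_path = "clip.npy", "clip.wav"

t_npy = timeit.timeit(lambda: np.load(npy_path), number=100)
t_wav = timeit.timeit(lambda: sf.read(wav_path), number=100)
print(f".npy: {t_npy / 100:.4f} s/load, .wav: {t_wav / 100:.4f} s/load")
```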

I do not advise compressing the data, even via PIL, as there may be issues even if you store the PIL image in "F" (float) mode. However, if you insist on this compression, i.e. saving and then reloading the data, you need to make sure that the data you save equal the original: pick one image's data, compress it, load it back, and compare it to the original. If you get zero difference, you will be fine.

Alternatively, you can first try using the original wave files, compute the spectrograms from them, and test whatever model you are using; if that works well, then try the compression approach.

NumPy arrays? I don't think the GPU-utilization problem you are having is due to loading NumPy arrays. To make the GPU work faster, you can try increasing the batch size; hopefully your GPU does not run out of memory (with a 12 GB GPU you should be fine).

Which one is faster? Run some experiments to measure the execution time of whichever options you want to compare. Finally, you can consider using torch tensors instead of NumPy arrays.
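For the round-trip check described above, a minimal sketch could be (the path and the 8-bit quantization are illustrative assumptions):

```python
import numpy as np
from PIL import Image

original = np.load("sample.npy").astype(np.float32)   # e.g. (3, 257, 1025)

# Quantize, save as PNG, reload, and undo the scaling.
lo, hi = original.min(), original.max()
img8 = np.round((original - lo) / (hi - lo) * 255).astype(np.uint8)
Image.fromarray(img8.transpose(1, 2, 0)).save("roundtrip.png")

back = np.asarray(Image.open("roundtrip.png")).transpose(2, 0, 1)
restored = back.astype(np.float32) / 255 * (hi - lo) + lo

# Zero (or negligible) difference means the compression is safe for your data.
print("max abs difference:", np.abs(original - restored).max())
```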
