Higher CUDA memory usage with features calculated on the fly

Hi there,

I tried two methods to generate the same features using a custom dataloader:

  1. Prepare the features beforehand, save them to disk as pickle files, and load the pickle files directly in the “__getitem__” function.

  2. Generate the features on the fly in the “__getitem__” function; this involves large matrix operations using NumPy. (A rough sketch of both variants follows the list.)
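Roughly, the two variants look like the following sketch (the class names, the per-sample pickle layout, and the `raw @ raw.T` operation are placeholders, not my actual code):

```python
import pickle

import numpy as np
import torch
from torch.utils.data import Dataset


class PrecomputedDataset(Dataset):
    """Variant 1: features were computed offline and saved as one pickle per sample."""

    def __init__(self, feature_paths, labels):
        self.feature_paths = feature_paths
        self.labels = labels

    def __len__(self):
        return len(self.feature_paths)

    def __getitem__(self, idx):
        # Load the precomputed NumPy feature array from disk.
        with open(self.feature_paths[idx], "rb") as f:
            features = pickle.load(f)
        return torch.from_numpy(features), self.labels[idx]


class OnTheFlyDataset(Dataset):
    """Variant 2: features are computed per sample with NumPy inside __getitem__."""

    def __init__(self, raw_samples, labels):
        self.raw_samples = raw_samples
        self.labels = labels

    def __len__(self):
        return len(self.raw_samples)

    def __getitem__(self, idx):
        raw = self.raw_samples[idx]
        # Large matrix operation, done entirely on the CPU in NumPy.
        features = raw @ raw.T
        return torch.from_numpy(np.ascontiguousarray(features)), self.labels[idx]
```

In both cases “__getitem__” returns CPU tensors of the same shape and dtype.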

Both methods return data of the same shape and type; the only difference is that the first is loaded from disk while the second is computed on the fly with matrix operations. However, the CUDA memory usage of the second is about twice that of the first, and unless I decrease the minibatch_size I get “RuntimeError: CUDA out of memory”. Does this mean the dataloader takes up GPU memory during the feature computation?

I tried to fix this by setting pin_memory to False and by deleting some intermediate variables in the “__getitem__” function, but neither helped.
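Concretely, the attempted changes looked roughly like this (the stand-in TensorDataset and the batch size below are placeholders, just to show the settings):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset, only to show the DataLoader settings that were tried.
dataset = TensorDataset(torch.randn(128, 16), torch.zeros(128, dtype=torch.long))

loader = DataLoader(
    dataset,
    batch_size=32,       # stand-in for the original minibatch_size
    shuffle=True,
    num_workers=2,
    pin_memory=False,    # attempted fix 1: disable pinned host memory
)

# Attempted fix 2 (inside __getitem__ of the real Dataset):
#     features = heavy_numpy_op(raw)
#     del raw            # delete intermediates explicitly
# Neither change reduced the GPU memory usage.
```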

Any insight appreciated!

This shouldn’t be the case if your Dataset performs all operations on CPU tensors.
Could you run the data loading pipeline alone and check the GPU memory usage?
If the DataLoader approach shows some GPU usage, could you post its code here so that we can take a look?
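Something along these lines should be enough for the check (the helper function name and batch counts below are just a sketch, not a required API):

```python
import torch
from torch.utils.data import DataLoader, Dataset


def check_loader_gpu_usage(dataset: Dataset, batch_size: int = 64, num_batches: int = 10) -> None:
    """Iterate the Dataset alone (no model, no .cuda() calls) and report GPU memory."""
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=4)
    for batch_idx, _batch in enumerate(loader):
        if batch_idx == num_batches:   # a handful of batches is enough for the check
            break
    print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.1f} MiB")
    # Both values should stay at 0 if __getitem__ only creates CPU tensors;
    # watching nvidia-smi while the loop runs is another quick sanity check.
```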

Thanks for the response. After further comparison, I found that it was caused by a few dirty samples. Everything works fine after removing them.