Dataloaders and CUDA management

@ilyes @ptrblck Below a certain model + data size, you're going to have a really hard time hitting 100% GPU utilization given the combined overhead of Python, the framework, moving data to/from the GPU, etc. I've seen a number of posts where that is the issue.

If you can fit the whole dataset in GPU memory and you don't have CPU-side augmentations, you might get some more utilization by preprocessing the data, moving it to the GPU once, and manually indexing that GPU tensor for batches instead of using a dataloader.
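Something along these lines is what I mean; this is just a minimal sketch, and the tensor names, sizes, and batch size are placeholders, not anything from a real project:

```python
import torch

device = torch.device("cuda")

# Stand-ins for a dataset that has already been preprocessed on the CPU.
images = torch.randn(10_000, 3, 32, 32)
labels = torch.randint(0, 10, (10_000,))

# Move the whole dataset to GPU memory once, up front.
images, labels = images.to(device), labels.to(device)

batch_size = 256
num_samples = images.size(0)

for epoch in range(5):
    # Shuffle indices on the GPU each epoch, then slice out batches by index
    # instead of going through a DataLoader.
    perm = torch.randperm(num_samples, device=device)
    for start in range(0, num_samples, batch_size):
        idx = perm[start:start + batch_size]
        batch_x, batch_y = images[idx], labels[idx]
        # ... forward/backward pass on batch_x, batch_y ...
```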

One other thing: try setting pin_memory=False and see how it compares. I've had nothing but issues with it on, and I recently re-confirmed that while looking into another issue. Enabling pin_memory choked up all of my CPU cores at 30-40% kernel-time utilization (some sort of synchronization contention?).
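For reference, it's just the flag on the DataLoader; the dataset, batch size, and worker count here are placeholders:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10_000, 3, 32, 32),
                        torch.randint(0, 10, (10_000,)))

# Compare epoch time and CPU usage with pin_memory=True vs. False.
loader = DataLoader(dataset, batch_size=256, shuffle=True,
                    num_workers=4, pin_memory=False)
```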

EDIT: I posted a pretty picture of the CPU usage with pin_memory=True in another thread: CPU usage extremely high
