Tried to allocate 784.00 MiB (GPU 0; 23.99 GiB total capacity; 7.15 GiB already allocated; 13.69 GiB free; 7.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation

I have come across several posts discussing the same error, but in all of them the free memory was less than what PyTorch was trying to allocate.

In my case, the reported free memory is more than 13 GiB, while PyTorch is only trying to allocate 784 MiB.

I tried lowering the batch size from 256 to 128, but received the following error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 98.00 MiB (GPU 0; 23.99 GiB total capacity; 10.11 GiB already allocated; 11.45 GiB free; 10.23 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
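The hint at the end of the message refers to the caching allocator's `PYTORCH_CUDA_ALLOC_CONF` setting. A minimal sketch of how it can be set from Python, before any CUDA allocation happens (the `512` threshold here is an arbitrary starting value I picked for illustration, not something from the error message):

```python
import os

# Must be set before the first CUDA tensor is allocated (safest: before
# importing torch). Cached blocks larger than this threshold will not be
# split by the allocator, which can reduce fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

The same thing can be done by exporting the environment variable in the shell before launching the training script.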

I don’t seem to understand the main issue here.

UPDATE: Decreasing the number of workers from 8 to 1 solved the issue. I later increased it to 4, and it still worked.

I then increased it to 8, which worked this time.

It failed again in a later run, and I had to decrease the number of workers again.
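For reference, `num_workers` is the `DataLoader` argument being changed above. Each worker is a subprocess that prefetches batches ahead of the training loop, so higher worker counts keep more batches alive at once and raise peak memory use, which may explain why the same setting works on some runs and fails on others. A minimal sketch with a stand-in dataset (the tensor shapes are made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset just to make the sketch runnable; substitute your real
# Dataset. 512 samples of shape (3, 32, 32) with integer labels.
dataset = TensorDataset(torch.zeros(512, 3, 32, 32),
                        torch.zeros(512, dtype=torch.long))

# num_workers=0 loads batches in the main process; raising it spawns that
# many prefetching subprocesses, each holding batches in flight, so
# lowering it (as in the update above) reduces memory pressure.
loader = DataLoader(dataset, batch_size=128, num_workers=0)

n_batches = sum(1 for _ in loader)
print(n_batches)  # 512 samples / batch size 128 = 4 batches
```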