Training "never finishes" or system crashes using PyTorch - GPU has memory allocated but always has 0% utilization using DataLoader

So far, everything you've described points to a data loading bottleneck, including the fact that GPU utilization peaks only when data is available. For general advice on fixing data loading bottlenecks, look at this post.
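One way to check this is to time how long each iteration waits on the `DataLoader` versus how long it spends in the model. The snippet below is only a rough sketch: `SlowDataset`, `time_data_vs_compute`, the tensor shapes, and the artificial delay are made up for illustration and are not taken from your code, so swap in your own dataset and model.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class SlowDataset(Dataset):
    """Hypothetical dataset that simulates expensive per-sample loading."""

    def __init__(self, n=64, delay=0.002):
        self.n = n
        self.delay = delay  # seconds of fake I/O or preprocessing per sample

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        time.sleep(self.delay)  # stand-in for disk reads, decoding, augmentation
        return torch.randn(3, 32, 32), idx % 10


def time_data_vs_compute(loader, model, device):
    """Return (seconds waiting for batches, seconds in transfer + forward pass)."""
    data_time = 0.0
    compute_time = 0.0
    end = time.perf_counter()
    for x, _ in loader:
        start = time.perf_counter()
        data_time += start - end  # time blocked on the DataLoader
        x = x.to(device)
        with torch.no_grad():
            model(x)
        if device.type == "cuda":
            # CUDA kernels run asynchronously, so wait before reading the clock.
            torch.cuda.synchronize()
        end = time.perf_counter()
        compute_time += end - start
    return data_time, compute_time


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Sequential(
        torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10)
    ).to(device)
    # num_workers=0 loads in the main process; compare against larger values.
    loader = DataLoader(SlowDataset(), batch_size=16, num_workers=0)
    data_t, compute_t = time_data_vs_compute(loader, model, device)
    print(f"data: {data_t:.3f}s  compute: {compute_t:.3f}s")
```

If the data time dominates, the GPU is being starved. In that case it is usually worth trying a higher `num_workers`, setting `pin_memory=True`, or moving expensive preprocessing out of `__getitem__`, and then measuring again to see which change actually helps.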
