DDP processes go into D state (uninterruptible disk sleep)

Interesting: when I move the entire ImageNet folder to the SSD, not all of the processes go into D (still 1 or 4 go into D), but the iteration speed is normal at 50 seconds. It seems that something is wrong with the HDD I/O.
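In case it helps anyone reproduce this, here is a rough sketch of how I would count the processes stuck in D while training runs. It assumes Linux and reads the state field from `/proc/<pid>/stat`; the helper names `parse_state` and `processes_in_d_state` are just ones I made up for this post, not part of PyTorch.

```python
import os


def parse_state(stat_line):
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    left, _, rest = stat_line.rpartition(")")
    comm = left.split("(", 1)[1]
    state = rest.split()[0]
    return comm, state


def processes_in_d_state(proc_root="/proc"):
    """List (pid, comm) for processes currently in uninterruptible sleep."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # not Linux, or /proc is not mounted
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            path = os.path.join(proc_root, entry, "stat")
            with open(path, errors="replace") as f:
                comm, state = parse_state(f.read())
        except (OSError, IndexError):
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((int(entry), comm))
    return found


if __name__ == "__main__":
    for pid, comm in processes_in_d_state():
        print(pid, comm)
```

Running it every few seconds with the dataset on the HDD and then on the SSD should show whether the number of workers in D really tracks the disk being read from.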

This is the same fix as in "Strange behavior in Pytorch".

But I don't think this is a good solution, because ImageNet is 140 GB while my SSD is only 2 TB. Also, this server is new: I bought it 4 months ago, so I don't think there is anything wrong with the HDD.
