If I set the num_workers to be 8 (or 4), then after hundreds of iterations, I got error “RuntimeError: DataLoader worker (pid 56901) is killed by signal: Killed.”
I searched related topics, some suggested “num_workers=0”. So I set it to be 0 but after hundreds of iterations, I still got killed. This time, only “Killed” is given. No other hints.
The top output and the training output are as follows: