I am training CNNs that are a few layers deep. My laptop has ten CPU cores (verified with os.cpu_count()), and I'm using ten workers for my DataLoader, which could be the issue.
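If the worker count is part of the problem, one low-risk experiment is to leave some cores free for the main process and the rest of the system. The helper below is hypothetical. Reserving two cores is an arbitrary assumption, not a PyTorch recommendation. `persistent_workers` is an existing DataLoader option that keeps workers alive between epochs instead of re-creating them, and PyTorch rejects it when `num_workers` is 0. Whether this changes the number of torch shm manager processes has not been tested.

```python
import os


def dataloader_worker_kwargs(reserve_cores=2, max_workers=None):
    """Hypothetical helper: DataLoader worker settings that leave some cores free.

    reserve_cores is an assumed amount of headroom for the main process/OS.
    """
    cores = os.cpu_count() or 1  # cpu_count() can return None
    workers = max(0, cores - reserve_cores)
    if max_workers is not None:
        workers = min(workers, max_workers)
    # persistent_workers is only valid when num_workers > 0.
    return {"num_workers": workers, "persistent_workers": workers > 0}
```

Usage would look like `DataLoader(train_dataset, batch_size=64, shuffle=True, **dataloader_worker_kwargs())`, where `train_dataset` and the batch size stand in for whatever the script already uses.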
I run my training through the command line with
python train.py
When I press Ctrl+Z in the terminal to halt (suspend) a training run, roughly 3,000 “torch shm manager” processes show up in Activity Monitor on my Mac, each consuming 0.5% CPU. Despite heavy lag, I managed to force-quit all of them. When I use a keyboard interrupt (Ctrl+C) to exit training instead, this issue doesn't occur.
Last night and the night before, I wasn't able to fight the lag, and my M1 Pro Mac's display froze before the machine shut down. Is there a way to use Ctrl+Z without blowing up my computer?
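Since Ctrl+C already exits cleanly, one possible workaround is to make Ctrl+Z take the same path. The terminal sends SIGTSTP on Ctrl+Z, and the script can install a handler that raises KeyboardInterrupt instead of suspending. Ctrl+Z is delivered to every process in the foreground job, so the sketch also sends SIGCONT to its own process group to resume any DataLoader workers the same keypress stopped. This is a POSIX-only sketch under those assumptions. `run_with_ctrl_z_as_interrupt` and `train_fn` are hypothetical names, and it is untested whether PyTorch's worker and shared-memory cleanup then behaves exactly as it does for Ctrl+C.

```python
import os
import signal


def _ctrl_z_as_interrupt(signum, frame):
    """SIGTSTP handler: exit the way Ctrl+C does instead of suspending."""
    # Ctrl+Z goes to the whole foreground job, so child processes
    # (e.g. DataLoader workers) may already be stopped. SIGCONT resumes them
    # so the normal shutdown path can reach them.
    os.killpg(os.getpgrp(), signal.SIGCONT)
    raise KeyboardInterrupt("received SIGTSTP (Ctrl+Z)")


def run_with_ctrl_z_as_interrupt(train_fn):
    """Run train_fn (hypothetical: the training loop) with Ctrl+Z mapped to Ctrl+C.

    Returns True if training finished, False if it was interrupted.
    POSIX only: signal.SIGTSTP does not exist on Windows.
    """
    previous = signal.signal(signal.SIGTSTP, _ctrl_z_as_interrupt)
    try:
        train_fn()
        return True
    except KeyboardInterrupt:
        print("Training interrupted; exiting cleanly.")
        return False
    finally:
        # Restore the earlier handler (None means it wasn't set from Python).
        if previous is not None:
            signal.signal(signal.SIGTSTP, previous)
```

In train.py this would wrap the existing loop, e.g. `run_with_ctrl_z_as_interrupt(train)`. Without such a handler, a job already suspended with Ctrl+Z can be brought back with `fg` and then stopped with Ctrl+C.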