Can PyTorch cause a system freeze?

Hello, everyone. I have been using PyTorch since last year. Everything was fine until recently, when my system started to freeze intermittently during training, either when the model is loaded or when it is evaluated partway through training. It is very troublesome, and the only thing I can do is a forced physical shutdown. My PyTorch version is 0.5.0.
Has anyone run into the same situation? Is there a good way to avoid this problem?
Any advice would be appreciated. Thank you.

Do you have a script to reproduce this issue or does it occur completely at random?
Also, could you try to update to the latest stable version or the nightly builds?

Did you monitor the memory usage, e.g. did you fill up the RAM and swap?
Did you notice anything else unusual, e.g. high CPU or mainboard temperatures?
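
If it helps, here is a minimal sketch of how the memory usage could be logged to disk so that the last readings survive a hard reset. It assumes a Linux system that exposes /proc/meminfo; the function names and the memory.log file name are only placeholders I made up for illustration.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def format_usage(info):
    """Summarise RAM and swap usage in MiB from a parsed meminfo dict."""
    # MemAvailable is missing on very old kernels, so fall back to MemFree.
    available = info.get("MemAvailable", info.get("MemFree", 0))
    ram_used = info["MemTotal"] - available
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return "ram_used=%d MiB swap_used=%d MiB" % (ram_used // 1024, swap_used // 1024)


def log_memory(path="memory.log", interval=1.0, samples=None):
    """Append one timestamped line every `interval` seconds.

    With samples=None the loop runs until it is interrupted.
    """
    count = 0
    while samples is None or count < samples:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        with open(path, "a") as out:
            out.write("%s %s\n" % (time.strftime("%H:%M:%S"), format_usage(info)))
            out.flush()
            # Force the line to disk so it is not lost if the machine freezes.
            os.fsync(out.fileno())
        count += 1
        time.sleep(interval)
```

You could call log_memory() from a separate Python process before starting the training run, then check the last lines of the log after the reboot to see whether RAM or swap was filling up just before the freeze.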

Thanks for your reply.
I found that this problem occurs at random, and sometimes it happens right after the system has started, so I don’t think temperature is the cause. My machine has 64GB of RAM, which I think is enough to run the code. I installed v1.0 at first, but the same code ran slower than on v0.5, so I switched back to v0.5. I don’t understand what “the nightly builds” means.

I did notice this random freeze in 1.0 as well, but I couldn’t reproduce it reliably either.

What OS are you on? This could be related to multiprocessing on Windows, which may cause a freeze if you don’t enable proper freeze_support (at least, that’s what I experienced).
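
For reference, the pattern I mean looks roughly like the sketch below. It uses only the standard multiprocessing module; the square function and the two-worker pool are placeholders for illustration, not anything from your code.

```python
import multiprocessing as mp


def square(x):
    # Worker function; it must live at module level so that child
    # processes can import it.
    return x * x


def main():
    # Start two worker processes and map the function over a small range.
    with mp.Pool(processes=2) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    # The guard keeps child processes from re-running this block when the
    # module is imported again under the "spawn" start method (the default
    # on Windows).
    # freeze_support() only has an effect in a frozen Windows executable
    # and does nothing everywhere else.
    mp.freeze_support()
    print(main())
```

As far as I understand, the important part on Windows is that everything which starts worker processes sits behind the `if __name__ == "__main__":` guard.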

I am using Ubuntu 16.04. Is there something like “freeze_support” on Ubuntu?

No, this is not necessary on systems other than Windows.

Could you share a minimal working example? Even if we can’t reproduce the freeze deterministically, we could still have a look at it and maybe guess what’s causing it.

Thanks for your advice. I am running the code from https://github.com/AaronLeong/BigGAN-pytorch with the LSUN dataset. The problem often occurs when training or evaluation starts. But I think the code itself is correct, because when the system does not freeze, it works well.