Hi,
I am thinking of using a signal handler, registered with signal.signal(), to catch a certain range of signals such as SIGINT. The idea is that this handler would save a checkpoint of the current state of a model whenever a signal arrives that would eventually terminate the current process. My concern is that the signal could arrive, say, half-way through backpropagation, so the checkpoint would capture weights and biases that are only partly updated and therefore in an inconsistent state.
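To make the idea concrete, here is a minimal sketch of one variation that I assume would sidestep the half-updated-weights problem: the handler only sets a flag, and the training loop writes the checkpoint once the current step has finished. Everything in it is hypothetical and for illustration only: the names request_stop, install_handlers, save_checkpoint and train are made up, a plain dict stands in for the model, and pickle stands in for whatever save call the framework provides.

```python
import os
import pickle
import signal

stop_requested = False  # set by the handler, read by the training loop


def request_stop(signum, frame):
    """Signal handler: only record the request, do no I/O here."""
    global stop_requested
    stop_requested = True


def install_handlers():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGINT, request_stop)   # Ctrl-C
    signal.signal(signal.SIGTERM, request_stop)  # polite kill


def save_checkpoint(state, path):
    """Write to a temporary file first, then rename it into place,
    so an interrupted write never leaves a truncated checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def train(state, num_steps, path):
    """Toy loop: 'state' is a dict with 'step' and 'weights' (assumed layout)."""
    for step in range(state["step"], num_steps):
        # Placeholder for forward pass, backward pass and optimizer step.
        state["weights"] = [w + 1.0 for w in state["weights"]]
        state["step"] = step + 1

        # Assumed safe point: the update for this step is complete.
        if stop_requested:
            save_checkpoint(state, path)
            break
    return state
```

Usage would be to call install_handlers() once at start-up and then train(state, num_steps, "checkpoint.pkl"). The trade-off in this sketch is that the process keeps running until the current step ends before it saves and exits.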
So I wonder if that is a sensible strategy at all. What can I expect to happen in case of such events? Is there a difference between CPU and GPU? How would I write out checkpoints if a user presses Ctrl-C?
Many thanks.