I have two questions regarding the design of Elastic:
- In Elastic, how can we differentiate between a scale-down and a fault (i.e., whether a process is automatically restarted after being killed)?
- Why does Elastic need to restart all processes instead of applying the change while keeping the existing processes intact? What were the main factors behind this decision? And how does an automatic restart differ from manually loading the checkpoint to restart the job after an interruption?