Program hangs when using four GPUs: three show 100% GPU-Util but one shows 0%

My program hangs when training on four GPUs: three GPUs show 100% GPU-Util but one shows 0%, and the memory on all four GPUs is full. It ran well on four GPUs until epoch 53, where it hung. When I tried to stop the program from the keyboard, it stayed hung, and only the "kill -9" command can stop it.
The error report is below.
^CProcess Process-853:
Process Process-852:
Process Process-854:
Process Process-851:
Process Process-849:
Process Process-855:
Process Process-850:
Process Process-856:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
Traceback (most recent call last):
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/cjz/software/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 36, in _worker_loop
    r = index_queue.get()
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/queues.py", line 342, in get
    with self._rlock:
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/cjz/software/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 36, in _worker_loop
    r = index_queue.get()
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
Traceback (most recent call last):
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/queues.py", line 342, in get
    with self._rlock:
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
  File "/home/cjz/software/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 36, in _worker_loop
    r = index_queue.get()
  File "/home/cjz/software/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 36, in _worker_loop
    r = index_queue.get()
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/queues.py", line 342, in get
    with self._rlock:
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/queues.py", line 342, in get
    with self._rlock:
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
  File "/home/cjz/software/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 36, in _worker_loop
    r = index_queue.get()
Traceback (most recent call last):
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/queues.py", line 342, in get
    with self._rlock:
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
KeyboardInterrupt
  File "/home/cjz/software/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 36, in _worker_loop
    r = index_queue.get()
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/queues.py", line 342, in get
    with self._rlock:
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/cjz/software/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 36, in _worker_loop
    r = index_queue.get()
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/queues.py", line 342, in get
    with self._rlock:
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/cjz/software/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 36, in _worker_loop
    r = index_queue.get()
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/queues.py", line 343, in get
    res = self._reader.recv_bytes()
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/cjz/software/anaconda3/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt

When I press Ctrl+C to try to stop the program, the process status is as follows:
29353 0.8 9.1 134517912 6010228 pts/8 Sl+ 16:17 2:38 python train_test.py
29354 0.7 9.1 134502828 5994516 pts/8 Sl+ 16:17 2:18 python train_test.py
29355 0.8 9.2 134611644 6103368 pts/8 Sl+ 16:17 2:50 python train_test.py
29356 0.8 9.2 134573276 6064820 pts/8 Sl+ 16:17 2:42 python train_test.py
29360 0.8 9.1 134547024 6038864 pts/8 Sl+ 16:17 2:38 python train_test.py
29361 0.8 9.2 134584620 6076624 pts/8 Sl+ 16:17 2:40 python train_test.py
29362 0.8 9.1 134548908 6040852 pts/8 Sl+ 16:17 2:40 python train_test.py
29363 0.8 9.1 134553292 6044956 pts/8 Sl+ 16:17 2:38 python train_test.py
83660 0.0 0.0 13076 2588 pts/5 S+ 21:33 0:00 grep --color=auto python train_test.py
105011 95.8 14.0 133991260 9255084 pts/8 Rl+ 07:33 804:50 python train_test.py
Is that a deadlock problem?
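
The tracebacks above only show the DataLoader worker processes waiting on index_queue.get(); they do not show where the main process (PID 105011, the one still at 95.8% CPU) is stuck. One way to find out might be to have the main process dump its Python stacks on demand with the standard-library faulthandler module. The following is only a minimal sketch for a Unix system: the helper name install_stack_dump, the choice of SIGUSR1, and the log file path are assumptions for illustration and are not part of train_test.py.

```python
import faulthandler
import os
import signal
import tempfile


def install_stack_dump(log_file, signum=signal.SIGUSR1):
    """Dump the Python stack of every thread to log_file when signum arrives."""
    # faulthandler installs a C-level signal handler, so the dump does not
    # depend on the Python-level code reaching a point where it can run a
    # normal signal handler.
    faulthandler.register(signum, file=log_file, all_threads=True)


# Hypothetical log location; the file object must stay open for as long as
# the handler is registered, because faulthandler keeps its file descriptor.
dump_path = os.path.join(tempfile.gettempdir(), "train_test_stacks.log")
dump_file = open(dump_path, "w")
install_stack_dump(dump_file)
```

With something like this placed near the top of the training script, running "kill -USR1 <pid of the main process>" from another shell while the program is hung should append the current stack of each thread to the log file, which would show whether the main process is blocked inside the DataLoader, inside a CUDA call, or somewhere else.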