Training process keeps getting killed

tagynedlrb · March 23, 2021, 6:44am

Here’s where I got the code; I’m running custom training with it.

I ran my code with “python3 train.py --model_def config/yolov3-custom.cfg --data_config config/custom.data”,
but the process keeps getting killed.
My current setup is macOS with Python 3.7 (no GPU, CPU only).
Following this article (parallel processing - Pytorch : W ParallelNative.cpp:206 - Stack Overflow),
I fixed my multiprocessing error by setting [n_cpu = 0].
I also shrank my images to 1/100 of their original size, so each is only about 8 KB, and used only 3 of them.
I don’t understand what’s wrong…

---- mAP 0.0
Training Epoch 4: 100%|███████████████████████████| 1/1 [00:09<00:00, 9.19s/it]

---- Evaluating Model ----
Detecting objects: 0%| | 0/1 [00:00<?, ?it/s]
zsh: killed python3 train.py --model_def config/yolov3-custom.cfg --data_config

ptrblck · March 23, 2021, 11:40pm

Could you try to run it via gdb and see if you get a proper error?

gdb --args python train.py --model....
...
run
...
bt
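
If gdb is hard to set up, a lighter first check is to confirm which signal ended the process. The sketch below is only an illustration: the helper names `describe_exit` and `run_and_describe` are made up for this example and are not part of PyTorch or the training repository. It relies on the documented behaviour of `subprocess` on POSIX systems, where a negative return code means the child was terminated by that signal number. A shell message like "zsh: killed" normally corresponds to SIGKILL, which is often (though not always) sent by the operating system when memory runs out.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # POSIX only: a negative value is the number of the terminating signal.
        return f"terminated by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_and_describe(cmd):
    """Run cmd (a list of arguments), wait for it, and describe how it ended."""
    completed = subprocess.run(cmd)
    return describe_exit(completed.returncode)
```

For example, `print(run_and_describe(["python3", "train.py", "--model_def", "config/yolov3-custom.cfg", "--data_config", "config/custom.data"]))` would report the signal name if the training run is killed again.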

tagynedlrb · March 25, 2021, 12:59pm

I had trouble getting gdb to work, but fortunately I solved my problem!
I still don’t know what the main cause was, but I guess I hadn’t provided enough data…
or else I had made a mistake in the valid.txt & train.txt files… Thanks for your help!
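
Since a mistake in valid.txt or train.txt is one suspected cause, a quick sanity check of those files can rule it out. The following is a minimal sketch, assuming each file lists one image path per line and that the paths are resolved relative to the directory you run it from. The function name `check_image_list` is hypothetical and does not come from the training repository.

```python
from pathlib import Path


def check_image_list(list_file):
    """Return (entries, missing) for a text file with one image path per line.

    entries: every non-empty line, stripped of surrounding whitespace.
    missing: the entries that do not point to an existing file.
    """
    lines = Path(list_file).read_text().splitlines()
    entries = [line.strip() for line in lines if line.strip()]
    missing = [path for path in entries if not Path(path).is_file()]
    return entries, missing
```

Running it on both train.txt and valid.txt and printing `len(entries)` and `missing` shows whether either list is empty or points to files that are not there.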