I assume you have downloaded gtFine_trainvaltest.zip and leftImg8bit_trainvaltest.zip from the Cityscapes Dataset website (login required) and extracted them to ~/BigDatas/cityscapes, which now contains gtFine and leftImg8bit.
Steps to reproduce:
git clone https://github.com/fyu/drn.git
cd drn
python3 datasets/cityscapes/prepare_data.py ~/BigDatas/cityscapes/gtFine
cp datasets/cityscapes/create_lists.sh ~/BigDatas/cityscapes
cp datasets/cityscapes/info.json ~/BigDatas/cityscapes
cd ~/BigDatas/cityscapes
sh create_lists.sh
chmod u+x segment.py
Then the following command works very well:
CUDA_VISIBLE_DEVICES=0,1 python3 segment.py train -d ~/BigDatas/cityscapes -c 19 -s 896 --arch drn_d_22 --batch-size 32 --epochs 250 --lr 0.01 --momentum 0.9 --step 100
However, the same command with CUDA_LAUNCH_BLOCKING=1 added freezes, as shown below:
CUDA_VISIBLE_DEVICES=0,1 CUDA_LAUNCH_BLOCKING=1 python3 segment.py train -d ~/BigDatas/cityscapes -c 19 -s 896 --arch drn_d_22 --batch-size 32 --epochs 250 --lr 0.01 --momentum 0.9 --step 100
CUDA_LAUNCH_BLOCKING=1 synchronizes kernel launches, so that the stack trace points to the right line of code if a kernel hits an internal assert. Otherwise, because CUDA kernels execute asynchronously, the error might be reported at a different line of code, since the CPU can run ahead of the GPU.
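
If it is more convenient to toggle this from a script than on the command line, something like the following might work. It is only a minimal sketch, assuming the variable has to be present in the process environment before the CUDA runtime initializes, which is why it is passed to a child process at launch. The helper names blocking_env and run_with_blocking are hypothetical, and the child command here is a stand-in that only echoes the variable; in practice it would be the python3 segment.py train ... invocation.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1 added.

    Hypothetical helper: copying avoids mutating the parent's os.environ.
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_with_blocking(cmd):
    """Run cmd in a child process that sees CUDA_LAUNCH_BLOCKING=1."""
    return subprocess.run(cmd, env=blocking_env(), capture_output=True, text=True)


# Stand-in for the real training command: the child only prints the variable,
# so this runs without a GPU or PyTorch installed.
child = [sys.executable, "-c", "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"]
result = run_with_blocking(child)
print(result.stdout.strip())
```

Because the variable is set only for the child, it is easy to run the same command with and without it and compare the behavior.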