CUDA out of memory even with DataParallel

Cross-posted from here.