Some issues for "DataParallel"

The recommended way to leverage multiple GPUs in the same box is “DataParallel”. After reading the code of the “DataParallel” class, I noticed that “parallel_apply” runs the forward pass of the network replicas on the different devices.
However, “parallel_apply” is implemented with multithreading (please refer to the code for more details).
If all the ops of a network can run on the GPU, this approach works well, because GPU kernels are launched asynchronously and the Python threads spend little time holding the interpreter. But if only some of the ops can run on the GPU and the others must run on the CPU (e.g. because of GPU memory limits), does “DataParallel” still work well? Because of the GIL, only one thread can execute Python bytecode at a time, so multithreading cannot spread Python-level work across multiple cores. Maybe “multiprocessing” is a better choice in such a case?
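
To make the concern concrete, here is a minimal sketch of the thread-per-replica pattern, written with the standard library only. It is not the actual “parallel_apply” code: the names `threaded_apply` and `cpu_bound` are hypothetical, and `cpu_bound` only stands in for ops that have to run on the CPU in pure Python.

```python
import threading


def threaded_apply(modules, inputs):
    """Call modules[i](inputs[i]) in one thread each; return results in order."""
    results = [None] * len(modules)

    def worker(i, module, x):
        # Each thread writes only to its own slot, so no lock is needed here.
        results[i] = module(x)

    threads = [
        threading.Thread(target=worker, args=(i, m, x))
        for i, (m, x) in enumerate(zip(modules, inputs))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def cpu_bound(n):
    # Hypothetical stand-in for a CPU-only op: a pure-Python loop that
    # holds the GIL while it runs, so two such threads cannot overlap.
    total = 0
    for k in range(n):
        total += k * k
    return total


if __name__ == "__main__":
    print(threaded_apply([cpu_bound, cpu_bound], [10, 100]))
```

The results come back in the right order, but with work like `cpu_bound` the threads take turns on the interpreter instead of running on separate cores, which is the situation I am asking about.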