nn.DataParallel with CPU-GPU mix

My particular use case is to train a model with some modules on CPU and some on GPU.
I wrap my model with
model = nn.DataParallel(model) instead of model = nn.DataParallel(model).cuda(), since I don't want the entire model on GPU.
But doing so results in the following error during the model's forward pass:
TypeError: Broadcast function not implemented for CPU tensors
If I use the model without nn.DataParallel, it works just fine.

I’m pretty sure DataParallel is GPU-only: its forward pass broadcasts the inputs and replicates the module across CUDA devices, which is why the Broadcast function fails on CPU tensors.
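Since the broadcast only happens inside the wrapped module's forward, one workaround is to wrap just the GPU-resident submodule in nn.DataParallel and keep the CPU modules outside it. Below is a minimal sketch of that idea; the model, module names, and sizes are all hypothetical, and it falls back to plain CPU execution when no GPU is available:

```python
import torch
import torch.nn as nn

class MixedModel(nn.Module):
    """Hypothetical model: embedding stays on CPU, dense body runs on GPU(s)."""
    def __init__(self, vocab=1000, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)  # stays on CPU
        body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 2))
        self.use_gpu = torch.cuda.is_available()
        if self.use_gpu:
            # Wrap ONLY the GPU submodule; DataParallel then broadcasts
            # CUDA tensors, so the CPU-tensor Broadcast error never occurs.
            self.body = nn.DataParallel(body).cuda()
        else:
            self.body = body  # CPU fallback so the sketch runs anywhere

    def forward(self, idx):
        x = self.embed(idx)          # computed on CPU
        if self.use_gpu:
            x = x.cuda()             # move activations to GPU at the boundary
        return self.body(x)

model = MixedModel()
out = model(torch.randint(0, 1000, (4, 7)))
print(out.shape)  # torch.Size([4, 7, 2])
```

The key point is that the CPU-to-GPU handoff happens explicitly in your own forward, so DataParallel only ever sees CUDA tensors.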

Edit: Found an example. Apparently you should be able to get it to run on both CPU and GPU.