DataParallel on pretrained models

I looked at the resnet code in PyTorch and confirmed that DataParallel is not used anywhere in it.
I want to train a student model (resnet18) with a pretrained teacher model (resnet50). From the code, I did not see a way to use multiple GPUs for this training without modifying the resnet model itself. Is there a mode or a trick I could use to run the training on multiple GPUs?
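For reference, my understanding is that `nn.DataParallel` can wrap any `nn.Module` from the outside, so the resnet code itself would not need to change. Here is a rough sketch of what I mean (toy `nn.Sequential` modules stand in for the real resnet50/resnet18, and the distillation loss is just an illustrative KL term) — is this the right approach?

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the real models; in practice these would be
# torchvision.models.resnet50(pretrained=True) and resnet18().
teacher = nn.Sequential(nn.Flatten(), nn.Linear(32, 10))
student = nn.Sequential(nn.Flatten(), nn.Linear(32, 10))

device = "cuda" if torch.cuda.is_available() else "cpu"

# DataParallel wraps the modules without touching their code:
# it replicates each module across the visible GPUs and splits
# the input batch along dim 0. On a CPU-only machine it simply
# falls through to the wrapped module.
teacher = nn.DataParallel(teacher).to(device)
student = nn.DataParallel(student).to(device)

teacher.eval()
optimizer = torch.optim.SGD(student.parameters(), lr=0.01)

x = torch.randn(8, 32, device=device)
with torch.no_grad():
    t_logits = teacher(x)          # teacher runs in inference mode
s_logits = student(x)

# Illustrative distillation loss: KL divergence between the
# temperature-softened teacher and student distributions.
T = 4.0
loss = F.kl_div(
    F.log_softmax(s_logits / T, dim=1),
    F.softmax(t_logits / T, dim=1),
    reduction="batchmean",
) * (T * T)
loss.backward()
optimizer.step()
print(loss.item())
```

If this is correct, it would mean the multi-GPU handling lives entirely in the wrapper, not in the resnet definition.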