DataParallel on pretrained models

Hi,
I took a look at https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py and confirmed that DataParallel is not used anywhere in the resnet code in PyTorch.
I want to train a student model (resnet18) with a pretrained teacher model (resnet50). From the code, I don't see a way to use multiple GPUs for this training without modifying the resnet model itself. Is there a mode or a trick I could use to run the training on multiple GPUs?