Greetings,
I don’t quite get how to use the DataParallel wrapper to run my custom model on multiple GPUs.
In my particular case, I wrote my model with an evaluate member function that already uses the device. In my torch framework, all my train routines expect the models to have this evaluate function.
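To make that concrete, one of my train routines looks roughly like this (a simplified sketch; train_one_epoch, the loaders, and the optimizer setup are placeholders, not my real framework code):

import torch
import torch.nn as nn

def train_one_epoch(model, train_loader, val_loader, optimizer, device):
    # Simplified placeholder for one of my train routines. The important part
    # is that it calls model.evaluate(...) on the same object it trains.
    criterion = nn.CrossEntropyLoss()
    model.train()
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()
    # Every routine reports validation metrics via the model's own evaluate().
    val_loss, val_accu = model.evaluate(val_loader)
    return val_loss, val_accu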
My model is a custom ResNet, built on top of ResNet18 from TorchVision:
import torch
import torch.nn as nn
import torchvision


class ResNet(nn.Module):
    def __init__(self, model, device):
        super(ResNet, self).__init__()
        self.resnet = model
        self.device = device
        if not isinstance(model, torchvision.models.ResNet):
            raise ValueError("The given model is not an instance of ResNet.")

    def forward(self, features):
        return self.resnet.forward(features)

    def evaluate(self, data_loader):
        self.eval()
        loss = 0
        correct = 0
        criterion = nn.CrossEntropyLoss(reduction="sum")
        with torch.no_grad():
            for data, target in data_loader:
                data, target = data.to(self.device), target.to(self.device)
                output = self.forward(data)
                loss += criterion(output, target).item()  # sum up batch loss
                correct += (target == torch.argmax(output, dim=1)).cpu().sum()  # does cpu make sense?
        accu = 100. * correct / len(data_loader.dataset)
        return (loss, accu)


model = torchvision.models.resnet18()  # get predefined ResNet
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
res_model = ResNet(model, device)  # instantiate custom model
res_model_wrapped = nn.DataParallel(res_model, device_ids=[0, 3])  # wrap in DataParallel for GPUs 0 and 3
res_model_wrapped.to(device)

# call train routine on `res_model_wrapped.module`
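And this is roughly how the wrapped model is handed to the train routine shown above (again simplified; the loaders and the epoch count are placeholders):

optimizer = torch.optim.SGD(res_model_wrapped.parameters(), lr=0.01)
for epoch in range(10):
    # As noted in the comment above, my framework calls the routine on the
    # unwrapped module, since that is the object with the evaluate() method.
    val_loss, val_accu = train_one_epoch(res_model_wrapped.module, train_loader,
                                         val_loader, optimizer, device)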
The code runs like this, but only one of the GPUs is actually working, the one with index 0.
Any idea how I can make this work?