In my fine-tuning model, I want to parallelize the model across multiple GPUs. My code is shown below:
import torch.nn as nn

class FinetuneModel(nn.Module):
    def __init__(self, pretrained_model, ngpu=opt.gpuids):
        super(FinetuneModel, self).__init__()
        self.ngpu = ngpu
        self.features = pretrained_model
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(512 * 4 * 4, 2048),
            # ... remaining classifier layers
        )

    def forward(self, x):
        gpuids = None
        if self.ngpu:
            gpuids = range(self.ngpu)
        features = self.features(x)  # self.features already implements data parallelism internally
        return nn.parallel.data_parallel(self.classifier, features, device_ids=gpuids)
As far as I know, when doing

features = self.features(x)  # self.features.forward already implements data parallelism
score = nn.parallel.data_parallel(self.classifier, features, device_ids=gpuids)

PyTorch first scatters the batch across GPU0 and GPU1; after self.features finishes, it gathers the results back onto GPU0. Then, when self.classifier runs, PyTorch scatters the data to the GPUs again.
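To make the copies I am worried about explicit, my mental model of a single data_parallel call is roughly the scatter / replicate / parallel_apply / gather sequence below, written with the primitives from torch.nn.parallel (this is just a sketch of how I understand it, please correct me if it is off):

import torch.nn.parallel as P

def rough_data_parallel(module, input, device_ids, output_device=None):
    # my rough understanding of what one nn.parallel.data_parallel call does
    if output_device is None:
        output_device = device_ids[0]
    replicas = P.replicate(module, device_ids)    # copy the module to every GPU
    scattered = P.scatter(input, device_ids)      # split the batch across the GPUs
    outputs = P.parallel_apply(replicas[:len(scattered)], scattered)  # run each replica on its chunk
    return P.gather(outputs, output_device)       # copy the per-GPU results back to one GPU

So calling data_parallel twice per forward pass (once inside self.features and once for self.classifier) seems to mean two scatters and two gathers.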
Is there a PyTorch-ic way to reduce this data copying, something like

score = nn.parallel.data_parallel([self.features, self.classifier], x, device_ids=gpuids)

which would only scatter the input once?
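The only workaround I can think of is to wrap both parts in a single nn.Module and call data_parallel once on the wrapper (assuming I also change self.features so that it no longer calls data_parallel internally), but I am not sure this is the intended pattern. A rough sketch of what I mean; the view/flatten step between the two parts is my assumption:

class CombinedModel(nn.Module):
    # hypothetical wrapper: treat features + classifier as one module
    def __init__(self, features, classifier):
        super(CombinedModel, self).__init__()
        self.features = features      # must no longer call data_parallel itself
        self.classifier = classifier

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)     # my assumption: flatten before the Linear layer
        return self.classifier(x)

combined = CombinedModel(pretrained_model, classifier)
# one scatter of the batch, one gather of the scores
score = nn.parallel.data_parallel(combined, x, device_ids=gpuids)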