Hi guys, I’ve just tried to run my network on two GPUs, and here is my code:

```
import time

import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.autograd import Variable

# model, use_gpu, train_loader and test_model are defined elsewhere
if use_gpu:
    model = nn.DataParallel(model)
    model = model.cuda()

criterion = nn.CrossEntropyLoss()
optimizer_ft = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)


def Tensor2Variable_CEL(input, label):
    # move a batch to the GPU and cast it to the dtypes CrossEntropyLoss expects
    input = Variable(input).cuda().float()
    label = Variable(label).cuda().long()
    return input, label


def train_model(model, criterion, optimizer, scheduler, num_epochs):
    model.train(True)
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch + 1, num_epochs))
        print('-' * 10)
        since = time.time()
        scheduler.step()
        for (input, label) in train_loader:
            optimizer.zero_grad()
            # prepare data
            input, label = Tensor2Variable_CEL(input, label)
            # run the model
            output = model(input)
            loss = criterion(output, label)
            loss.backward()
            optimizer.step()
        test_model(model, criterion)
        time_diff = time.time() - since
        print('epoch complete in {:0.6f}'.format(time_diff))
        print()
```

What surprised me is that training is much slower with `nn.DataParallel(model)` than without it, i.e. than when I use just one GPU. Without `model = nn.DataParallel(model)`, every epoch takes about 15 seconds, while with it every epoch takes about 30 seconds. Apart from adding `model = nn.DataParallel(model)`, I didn’t change anything in my network or training process. Is there anything I did wrong? Thanks in advance.
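In case the way the epoch time is measured matters: CUDA calls are queued asynchronously, so a wall-clock read can happen before the GPU has finished its work. Below is a minimal sketch of a timing helper that could be used to compare the two setups more carefully. The names `time_epochs`, `run_epoch` and `sync` are hypothetical and not part of my code above. The assumption is that `torch.cuda.synchronize` would be passed as `sync` on a GPU machine.

```python
import time


def time_epochs(run_epoch, num_epochs, sync=None):
    """Return the wall-clock duration of each epoch in seconds.

    run_epoch: hypothetical zero-argument callable that runs one full epoch.
    sync: optional zero-argument callable invoked before each clock read
          (assumption: torch.cuda.synchronize when training on GPUs).
    """
    durations = []
    for _ in range(num_epochs):
        if sync is not None:
            sync()  # drain any work queued before this epoch starts
        start = time.perf_counter()
        run_epoch()
        if sync is not None:
            sync()  # wait until this epoch's queued work has finished
        durations.append(time.perf_counter() - start)
    return durations
```

With one epoch of the loop above wrapped in a function, this could be called as `time_epochs(my_epoch_fn, 3, sync=torch.cuda.synchronize)`, once with and once without `nn.DataParallel`. Here `my_epoch_fn` is again a placeholder name.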