I’m confused about how to use DataParallel properly over multiple GPUs, because it seems to be distributing along the wrong dimension (the code works fine on a single GPU). The model, using dim=0 in DataParallel with batch_size=32 and 8 GPUs, is:
import torch
import torch.nn as nn
from torch.autograd import Variable

class StepRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        self.encoder = nn.Embedding(input_size, hidden_size)
        self.rnn = nn.GRU(input_size=hidden_size,
                          hidden_size=hidden_size,
                          num_layers=num_layers)
        self.decoder = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden):
        batch_size = input.size(0)
        encoded = self.encoder(input)
        output, hidden = self.rnn(encoded.view(1, batch_size, -1), hidden)
        output = self.decoder(output.view(batch_size, -1))
        return output, hidden

    def init_hidden(self, batch_size):
        return Variable(torch.zeros(self.num_layers, batch_size, self.hidden_size))

decoder = StepRNN(
    input_size=100,
    hidden_size=64,
    output_size=100,
    num_layers=1)
decoder_dist = nn.DataParallel(decoder, device_ids=[0, 1, 2, 3, 4, 5, 6, 7], dim=0)
decoder_dist.cuda()

batch_size = 32
hidden = decoder.init_hidden(batch_size).cuda()
input_ = Variable(torch.LongTensor(batch_size, 10)).cuda()
target = Variable(torch.LongTensor(batch_size, 10)).cuda()

for c in range(10):
    decoder_dist(input_[:, c].contiguous(), hidden)  # RuntimeError: Expected hidden size (1, 4, 64), got (1, 32, 64)
The result is RuntimeError: Expected hidden size (1, 4, 64), got (1, 32, 64). It makes sense that each replica expects a hidden state for a batch of 32/8 = 4, but the hidden state for the full batch of 32 seems to be getting passed through. What am I missing? Full traceback here.
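To check my understanding of the splitting, I reproduced on CPU what I assume DataParallel's scatter does with dim=0, chunking each positional argument with torch.chunk. The input for one time step has shape (32,), but the GRU hidden state is (num_layers, batch, hidden) = (1, 32, 64), so its dim 0 has size 1 and can't actually be split 8 ways:

```python
import torch

# My assumption: scatter chunks every positional input along dim=0
# across the 8 devices, which I mimic here with torch.chunk on CPU.
n_gpus = 8
inp = torch.zeros(32, dtype=torch.long)  # one time step, shape (32,)
hidden = torch.zeros(1, 32, 64)          # (num_layers, batch, hidden_size)

inp_chunks = inp.chunk(n_gpus, dim=0)
hidden_chunks = hidden.chunk(n_gpus, dim=0)  # dim 0 has size 1 -> one chunk only

print(len(inp_chunks), inp_chunks[0].shape)      # 8 chunks of shape (4,)
print(len(hidden_chunks), hidden_chunks[0].shape)  # 1 chunk of shape (1, 32, 64)
```

If this is what happens internally, the first replica would receive an input for a batch of 4 together with the full-batch hidden state, which matches the "Expected hidden size (1, 4, 64), got (1, 32, 64)" error exactly.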
With dim=1 I get RuntimeError: invalid argument 2: out of range. Full trace here.
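My guess at why dim=1 fails: each time step input_[:, c] is 1-D with shape (32,), so there is no dim 1 to split along. A quick CPU check of chunking along a nonexistent dimension:

```python
import torch

# input_[:, c] for one time step is 1-D, shape (32,)
step = torch.zeros(32, dtype=torch.long)

try:
    # Splitting along dim=1 on a 1-D tensor, as scatter with dim=1
    # presumably attempts, has no dimension to act on.
    step.chunk(8, dim=1)
except (IndexError, RuntimeError) as e:
    print("chunk failed:", e)
```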
Interestingly, if I open an IPython session and run the code once, I get the runtime error above. But if I run it again unchanged, I get a different error: RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1502009910772/work/torch/lib/THC/generic/THCTensorCopy.c:18. This seems pretty consistent, but I'm not sure why the error would change with the exact same code.
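One hypothesis about the device-side assert: torch.LongTensor(batch_size, 10) is uninitialized, so it can contain arbitrary values outside the embedding's index range [0, 100). On CPU an out-of-range embedding index fails immediately; on CUDA it surfaces as a device-side assert. A minimal CPU demonstration with an explicitly out-of-range index (the value 999 is just an example):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(100, 64)       # valid indices are 0..99
bad = torch.tensor([999])         # deliberately out of range

try:
    emb(bad)
except Exception as e:
    # IndexError on CPU; on CUDA this class of bug shows up as
    # "device-side assert triggered" instead.
    print(type(e).__name__, e)
```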
I found another question where the issue was related to batch_first=True, so the default dim=0 didn't work there. But I'm using the default batch_first=False.
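In case it helps, here is a workaround I'm considering (my own idea, untested on multi-GPU; plain tensors instead of Variable for brevity): keep the hidden state batch-first whenever it crosses the DataParallel boundary, so scatter with dim=0 would split both the input and the hidden state along the batch dimension, and transpose back to (num_layers, batch, hidden) inside forward():

```python
import torch
import torch.nn as nn

class StepRNNBatchFirstHidden(nn.Module):
    """Same model, but forward() takes hidden as (batch, num_layers, hidden)."""

    def __init__(self, input_size, hidden_size, output_size, num_layers):
        super().__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.encoder = nn.Embedding(input_size, hidden_size)
        self.rnn = nn.GRU(input_size=hidden_size, hidden_size=hidden_size,
                          num_layers=num_layers)
        self.decoder = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden_bf):
        batch_size = input.size(0)
        # Batch-first -> GRU's expected (num_layers, batch, hidden_size)
        hidden = hidden_bf.transpose(0, 1).contiguous()
        encoded = self.encoder(input)
        output, hidden = self.rnn(encoded.view(1, batch_size, -1), hidden)
        output = self.decoder(output.view(batch_size, -1))
        # Back to batch-first before it leaves the module
        return output, hidden.transpose(0, 1)

    def init_hidden(self, batch_size):
        # Batch-first hidden: (batch, num_layers, hidden_size)
        return torch.zeros(batch_size, self.num_layers, self.hidden_size)

# Single-device smoke test of the reshaping logic
model = StepRNNBatchFirstHidden(100, 64, 100, 1)
hidden = model.init_hidden(32)
inp = torch.zeros(32, dtype=torch.long)
out, hidden = model(inp, hidden)
print(out.shape, hidden.shape)  # torch.Size([32, 100]) torch.Size([32, 1, 64])
```

The idea is that every tensor handed to the DataParallel wrapper then has the batch on dim 0, so dim=0 scattering should be consistent, but I haven't verified this on 8 GPUs.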