Does DataParallel support packed LSTM?

If DataParallel supports packed LSTM input, could you please give an example? Otherwise, how can I do the data parallelism manually with .cuda()? For example, suppose I have data x of size (Batch, Length, Dim).

Should I do something like this:
# split the batch across two GPUs
B = x.size(0)
x_0 = x[:B // 2, :, :].cuda(0)
x_1 = x[B // 2:, :, :].cuda(1)

out_0, _ = self.lstm(x_0)
out_1, _ = self.lstm(x_1)
return torch.cat([out_0, out_1], dim=0)

Or should I also keep two copies of self.lstm on different GPUs? If I keep two copies, how can I make sure they are using the same weights?
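
For reference, here is roughly what I was hoping to be able to write with nn.DataParallel, packing inside forward() so that each replica packs its own chunk of the batch. The PackedLSTM module, the sizes, and the enforce_sorted / total_length arguments are just my own sketch (and may need a newer PyTorch version); I don't know whether this actually works with packed sequences, which is what I am asking:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class PackedLSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)

    def forward(self, x, lengths):
        # pack so the LSTM skips the padded time steps
        # (lengths must live on the CPU for pack_padded_sequence)
        packed = pack_padded_sequence(x, lengths.cpu(), batch_first=True,
                                      enforce_sorted=False)
        packed_out, _ = self.lstm(packed)
        # total_length keeps every replica's output at the same time dimension,
        # so the chunks can be gathered back into one (Batch, Length, Hidden) tensor
        out, _ = pad_packed_sequence(packed_out, batch_first=True,
                                     total_length=x.size(1))
        return out

model = nn.DataParallel(PackedLSTM(input_dim=300, hidden_dim=512).cuda(),
                        device_ids=[0, 1])
x = torch.randn(32, 40, 300).cuda()           # (Batch, Length, Dim)
lengths = torch.randint(1, 41, (32,)).cuda()  # true length of each sequence
out = model(x, lengths)                       # DataParallel splits both inputs along dim 0

If something like this is not supported, the manual .cuda() split above is my fallback, hence the question about keeping the two copies' weights in sync.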