Hello, I was wondering if I could get any pointers on applying nn.DataParallel to a model that I’ve written using a GRU network. The code for the model is as follows:

```
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class GRUClassifier(nn.Module):
    def __init__(self, input_dim, hidden_dim, layers_dim, batch_size):
        super(GRUClassifier, self).__init__()
        self.hidden_dim = hidden_dim
        self.layers_dim = layers_dim
        self.batch_size = batch_size
        self.gru = nn.GRU(input_dim, hidden_dim, layers_dim)
        self.hidden = self.init_hidden()
        self.fcn = nn.Linear(self.hidden_dim * self.layers_dim, 1)

    def init_hidden(self):
        # Random initial hidden state: (num_layers, batch, hidden_dim)
        return (Variable(torch.randn(self.layers_dim, self.batch_size, self.hidden_dim))).cuda()

    def forward(self, input):
        # input: (seq_len, batch, input_dim)
        self.batch_size = input.size()[1]
        self.hidden = self.init_hidden()
        gru_out, self.hidden = self.gru(input, self.hidden)
        hidden_out = self.hidden.view(self.batch_size, self.layers_dim * self.hidden_dim)
        x = self.fcn(hidden_out)
        x = F.sigmoid(x)
        return x
```

I am applying DataParallel as follows:

```
SEQ_LENGTH = 2501
MELS = 256
HIDDEN_DIM = 128
model = GRUClassifier(input_dim=MELS, hidden_dim=HIDDEN_DIM, layers_dim=4, batch_size=64)
nn.DataParallel(model.cuda())
```

However, according to nvidia-smi, only one GPU is being utilised. My machine has 2x Titan X and I’m running PyTorch in an nvidia-docker container. I’ve tried multi-GPU training on a convolutional neural network using a similar method, and that seems to work correctly.
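
Two things in the snippets above may be worth checking. `nn.DataParallel` returns a wrapper module rather than modifying the model in place, so its return value has to be kept and called. It also splits inputs along dim 0 by default, while `nn.GRU` in its default layout expects `(seq_len, batch, features)`. Below is a minimal sketch of one setup to compare against, not a confirmed fix. It rests on assumptions that are not in the original code: the class name `BatchFirstGRUClassifier` is hypothetical, the GRU is switched to `batch_first=True` so that the batch sits on dim 0, the initial hidden state is created on the input's device instead of via `.cuda()`, and the input sizes at the bottom are arbitrary.

```python
import torch
import torch.nn as nn


class BatchFirstGRUClassifier(nn.Module):
    """Hypothetical batch-first variant of the classifier, for illustration only."""

    def __init__(self, input_dim, hidden_dim, layers_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.layers_dim = layers_dim
        # Assumption: inputs are (batch, seq_len, features), so the batch is dim 0.
        self.gru = nn.GRU(input_dim, hidden_dim, layers_dim, batch_first=True)
        self.fcn = nn.Linear(hidden_dim * layers_dim, 1)

    def forward(self, x):
        batch = x.size(0)  # size of the chunk this replica received
        # Create the initial hidden state on the same device as the input chunk.
        h0 = torch.randn(self.layers_dim, batch, self.hidden_dim,
                         device=x.device, dtype=x.dtype)
        _, h_n = self.gru(x, h0)
        # h_n is (layers, batch, hidden) even with batch_first=True, so move
        # the batch dimension to the front before flattening per sample.
        h_n = h_n.permute(1, 0, 2).reshape(batch, -1)
        return torch.sigmoid(self.fcn(h_n))


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = BatchFirstGRUClassifier(input_dim=256, hidden_dim=128, layers_dim=4).to(device)

# Keep the wrapper that DataParallel returns; the original module is not changed.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

# Arbitrary illustrative input: 8 samples, 50 time steps, 256 features.
batch = torch.randn(8, 50, 256, device=device)
with torch.no_grad():
    out = model(batch)

print(out.shape, "visible GPUs:", torch.cuda.device_count())
```

If `torch.cuda.device_count()` reports fewer GPUs than expected inside the container, the container's GPU visibility would be the next thing to look at, since the wrapper can only use the devices PyTorch can see.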