Hello, all! I'd like to know how to implement model parallelism in PyTorch.

For example, I want to convolve a single input with several different convolution kernels, so I use nn.ModuleList() to hold the different convolution layers and call them in the forward() function. I found that PyTorch does not run the different convolution operations in parallel. From the compute times I printed, it seems to execute them sequentially, which results in low (~20%) GPU utilization.

```
import torch
import torch.nn as nn


class Conv1DBank(nn.Module):
    def __init__(self, k, input_dim, output_dim):
        super(Conv1DBank, self).__init__()
        self.input_dim = input_dim
        self.k = k
        self.conv1d_bank = nn.ModuleList()
        self.conv1d_bank_2 = nn.ModuleList()
        self.create_bank(k, input_dim, output_dim)

    def create_bank(self, k, input_dim, output_dim):
        for i in range(k):  # range, not Python 2's xrange
            tmp = nn.Conv1d(in_channels=input_dim, out_channels=64, kernel_size=64, stride=1)
            tmp1 = nn.Conv1d(in_channels=64, out_channels=output_dim, kernel_size=64, stride=1)
            self.conv1d_bank.append(tmp)
            self.conv1d_bank_2.append(tmp1)

    def forward(self, input_tensors):
        out = list()
        for i in range(self.k):
            # each branch runs sequentially on the same input
            tmp = self.conv1d_bank[i](input_tensors)
            tmp1 = self.conv1d_bank_2[i](tmp)
            out.append(tmp1)
        # concatenate the k branch outputs along the channel dim, then reduce
        return torch.sum(torch.cat(out, 1))
```
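One idea I've been considering, since all my branches share the same kernel size, is to fuse the k independent branches into a single grouped convolution each, so a single kernel launch does the work of the whole loop. Here is a rough sketch of that approach; the class name `GroupedConv1DBank` and the exact shapes are my own, not from any library, and I haven't verified it matches the loop version numerically:

```python
import torch
import torch.nn as nn


class GroupedConv1DBank(nn.Module):
    """Sketch: replaces k sequential conv branches with one grouped Conv1d
    per stage. With groups=k, group i only sees its own slice of channels,
    which mimics k independent branches in a single op."""

    def __init__(self, k, input_dim, output_dim):
        super(GroupedConv1DBank, self).__init__()
        self.k = k
        # stage 1: k branches of (input_dim -> 64) fused into one layer
        self.conv1 = nn.Conv1d(in_channels=k * input_dim, out_channels=k * 64,
                               kernel_size=64, stride=1, groups=k)
        # stage 2: k branches of (64 -> output_dim) fused into one layer
        self.conv2 = nn.Conv1d(in_channels=k * 64, out_channels=k * output_dim,
                               kernel_size=64, stride=1, groups=k)

    def forward(self, x):
        # replicate the input once along the channel dim: (N, C, L) -> (N, k*C, L)
        x = x.repeat(1, self.k, 1)
        out = self.conv2(self.conv1(x))
        # same reduction as the loop version: sum over everything
        return torch.sum(out)
```

The trade-off is that the weights now live in one tensor per stage rather than in a ModuleList, so per-branch access is by channel slicing.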

So, does anyone know how to run different convolution operations in parallel? Thank you!