This is a technical proof of concept prior to experimenting with an MoE RNN.
Essentially, I want to run two RNNs in parallel on the same input sequence. The caveat is that the number of parallel RNNs to run is not known in advance - hence the for loop.
```python
import math
import torch
import torch.nn as nn

class pRNN(nn.Module):
    def __init__(self, input_size=0, hidden_size=0, num_layers=1,
                 bidirectional=False, dropout=0, subunit_count=1):
        super().__init__()
        self.subunit_count = subunit_count
        subunit_size = math.ceil(hidden_size / subunit_count)
        rnns = []
        hidden_size_remaining = hidden_size
        for i in range(self.subunit_count):
            rnns.append(nn.GRU(input_size=input_size,
                               hidden_size=min(hidden_size_remaining, subunit_size),
                               num_layers=num_layers,
                               bidirectional=bidirectional,
                               dropout=dropout))
            # shrink the budget so the last subunit absorbs any remainder
            hidden_size_remaining -= subunit_size
        self.rnn = nn.ModuleList(rnns)

    def forward(self, x):
        out = None
        for i in range(self.subunit_count):
            if out is None:
                out, hidden = self.rnn[i](x)
            else:
                out2, hidden2 = self.rnn[i](x)
                out = torch.cat((out, out2), dim=-1)
                hidden = torch.cat((hidden, hidden2), dim=-1)
        return out, hidden
```
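For reference, the ceil/min splitting in the constructor is meant to divide `hidden_size` as evenly as possible, with the last subunit absorbing any shortfall. A standalone sketch of just that arithmetic (the helper name `subunit_sizes` is mine, not part of the module):

```python
import math

def subunit_sizes(hidden_size, subunit_count):
    # Mirrors the constructor logic: each subunit gets ceil(hidden/count)
    # units, clamped by what is left, so the sizes always sum to hidden_size.
    subunit_size = math.ceil(hidden_size / subunit_count)
    sizes = []
    remaining = hidden_size
    for _ in range(subunit_count):
        sizes.append(min(remaining, subunit_size))
        remaining -= subunit_size
    return sizes

# e.g. hidden_size=10 split across 3 GRUs -> [4, 4, 2]
```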
Unfortunately, this is maybe the fifth failed attempt at a technique that actually runs in parallel - no matter what I try, the RNNs execute sequentially (including when I split the already minimal batch size of 128 across the RNNs). I am running on a single V100.
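For concreteness, one commonly suggested variant of the forward loop uses CUDA streams so the per-subunit kernels are at least *allowed* to overlap; whether they actually do depends on occupancy, since a large cuDNN GRU kernel can saturate a V100 by itself. A minimal sketch (the function name `forward_with_streams` is hypothetical, and it falls back to a plain loop off-GPU):

```python
import torch
import torch.nn as nn

def forward_with_streams(rnns, x):
    """Run each GRU in `rnns` on `x`, one CUDA stream per subunit."""
    if not torch.cuda.is_available():
        # CPU fallback: ordinary sequential loop
        results = [rnn(x) for rnn in rnns]
    else:
        streams = [torch.cuda.Stream() for _ in rnns]
        results = [None] * len(rnns)
        torch.cuda.synchronize()  # make sure prior work is done
        for i, (rnn, stream) in enumerate(zip(rnns, streams)):
            with torch.cuda.stream(stream):
                results[i] = rnn(x)
        torch.cuda.synchronize()  # wait for all streams before cat
    out = torch.cat([o for o, _ in results], dim=-1)
    hidden = torch.cat([h for _, h in results], dim=-1)
    return out, hidden
```

Even with this pattern, profiling (e.g. with Nsight Systems) is the only way to confirm the kernels overlap rather than serialize.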
Any suggestions are appreciated.