Hi,

I have an LSTM implementation where I iterate through the time steps manually using `LSTMCell`. At each step, I need to transform the input from dimension [batch_size, embedding_size] to [batch_size, hidden_size]. However, each instance in the batch goes through a different Linear layer depending on the instance itself, so I cannot use a single batched operation and have to process the instances one at a time. This is very slow. Is there any way to speed this up? Thank you very much!

Below is my code:

`self.m1` and `self.m2` are shared by all instances, so they are not the problem. The bottleneck is `self.encoders`: it is a list of Linear layers, and for each instance the corresponding Linear layer has to be applied.

```
for i in range(self.num_steps):
    # modality 1
    input_data = input_1[:, i]
    m2_data = input_2[:, i]
    current_skills = routers_info[:, i]
    # Routing
    batch_size = 32
    # current_pred = []
    tmp_data = torch.FloatTensor(batch_size, self.hidden_size)
    tmp_data.zero_()
    for b in range(batch_size):
        out_1 = self.m1(input_data[b])
        out_2 = self.m2(m2_data[b])
        fused = torch.cat((out_1, out_2), 0)
        out = self.encoders[current_skills[b]](fused)
        tmp_data[b] = out
    input_data = tmp_data
    h_t, c_t = self.lstm1(input_data, (h_t, c_t))
```
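One possible way to cut down the per-instance Python loop: since many instances usually share the same encoder, you can group the batch by router index and make one batched call per encoder instead of one call per instance. Below is a minimal sketch of that idea (the sizes, `encoders`, `fused`, and `skills` here are hypothetical stand-ins for your `self.encoders`, the concatenated `out_1`/`out_2`, and `current_skills`):

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration
batch_size, fused_size, hidden_size, num_encoders = 32, 16, 8, 4

encoders = nn.ModuleList(nn.Linear(fused_size, hidden_size)
                         for _ in range(num_encoders))
fused = torch.randn(batch_size, fused_size)             # fused features for the whole batch
skills = torch.randint(0, num_encoders, (batch_size,))  # router index per instance

out = fused.new_zeros(batch_size, hidden_size)
for k in range(num_encoders):
    mask = skills == k          # instances routed to encoder k
    if mask.any():
        # one batched matmul per encoder instead of one per instance
        out[mask] = encoders[k](fused[mask])
```

This replaces `batch_size` iterations with at most `num_encoders` iterations, and each Linear call now operates on a sub-batch, so the speedup grows with how many instances share each encoder.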