How to speed up if each instance in a batch needs to be processed separately?


I have this implementation of an LSTM network in which I iterate through each time step using an LSTMCell. At each step, I have to transform the input data from [batch_size, embedding_size] to [batch_size, hidden_size]. However, I apply a different Linear layer to each instance in the batch, depending on that instance. So I cannot just use a batched operation, and I have to process each instance individually, which is very slow. Is there any way to speed this up? Thank you very much!

Below is my code:

self.m1 and self.m2 are shared by all instances, so they are not the problem. The problem is self.encoders: it is an array of Linear layers, and which Linear layer is used for an instance depends on that instance's entry in current_skills.

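        for i in range(self.num_steps):
            # modality 1 and modality 2 inputs at step i
            input_data = input_1[:, i]
            m2_data = input_2[:, i]
            current_skills = routers_info[:, i]   # one skill id per instance

            # Routing: each instance goes through the encoder for its own skill
            batch_size = input_data.size(0)       # instead of hard-coding 32
            outs = []
            for b in range(batch_size):
                out_1 = self.m1(input_data[b])
                out_2 = self.m2(m2_data[b])
                fused = torch.cat((out_1, out_2), 0)
                out = self.encoders[int(current_skills[b])](fused)
                outs.append(out)

            # Stacking keeps device and autograd intact, unlike filling an
            # uninitialized torch.FloatTensor (which always lives on the CPU)
            input_data = torch.stack(outs)        # [batch_size, hidden_size]
            h_t, c_t = self.lstm1(input_data, (h_t, c_t))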

Sure, just get rid of the “array of layers” (and then of the loop). A linear layer only computes fused.matmul(W) + B, so create batched W and B instead, holding one weight matrix and one bias per encoder; nn.Embedding may be appropriate for storing them.
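A minimal sketch of that idea, assuming every encoder maps in_features to out_features and the skill ids are integers in [0, num_routes). The class name RoutedLinear and its constructor arguments are hypothetical; only nn.Embedding, nn.init.uniform_ and torch.bmm are real PyTorch APIs here. Each instance looks up its own weight matrix and bias, and the whole batch is then handled by one batched multiply:

```python
import torch
import torch.nn as nn


class RoutedLinear(nn.Module):
    """Hypothetical module: one linear map per route (skill id).

    Row k of `weight` is route k's (in_features x out_features) matrix,
    flattened; row k of `bias` is route k's bias vector.
    """

    def __init__(self, num_routes, in_features, out_features):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = nn.Embedding(num_routes, in_features * out_features)
        self.bias = nn.Embedding(num_routes, out_features)
        # nn.Embedding defaults to N(0, 1) init; use nn.Linear's default
        # uniform bound of 1/sqrt(in_features) instead.
        bound = in_features ** -0.5
        nn.init.uniform_(self.weight.weight, -bound, bound)
        nn.init.uniform_(self.bias.weight, -bound, bound)

    def forward(self, x, routes):
        # x: (batch, in_features); routes: (batch,) integer route ids
        batch = x.size(0)
        W = self.weight(routes).view(batch, self.in_features, self.out_features)
        b = self.bias(routes)  # (batch, out_features)
        # (batch, 1, in) @ (batch, in, out) -> (batch, 1, out) -> (batch, out)
        return torch.bmm(x.unsqueeze(1), W).squeeze(1) + b


torch.manual_seed(0)
layer = RoutedLinear(num_routes=5, in_features=8, out_features=4)
x = torch.randn(32, 8)
routes = torch.randint(0, 5, (32,))
out = layer(x, routes)
print(out.shape)  # torch.Size([32, 4])
```

In the question's step loop, this would replace both self.encoders and the inner `for b` loop, assuming self.m1 and self.m2 accept batched input: build fused = torch.cat((self.m1(input_data), self.m2(m2_data)), 1) for the whole batch and call the routed layer once with current_skills.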

I should add that matmul will probably need to be replaced with torch.bmm, with shapes (B, 1, in) @ (B, in, out) = (B, 1, out), where B here is the batch size.
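If the existing self.encoders weights should be kept, here is a sketch of stacking a list of nn.Linear layers into batched tensors and checking that torch.bmm reproduces the per-instance loop. The sizes and the names num_skills, W_all and b_all are made up for illustration. nn.Linear stores its weight as (out, in) and computes x @ weight.T + bias, so each weight is transposed to get the (in, out) layout used in the shapes above:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_skills, in_features, out_features, batch = 5, 8, 4, 32

# Stand-in for the question's self.encoders (an array of Linear layers).
encoders = nn.ModuleList(nn.Linear(in_features, out_features) for _ in range(num_skills))

# Stack per-skill parameters; transpose each (out, in) weight to (in, out).
W_all = torch.stack([enc.weight.t() for enc in encoders])  # (num_skills, in, out)
b_all = torch.stack([enc.bias for enc in encoders])        # (num_skills, out)

fused = torch.randn(batch, in_features)
skills = torch.randint(0, num_skills, (batch,))

# Gather each instance's parameters, then one batched matmul:
# (batch, 1, in) @ (batch, in, out) -> (batch, 1, out) -> (batch, out)
out_bmm = torch.bmm(fused.unsqueeze(1), W_all[skills]).squeeze(1) + b_all[skills]

# Reference: the original per-instance loop over the array of layers.
out_loop = torch.stack([encoders[s](fused[b]) for b, s in enumerate(skills.tolist())])

print(torch.allclose(out_bmm, out_loop, atol=1e-5))  # True
```

For training, the stacked tensors would be registered once on the module, for example as nn.Parameter(W_all.detach().clone()) and nn.Parameter(b_all.detach().clone()), replacing the ModuleList, rather than being rebuilt at every step.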

Thanks Alex, I will definitely try that