nn.ModuleList in parallel

In my case, the ModuleList is large, so I have to traverse it with a for loop. However, each block in the ModuleList is small and executes quickly, so GPU utilization (everything runs on the same GPU) is low. All submodules in the ModuleList have the same structure and the same kind of input (each one receives its own slice of x). I have read the topics "Parallel execution of modules in nn.ModuleList" and "Running multiple Modules in parallel", but my case is a bit different. Is there some method to improve GPU utilization?
Here is part of my code:

self.client_list = nn.ModuleList()
for client_idx in range(self.client_num):
    # one structurally identical Client per client index
    self.client_list.append(
        Client(args.input_dim, args.conv_dim, args.residual_channel, args.skip_channel, args.end_channel,
               args.out_dim, args.timesteps, args.base_model))

self.server_model = Sever(args.num_nodes, args.conv_dim, args.residual_channel, args.cheb_k, args.embed_dim, "graph_conv_model")
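Because every Client is built with identical arguments, one option worth trying (not from the original post) is to fuse the per-client layers into a single wider layer, so the GPU runs one larger kernel instead of many tiny ones. The sketch below shows this for a single convolution only: N independent nn.Conv2d layers, each applied to its own client slice, compute the same thing as one nn.Conv2d with groups=N whose weights are the per-client weights concatenated along the output-channel dimension. The helper names fuse_convs and grouped_forward are hypothetical, and the sketch assumes each layer uses groups=1 and the default zero padding mode, and that the input is laid out as (batch, channels, clients, H, W) like x in the forward pass. The real Client blocks are not shown, so this is not a drop-in replacement.

```python
import torch
import torch.nn as nn


def fuse_convs(convs: list[nn.Conv2d]) -> nn.Conv2d:
    """Hypothetical helper: merge N identical-shape Conv2d layers (groups=1 assumed)."""
    n = len(convs)
    c0 = convs[0]
    fused = nn.Conv2d(c0.in_channels * n, c0.out_channels * n, c0.kernel_size,
                      stride=c0.stride, padding=c0.padding, dilation=c0.dilation,
                      groups=n, bias=c0.bias is not None)
    with torch.no_grad():
        # Group g uses output rows [g*out, (g+1)*out), i.e. the weights of convs[g].
        fused.weight.copy_(torch.cat([c.weight for c in convs], dim=0))
        if fused.bias is not None:
            fused.bias.copy_(torch.cat([c.bias for c in convs], dim=0))
    return fused


def grouped_forward(fused: nn.Conv2d, x: torch.Tensor, n: int) -> torch.Tensor:
    """Run all N clients at once; x has shape (B, C, N, H, W)."""
    b, c, _, h, w = x.shape
    # (B, C, N, H, W) -> (B, N*C, H, W), client-major so group g sees client g.
    xg = x.permute(0, 2, 1, 3, 4).reshape(b, n * c, h, w)
    y = fused(xg)
    out_c = y.shape[1] // n
    # (B, N*out, H', W') -> (B, out, N, H', W'), client axis back at dim 2.
    return y.reshape(b, n, out_c, y.shape[2], y.shape[3]).permute(0, 2, 1, 3, 4)
```

Since fuse_convs copies the weights into a new module, after fusing you would train the fused layer rather than the original per-client layers.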

# forward
encoder_list = []

for idx, client in enumerate(self.client_list):
    # each client encodes its own slice along dim 2 of x
    encoder, skip = client.encoder(x[:, :, idx, :, :])
    encoder_list.append(encoder)

# encoder_list = torch.cat(encoder_list, dim=2)
graph_out = self.server_model(encoder_list, self.adj_matrix)
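A more general alternative that does not require rewriting each layer is to vectorize the loop with torch.func (PyTorch 2.0+): stack_module_state stacks the parameters of structurally identical modules along a new leading dimension, and vmap maps a functional_call over that dimension and over the client axis of x (dim 2) in a single batched call. This is only a sketch under assumptions: ToyEncoder is a hypothetical stand-in for client.encoder, the skip output is ignored, and the result has shape (B, C_out, N, H, W'), the same as torch.stack over the per-client outputs at dim 2.

```python
import copy

import torch
import torch.nn as nn
from torch.func import functional_call, stack_module_state, vmap


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for client.encoder (the real Client is not shown)."""

    def __init__(self, in_ch: int, hid_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, hid_ch, kernel_size=(1, 2))
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.conv(x))


def loop_forward(encoders: nn.ModuleList, x: torch.Tensor) -> torch.Tensor:
    # Reference: one Python iteration (and several small kernels) per client.
    outs = [enc(x[:, :, idx, :, :]) for idx, enc in enumerate(encoders)]
    return torch.stack(outs, dim=2)


def vmap_forward(encoders: nn.ModuleList, x: torch.Tensor) -> torch.Tensor:
    # Stack each client's parameters and buffers along a new leading dim of size N.
    # Note: the stacked tensors are new leaves, detached from `encoders`.
    params, buffers = stack_module_state(list(encoders))
    # A copy on the "meta" device supplies the structure without real storage.
    base = copy.deepcopy(encoders[0]).to("meta")

    def call_one(p, b, xi):
        # Run the shared structure with one client's weights on one client's slice.
        return functional_call(base, (p, b), (xi,))

    # Map over dim 0 of the weights and dim 2 (the client axis) of x,
    # and put the client axis back at dim 2 of the output.
    return vmap(call_one, in_dims=(0, 0, 2), out_dims=2)(params, buffers, x)
```

One caveat: stack_module_state returns new tensors that are detached from the original modules, so gradients computed through vmap_forward do not reach self.client_list. For training, the stacked parameters would typically be created once and kept as the model's trainable parameters rather than re-stacked on every forward pass. Removing the Python loop also does not by itself guarantee higher utilization; profiling both versions is the way to confirm it helps.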