Hi, I have a network composed of several disjoint sub-networks, each taking a disjoint subset of the data features as input. Right now, my code is written as:
```python
xs = [linear(x[:, i, :]) for i, linear in enumerate(self.linears)]
x = torch.stack(xs, dim=2)
```
where `self.linears` is a `ModuleList` of sub-networks and `x` is my input tensor; the network applies one sub-network to each slice of the input. My question is: is there any way of doing this without a for loop? The loop is not parallelized on the GPU, and I believe it is currently the bottleneck for the run speed.
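For context, here is a minimal self-contained reproduction of the setup described above (all sizes are made-up assumptions for illustration, not from my actual model):

```python
import torch
import torch.nn as nn

# Assumed illustrative sizes: x has shape (batch, n_subnets, in_features),
# and each sub-network is an independent nn.Linear applied to one slice.
n_subnets, in_features, out_features, batch = 4, 8, 16, 32

linears = nn.ModuleList(
    nn.Linear(in_features, out_features) for _ in range(n_subnets)
)
x = torch.randn(batch, n_subnets, in_features)

# Current loop-based version: one forward pass per slice of the input.
xs = [linear(x[:, i, :]) for i, linear in enumerate(linears)]
out = torch.stack(xs, dim=2)  # shape (batch, out_features, n_subnets)
```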