Parallel execution of different modules with different input


I have about 8k one-dimensional vectors of length 128, and corresponding to each vector I have a linear layer. I would like to pass each vector through its corresponding linear layer and then concatenate the outputs. Is there a way to take advantage of the GPU and do this in parallel?

There is a similar question here, but I am not sure how to adapt it to my problem.

If you have different “linear layers” (one layer for each input), then this sounds like it would be a batched matrix multiply:
torch.bmm — PyTorch 1.10.0 documentation
b = 8000,
n = 1,
m = 128,
p = output dim of linear layer.
It should also be expressible as an einsum:
torch.einsum — PyTorch 1.10.0 documentation
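As a concrete sketch of the shapes above (assuming an output dim of p = 2, which is just a placeholder), the batched multiply and the einsum give the same result:

```python
import torch

b, n, m, p = 8000, 1, 128, 2  # p = 2 is an assumed output dim

# One 128-dim input vector per "linear layer", viewed as a (1, 128) row
x = torch.randn(b, n, m)           # (8000, 1, 128)
# All per-vector weight matrices stacked into a single tensor
weights = torch.randn(b, m, p)     # (8000, 128, 2)

out = torch.bmm(x, weights)        # (8000, 1, 2)
out = out.squeeze(1)               # (8000, 2)

# Equivalent einsum formulation
out2 = torch.einsum('bnm,bmp->bnp', x, weights).squeeze(1)
```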


Thank you so much, I will try it.

My apologies if this is a dumb question: I am not sure how to create a matrix of modules. Should I just copy the parameters into a tensor of size (b×m×p)?
If I just use torch.tensor to create a new matrix of dimension (b×m×p), then how do I initialize it and set its bias?

Here is what I tried. Will backpropagation be applied to output_layers later in training?

self.output_layers = torch.rand([8000, 128, 2], requires_grad=True)
torch.nn.init.normal_(self.output_layers, mean=0, std=5e-3)

Generally, parallelization across separate modules is difficult, so you may want to write a custom module that does everything in a single layer, as it looks like you are currently doing. What you have looks OK, and if you want a bias you can add a bias tensor to your custom module.
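For illustration, here is a minimal sketch of such a custom module. It assumes b = 8000 vectors of length 128 and an output dim of 2 (the dims and the class name BatchedLinear are made up for this example); the key point is that wrapping the tensors in nn.Parameter registers them with the module, so autograd tracks them and an optimizer will pick them up via module.parameters():

```python
import torch
import torch.nn as nn

class BatchedLinear(nn.Module):
    """Applies a separate linear layer to each of b input vectors.
    A sketch, assuming b vectors of length in_features and a small
    out_features."""
    def __init__(self, b, in_features, out_features):
        super().__init__()
        # nn.Parameter registers these with the module, so they show up
        # in module.parameters() and receive gradients during backprop.
        self.weight = nn.Parameter(torch.empty(b, in_features, out_features))
        self.bias = nn.Parameter(torch.zeros(b, out_features))
        nn.init.normal_(self.weight, mean=0, std=5e-3)

    def forward(self, x):
        # x: (b, in_features) -> add a length-1 dim for bmm
        out = torch.bmm(x.unsqueeze(1), self.weight).squeeze(1)
        return out + self.bias

layer = BatchedLinear(8000, 128, 2)
x = torch.randn(8000, 128)
y = layer(x)            # shape (8000, 2)
y.sum().backward()      # gradients flow into weight and bias
```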
