I’m trying to train a model that has many identical channels.
Let’s give a toy example:
I split one tensor into n tensors of the same shape, and I want to forward each of these n tensors through its own linear module.
I then concatenate the n outputs back into a single tensor.
The linear modules all have the same input and output shapes, but they don’t share parameters.
In other words, I have n isolated channels.
When I run the experiment, I observe that my GPU uses only about 20% of its capacity.
Each linear module is very small.
I suspect the machine loses time launching each small computation separately.
Is there any way to compute the outputs of all the isolated linear modules at once?
Something like a multi-channel linear module?
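For reference, here is a minimal sketch of what I mean, comparing the per-channel loop I have now with a single batched matmul (`torch.baddbmm`) over stacked weights, which I’m hoping is equivalent; the shapes and names here are just my toy setup, not my real model:

```python
import torch

n, batch, in_f, out_f = 8, 32, 16, 16  # toy sizes

# n independent weight matrices and biases, stacked along a leading dim
weight = torch.randn(n, in_f, out_f)
bias = torch.randn(n, 1, out_f)

x = torch.randn(n, batch, in_f)  # one (batch, in_f) slice per channel

# current approach: one small matmul per channel (many kernel launches)
looped = torch.stack([x[i] @ weight[i] + bias[i, 0] for i in range(n)])

# hoped-for approach: one batched matmul over all n channels at once
batched = torch.baddbmm(bias, x, weight)

print(torch.allclose(looped, batched, atol=1e-6))
```

If this is the right idea, I could presumably wrap the stacked `weight`/`bias` in a custom module instead of keeping n separate `nn.Linear` instances.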