I am trying to implement a mixture-of-experts (MoE) layer, similar to the one described in: and already discussed in this thread. From reading some threads about the topic, I found the following sentence:
“The MoE (Mixture of Experts Layer) is trained using back-propagation. The Gating Network outputs an (artificially made) sparse vector that acts as a chooser of which experts to consult. More than one expert can be consulted at once.”
I am not sure whether the experts here are pre-trained or not, and whether training involves just the gating network or the full layer (gating network and experts together). If anybody is familiar with this model, please explain this to me if possible.
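As I understand the quoted sentence, the gate scores all experts, keeps only the top k scores, and softmaxes over those, so the resulting mixture weights are sparse. A minimal sketch of that idea (ignoring the noise term from `noisy_gating`; the function name is mine, not from the repo):

```python
import torch

def topk_gating(x, w_gate, k=2):
    """Sketch of sparse top-k gating: softmax over only the k best experts.

    x: (batch, input_size), w_gate: (input_size, num_experts).
    Returns (batch, num_experts) weights that are zero outside the top k.
    """
    logits = x @ w_gate                          # one score per expert
    topk_vals, topk_idx = logits.topk(k, dim=1)  # keep the k best scores
    weights = torch.zeros_like(logits)
    # softmax over only the selected experts, scattered back -> sparse vector
    weights.scatter_(1, topk_idx, torch.softmax(topk_vals, dim=1))
    return weights
```

So for each input, at most k experts get a nonzero weight, which matches the "more than one expert can be consulted at once" part of the quote.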
In any case, I have built three neural networks (model1, model2, and model3) which I have already trained and tuned, and I want to include them in the MoE layer to improve the overall accuracy.

The code has the following class:
"""Call a Sparsely gated mixture of experts layer with 1-layer Feed-Forward networks as experts. Args: input_size: integer - size of the input output_size: integer - size of the input num_experts: an integer - number of experts hidden_size: an integer - hidden size of the experts noisy_gating: a boolean k: an integer - how many experts to use for each batch element """ def __init__(self, input_size, output_size, num_experts, hidden_size, noisy_gating=True, k=4): super(MoE, self).__init__() self.noisy_gating = noisy_gating self.num_experts = num_experts self.output_size = output_size self.input_size = input_size self.hidden_size = hidden_size self.k = k # instantiate experts self.experts = nn.ModuleList([MLP(self.input_size, self.output_size, self.hidden_size) for i in range(self.num_experts)]) self.w_gate = nn.Parameter(torch.zeros(input_size, num_experts), requires_grad=True) self.w_noise = nn.Parameter(torch.zeros(input_size, num_experts), requires_grad=True) self.softplus = nn.Softplus() self.softmax = nn.Softmax(1) self.normal = Normal(torch.tensor([0.0]), torch.tensor([1.0])) assert(self.k <= self.num_experts)
I changed the line

```python
self.experts = nn.ModuleList([MLP(self.input_size, self.output_size, self.hidden_size) for i in range(self.num_experts)])
```

to use my pretrained models instead:

```python
self.experts = nn.ModuleList([model1, model2, model3])
```
But I don’t know if this is enough. I know my question is kind of vague/complicated, but at this point I am lost and frustrated, and any kind of information would be really helpful.