I have N networks with the same input shape (though each receives different input values) and the same output shape.
I wanted to use the N-head approach explained in [python - Run multiple models of an ensemble in parallel with PyTorch - Stack Overflow] for parallelization. If I understand correctly, using a layer such as:
nn.Conv1d(in_channels=dim_state * nb_heads, out_channels=hidden_size * nb_heads, kernel_size=1, groups=nb_heads)
I can parallelize the execution of N linear layers. My setup is slightly different in that each head receives a different state, but all the states have the same shape.
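To make my setup concrete, here is a minimal sketch of what I mean (the sizes `nb_heads`, `dim_state`, `hidden_size`, and `batch` are made-up values for illustration). Each group of the convolution only sees its own `dim_state` slice of the channels, so the single layer behaves like `nb_heads` independent linear layers:

```python
import torch
import torch.nn as nn

# Illustrative sizes (not my real ones)
nb_heads, dim_state, hidden_size, batch = 4, 8, 16, 32

# One grouped 1x1 conv acting as nb_heads independent linear layers
layer = nn.Conv1d(
    in_channels=dim_state * nb_heads,
    out_channels=hidden_size * nb_heads,
    kernel_size=1,
    groups=nb_heads,
)

# Each head gets its own state values, stacked along the channel dim
states = torch.randn(batch, nb_heads * dim_state, 1)  # (B, H*D, 1)
out = layer(states)                                   # (B, H*hidden, 1)
print(out.shape)  # torch.Size([32, 64, 1])
```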
However, I also get an N-dimensional loss vector instead of a single scalar shared by all networks. I do not want the loss of network 0 to affect the weights of the other networks. Is there a way to backpropagate the loss of each network separately?
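For what it's worth, here is a small experiment I can run to check whether the heads stay independent (again with made-up sizes). Since the groups are disjoint, backpropagating only head 0's loss should leave the gradient of head 1's weight slice at exactly zero:

```python
import torch
import torch.nn as nn

nb_heads, dim_state, hidden_size, batch = 2, 3, 4, 5
layer = nn.Conv1d(
    in_channels=dim_state * nb_heads,
    out_channels=hidden_size * nb_heads,
    kernel_size=1,
    groups=nb_heads,
)

x = torch.randn(batch, nb_heads * dim_state, 1)
out = layer(x).view(batch, nb_heads, hidden_size)
targets = torch.randn(batch, nb_heads, hidden_size)

# Per-head loss vector, shape (nb_heads,)
losses = ((out - targets) ** 2).mean(dim=(0, 2))

# Backprop only head 0's loss
losses[0].backward()

# Output channels 0..hidden_size-1 belong to head 0, the rest to head 1;
# head 1's slice of the weight gradient should be untouched (all zeros)
grad = layer.weight.grad  # shape (H*hidden, dim_state, 1)
print(torch.all(grad[hidden_size:] == 0))  # tensor(True)
```

If that holds, a single `losses.sum().backward()` would already keep the heads separate, since each loss term can only produce gradients in its own group's parameters.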
Thanks in advance.