Hello,

I have N networks with the same input size (but with different values) and the same output size.

I wanted to use the approach explained in [python - Run multiple models of an ensemble in parallel with PyTorch - Stack Overflow] for parallelizing an N-head neural network. If I understand correctly, using the layer:

```
nn.Conv1d(in_channels=dim_state * nb_heads, out_channels=hidden_size * nb_heads, kernel_size=1, groups=nb_heads)
```

I parallelize the execution of N linear layers. My code is slightly different in that the inputs to the heads have different values, but they all have the same shape.
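To make this concrete, here is a minimal sketch of the setup I have in mind (all sizes are illustrative; `groups=nb_heads` is what keeps each head's channels independent):

```python
import torch
import torch.nn as nn

# Illustrative sizes
nb_heads = 4        # N independent networks
dim_state = 8       # input features per head
hidden_size = 16    # output features per head
batch = 32

# One grouped 1x1 convolution acting as N independent linear layers:
# with groups=nb_heads, head i only sees its own slice of input channels.
layer = nn.Conv1d(
    in_channels=dim_state * nb_heads,
    out_channels=hidden_size * nb_heads,
    kernel_size=1,
    groups=nb_heads,
)

# Inputs for all heads stacked along the channel dimension,
# with a dummy length-1 spatial dimension required by Conv1d.
x = torch.randn(batch, dim_state * nb_heads, 1)
out = layer(x)
print(out.shape)  # torch.Size([32, 64, 1]) -> hidden_size * nb_heads channels
```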

However, I also have an N-dimensional loss vector instead of a single scalar shared by all networks. I do not want the loss of network 0 to affect the weights of the other networks. Is there a way to backpropagate the loss for each network separately?
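For context, this is roughly how I obtain the per-network loss vector (names and sizes are illustrative; `reduction='none'` keeps the element-wise losses so I can average per head):

```python
import torch
import torch.nn as nn

nb_heads = 4
batch = 32
out_size = 2

# Per-head predictions and targets, stacked along a head dimension
preds = torch.randn(batch, nb_heads, out_size, requires_grad=True)
targets = torch.randn(batch, nb_heads, out_size)

# reduction='none' gives one loss value per element; averaging over the
# batch and output dimensions leaves one scalar loss per head.
loss_vec = nn.functional.mse_loss(preds, targets, reduction='none').mean(dim=(0, 2))
print(loss_vec.shape)  # torch.Size([4]) -- one loss per network
```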

Thanks in advance.