Suppose I have a Net like

x1 = subNet1(input)

x2 = subNet2(input)

x3 = subNet3(input)

output = x1 + x2 + x3

During training, not every subgraph is used at every step. For example, in step 1, output = x1 + x2; in step 2, output = x1 + x3; in step 3, output = x2; and so on.

My solution now is

output = mask1 * x1 + mask2 * x2 + mask3 * x3

But this means every subgraph still runs forward and backward; each subgraph is large, so this takes a lot of time and GPU memory.

Can anyone offer a more concise solution? Thanks a lot.

Instead of masking your `output`, you could add conditions to your calculations, e.g.:

```python
if cond1:
    output = subNet1(input) + subNet2(input)
elif cond2:
    output = subNet1(input) + subNet3(input)
...
```
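A minimal runnable sketch of this idea (the `Linear` stand-ins and the `active` index list are placeholders, not your actual subnets): keep the subnets in an `nn.ModuleList` and call only the ones selected for the current step. Skipped subnets do no forward computation and contribute nothing to the autograd graph, so they cost no time or activation memory.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Three stand-in subnets; in practice each could be a large subgraph.
        self.subnets = nn.ModuleList(nn.Linear(8, 8) for _ in range(3))

    def forward(self, x, active):
        # Only the subnets whose indices appear in `active` run forward
        # (and therefore backward); the rest are skipped entirely.
        return sum(self.subnets[i](x) for i in active)

net = Net()
x = torch.randn(4, 8)
out = net(x, active=[0, 2])  # this step uses subNet1 + subNet3 only
out.sum().backward()
# After backward, only the subnets that ran have gradients;
# net.subnets[1].weight.grad is still None.
```

You can change `active` every training step (e.g. sample it randomly) without rebuilding anything, since PyTorch constructs the graph dynamically per forward pass.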