Multiple Target output architecture efficiency

While developing a model that is trained to optimize multiple targets at once, I realized I could construct the output in a couple of different ways, and I’m curious whether they are equivalent in terms of gradient calculation or whether one may be better than the other in terms of convergence…

Assume, for simplicity, that I have 2 targets.

Option 1:

import torch

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Shared hidden layer followed by one output head per target
        self.input_layer = torch.nn.Linear(1024, 16)
        self.out_layer_1 = torch.nn.Linear(16, 1)
        self.out_layer_2 = torch.nn.Linear(16, 1)

    def forward(self, feature_data):
        X = self.input_layer(feature_data)
        Out_1 = self.out_layer_1(X)
        Out_2 = self.out_layer_2(X)
        return (Out_1, Out_2)

Option 2:

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Shared hidden layer followed by a single two-unit output layer
        self.input_layer = torch.nn.Linear(1024, 16)
        self.out_layer = torch.nn.Linear(16, 2)

    def forward(self, feature_data):
        X = self.input_layer(feature_data)
        Out = self.out_layer(X)
        return Out

Obviously, with Option 2 I would need to slice the output to get my two results, but I’m wondering whether this makes a difference with regard to the optimizer and the backpropagation of the loss.
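
For reference, here is a minimal sketch of how I would slice and train with the Option 2 output; feature_data, targets_1, and targets_2 are placeholder tensors, and MSE is just an example loss:

# Sketch of using the single Option 2 output for two targets.
# feature_data, targets_1, targets_2 are placeholder tensors here.
model = MyModel()
criterion = torch.nn.MSELoss()

out = model(feature_data)     # shape: (batch, 2)
out_1 = out[:, 0:1]           # predictions for target 1
out_2 = out[:, 1:2]           # predictions for target 2

loss = criterion(out_1, targets_1) + criterion(out_2, targets_2)
loss.backward()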

I assume you want to use some kind of regression loss. In that case, the two options are the same in terms of the gradients: the combined Linear(16, 2) layer is just the two Linear(16, 1) heads stacked, so each row of its weight matrix only receives gradient from its own target’s loss, and the shared layer sees the same total gradient either way.
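
You can check this with a small sketch: copy the two Option 1 heads into one Option 2 layer and compare the gradients (random data and MSE below are just for the demonstration):

import torch

torch.manual_seed(0)
x = torch.randn(8, 16)
t1 = torch.randn(8, 1)
t2 = torch.randn(8, 1)

# Two separate heads (Option 1 style)
head_1 = torch.nn.Linear(16, 1)
head_2 = torch.nn.Linear(16, 1)

# One combined head (Option 2 style) with the same parameters stacked
combined = torch.nn.Linear(16, 2)
with torch.no_grad():
    combined.weight.copy_(torch.cat([head_1.weight, head_2.weight], dim=0))
    combined.bias.copy_(torch.cat([head_1.bias, head_2.bias], dim=0))

criterion = torch.nn.MSELoss()

loss_sep = criterion(head_1(x), t1) + criterion(head_2(x), t2)
loss_sep.backward()

out = combined(x)
loss_comb = criterion(out[:, 0:1], t1) + criterion(out[:, 1:2], t2)
loss_comb.backward()

# Row 0 of the combined gradient matches head 1, row 1 matches head 2
print(torch.allclose(combined.weight.grad[0:1], head_1.weight.grad))
print(torch.allclose(combined.weight.grad[1:2], head_2.weight.grad))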

I think the first option is much better than the second one, because with separate heads you can apply a different optimizer or learning rate to each target.
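
For example, with Option 1 you can give each head its own learning rate through optimizer parameter groups (the learning-rate values below are just placeholders):

model = torch.nn.Module()  # stands in for the Option 1 MyModel with separate heads
model = MyModel()

optimizer = torch.optim.Adam([
    {"params": model.input_layer.parameters()},              # uses the default lr
    {"params": model.out_layer_1.parameters(), "lr": 1e-3},  # lr for target 1's head
    {"params": model.out_layer_2.parameters(), "lr": 1e-4},  # lr for target 2's head
], lr=1e-3)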
