Variable number of output branches

I have a text recognition task where I want the neural network to have a variable number of output branches, depending on the input image. How should I structure the network, along with the loss and optimizer, to achieve this?

There are many possible ways of structuring your model, and the best choice will probably depend on your application.
One common approach is to return the results in a list (one entry per branch) and have the loss accept that list.
But you might need some experimentation to find what works best for you.
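
As a rough illustration of the list-returning approach, here is a minimal sketch. The class name `VariableBranchNet`, the `num_branches` argument, the layer sizes, and the idea of capping the branches at `max_branches` are all assumptions made for the example, not something taken from your setup. In a real text recognition model the trunk would likely be convolutional, and the branch count would come from the image itself.

```python
import torch
from torch import nn


class VariableBranchNet(nn.Module):
    """Shared trunk followed by up to `max_branches` heads (hypothetical layout)."""

    def __init__(self, in_features=32, hidden=64, out_features=10, max_branches=5):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU())
        # One head per possible branch; only the first `num_branches` run per call.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, out_features) for _ in range(max_branches)]
        )

    def forward(self, x, num_branches):
        if not 1 <= num_branches <= len(self.heads):
            raise ValueError("num_branches out of range")
        features = self.trunk(x)
        # Return one output tensor per active branch, as a plain Python list.
        return [head(features) for head in self.heads[:num_branches]]


model = VariableBranchNet()
x = torch.randn(4, 32)  # batch of 4 feature vectors (placeholder input)
outputs = model(x, num_branches=3)  # list with 3 tensors
```

Registering the heads in an `nn.ModuleList` (rather than a plain Python list) is what makes their parameters visible to `model.parameters()` and therefore to the optimizer.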

If I understand correctly, I will have to write a custom loss function to do this, right? The standard loss functions don’t seem to accept lists.

Yes, you have to write a custom loss function, but you could simply do it like this:

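import torch


class CustomLoss(torch.nn.Module):
    def __init__(self, loss_fn=torch.nn.MSELoss):
        super().__init__()  # must be called before assigning submodules
        self.loss_fn = loss_fn()  # instantiate the per-branch loss

    def forward(self, pred_list, target_list):
        assert len(pred_list) == len(target_list)
        value = 0
        # Sum the per-branch losses into a single scalar.
        for _pred, _target in zip(pred_list, target_list):
            value = value + self.loss_fn(_pred, _target)

        return value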
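
On the optimizer side, a single optimizer over all parameters is usually enough, because the summed loss is one scalar and one `backward()` call sends gradients into every branch that contributed to it. The sketch below shows one training step under that assumption. The tiny `trunk` and `heads` modules, the tensor shapes, and the random targets are placeholders invented for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in model: a shared trunk and two output heads.
trunk = nn.Linear(8, 16)
heads = nn.ModuleList([nn.Linear(16, 4) for _ in range(2)])
params = list(trunk.parameters()) + list(heads.parameters())

optimizer = torch.optim.SGD(params, lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(5, 8)  # placeholder batch
targets = [torch.randn(5, 4) for _ in heads]  # one target per branch

features = torch.relu(trunk(x))
preds = [head(features) for head in heads]

# Sum the per-branch losses into a single scalar tensor.
loss = sum(loss_fn(p, t) for p, t in zip(preds, targets))

optimizer.zero_grad()
loss.backward()  # gradients flow into the trunk and every head that was used
optimizer.step()
```

If some branches are skipped for a given input, their parameters simply receive no gradient in that step. Whether an unweighted sum is the right way to combine the branches is something you would need to check for your task.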