I’m porting some Lua/Torch code to PyTorch. The original code uses ParallelCriterion to apply an L1 loss separately to two different layers of a stacked network (final plus intermediate supervision).
I see that PyTorch’s legacy nn API includes a ParallelCriterion implementation – but what is the “correct” / non-legacy way to do this?
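
For concreteness, here is a rough sketch of the behavior I’m trying to reproduce: a weighted sum of L1 losses, one per (output, target) pair, which is what ParallelCriterion computes. The helper name `parallel_l1_loss`, the weights, and the tensor shapes are all made up for illustration, and I’m not sure this is the idiomatic approach.

```python
import torch
import torch.nn as nn


def parallel_l1_loss(outputs, targets, weights):
    """Weighted sum of L1 losses, one per (output, target) pair.

    Hypothetical helper, not part of PyTorch.
    """
    criterion = nn.L1Loss()
    return sum(w * criterion(o, t) for w, o, t in zip(weights, outputs, targets))


# Stand-ins for the intermediate and final outputs of the stacked network
# (shapes and values are assumptions for this example).
intermediate = torch.zeros(2, 3, requires_grad=True)
final = torch.zeros(2, 3, requires_grad=True)
target = torch.ones(2, 3)

# Both outputs are supervised with the same target; the weights are arbitrary.
loss = parallel_l1_loss([intermediate, final], [target, target], weights=[0.5, 1.0])
loss.backward()  # autograd propagates gradients through both loss terms
```

Since the summed loss is an ordinary tensor, a single `backward()` call should cover both supervision points.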