This is not multi-class classification! I want to create a predictor that solves 100 classification problems simultaneously, each of which can take the labels 0, 1, or 2. So the output for each batch needs to have size:

N x c x p

Where N is the batch size, c = 100 is the number of classification problems, and p = 3 is the number of possible labels for each problem. MultiLabelMarginLoss only supports the N x p case, where there is a single problem to solve. In my case, the c classification problems are all very similar, so it makes sense to use the same net for all of them simultaneously.

How do I implement this?
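
To make the shapes concrete, here is a minimal sketch of one possible setup, not a definitive implementation. It assumes PyTorch and uses nn.CrossEntropyLoss in place of MultiLabelMarginLoss, since CrossEntropyLoss accepts input of shape N x p x c together with integer targets of shape N x c. The feature size d, the hidden width 32, and the small net itself are hypothetical and only there for illustration.

```python
import torch
import torch.nn as nn

N, c, p = 8, 100, 3   # batch size, number of problems, labels per problem
d = 16                # hypothetical feature size per problem (an assumption)

# Hypothetical shared network: the same weights are applied to every problem,
# because nn.Linear only acts on the last dimension of its input.
net = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, p))

x = torch.randn(N, c, d)                # one feature vector per problem
targets = torch.randint(0, p, (N, c))   # integer labels in {0, 1, 2}

logits = net(x)                         # shape (N, c, p)

# CrossEntropyLoss expects the class dimension second: (N, p, c) vs. (N, c).
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits.permute(0, 2, 1), targets)

# Equivalent flattened form: treat the N * c problems as one large batch.
loss_flat = loss_fn(logits.reshape(N * c, p), targets.reshape(N * c))

preds = logits.argmax(dim=-1)           # predicted label per problem, shape (N, c)
```

With the default mean reduction, both forms average over all N * c problems, so they give the same loss value. Whether sharing one net across problems is appropriate depends on how similar the problems really are.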