In ‘classical statistics’ I would possibly use a GLM (family = negative binomial) for a regression on tabular data, with count values, categorical values, etc. as inputs (x) and percentage values as the output (y). This time, however, I would like to fit a multilayer ‘multivariate’ ANN (with PyTorch) for this task.
I found some threads where the output (i.e. the percentage values) is simply treated as continuous (like % of fat in milk) and a regression ANN is run anyway, e.g.:

    import torch.nn as nn

    class multiple_regression_ANN(nn.Module):
        def __init__(self, num_features):
            super(multiple_regression_ANN, self).__init__()
            # some_number is a placeholder for the hidden-layer width
            self.layer_1 = nn.Linear(num_features, some_number)
            self.layer_2 = nn.Linear(some_number, some_number)
            ...
            self.layer_n = nn.Linear(some_number, some_number)
            self.layer_out = nn.Linear(some_number, 1)
            self.relu = nn.ReLU()

Is it mathematically acceptable to run a regression ANN on percentage values if I do not want to extrapolate with my ANN outside the bounds of my data (e.g. lower = 3.9%, upper = 30.2%)? Or how could I change the hidden and output layers of the network to make it suitable for predicting percentage values?
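To make the second question concrete, here is a minimal sketch of one variant I could imagine, not a verified solution: rescale the percentages to fractions in [0, 1] and put a sigmoid on the output layer so predictions cannot leave that range. The class name `BoundedRegressionANN`, the hidden size of 16, the choice of MSE loss, and the random data are all my own placeholders.

```python
import torch
import torch.nn as nn


class BoundedRegressionANN(nn.Module):
    """Hypothetical variant: a sigmoid output keeps predictions within [0, 1]."""

    def __init__(self, num_features, hidden_size=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 1),
            nn.Sigmoid(),  # squashes the raw output into the unit interval
        )

    def forward(self, x):
        return self.net(x)


torch.manual_seed(0)
x = torch.randn(8, 5)                       # placeholder: 8 samples, 5 features
y_percent = torch.rand(8, 1) * 26.3 + 3.9   # placeholder targets in [3.9, 30.2]
y = y_percent / 100.0                       # rescale percentages to fractions

model = BoundedRegressionANN(num_features=5)
loss_fn = nn.MSELoss()  # assumption: plain MSE on the rescaled targets
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# A few training steps, only to show that the pieces fit together
for _ in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# Convert predictions back to percentages
pred_percent = model(x).detach() * 100.0
```

With this setup the predictions are bounded to 0–100% by construction, but I do not know whether that is preferable to the unbounded regression output above.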
Any help, code suggestions, alternative PyTorch ANNs for this task, or links are highly appreciated.