So I’m trying to fit a network with 4 real-valued outputs. The network uses dropout on all layers except the last one, so that I can capture model uncertainty (as a Bayesian approximation). The problem is that the network performs poorly when predicting all 4 outputs, but gets adequate results when the output is only one.
I have done sanity checks, e.g. verifying that the network is complex enough to overfit a small set of observations, and normalizing both inputs and outputs. The stds of the outputs differ a lot.
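Since the output stds differ a lot, one common fix is to standardize each target dimension separately so that no single output dominates the MSE. A minimal sketch (the tensor shapes and scale factors here are hypothetical, just to illustrate per-column standardization):

```python
import torch

# hypothetical targets: (N, 4) tensor where the 4 outputs have very different scales
y = torch.randn(1000, 4) * torch.tensor([1.0, 10.0, 0.1, 100.0])

mu = y.mean(dim=0, keepdim=True)    # per-output mean, shape (1, 4)
sigma = y.std(dim=0, keepdim=True)  # per-output std, shape (1, 4)
y_norm = (y - mu) / sigma           # each column now has mean ~0, std 1

# at prediction time, undo the scaling:
# y_pred = net(x) * sigma + mu
```

Training against `y_norm` makes the 4 per-output MSE terms comparable in magnitude; remember to invert the transform when reporting predictions.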
Has anyone faced a similar problem? I know this is not a direct question about the torch framework, but I hope I can get some answers. The network is trained with MSE loss and Adam, and is given by
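One diagnostic that may help here: the scalar MSE averages over all 4 outputs, so a single badly scaled target can hide the others. A sketch of monitoring the loss per output dimension (tensors below are placeholders):

```python
import torch
import torch.nn as nn

# placeholder batch of predictions and targets, shape (batch, 4)
pred = torch.randn(32, 4)
target = torch.randn(32, 4)

# MSE broken down per output, shape (4,)
per_output_mse = ((pred - target) ** 2).mean(dim=0)

# averaging the per-output terms recovers the usual scalar nn.MSELoss
total = per_output_mse.mean()
```

Logging `per_output_mse` during training shows whether one target dominates the combined loss while the others stagnate.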
```python
class NETWORK(nn.Module):
    def __init__(self, layers=[40, 2048, 2048, 1024, 1024, 4], droprate=0.1):
        super(NETWORK, self).__init__()
        self.p = droprate
        modules, n = [], len(layers) - 1
        for i in range(n):
            modules.append(nn.Linear(layers[i], layers[i + 1]))
            if i + 1 != n:
                modules.append(nn.Dropout(self.p))
                modules.append(nn.ReLU())
        self.net = nn.Sequential(*modules)

    def forward(self, x):
        return self.net(x)
```
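For reference, the uncertainty-capturing part (MC dropout) amounts to keeping the dropout layers stochastic at test time and aggregating several forward passes. A minimal sketch, using a toy model with the same dropout placement (the helper name and sizes are my own, not from the original post):

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model, x, n_samples=50):
    # model.train() keeps nn.Dropout stochastic at prediction time,
    # which is what the Monte-Carlo dropout approximation relies on
    model.train()
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    # mean prediction and per-output std (the uncertainty estimate)
    return preds.mean(dim=0), preds.std(dim=0)

# toy network with dropout on all layers except the last (hypothetical sizes)
net = nn.Sequential(
    nn.Linear(40, 64),
    nn.Dropout(0.1),
    nn.ReLU(),
    nn.Linear(64, 4),
)
mean, std = mc_dropout_predict(net, torch.randn(8, 40))
```

Note that `model.train()` would also re-enable batch-norm updates if the network had any; with only dropout layers, as here, it just keeps the masks random.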