So I’m trying to train a network with 4 real-valued outputs. The network uses dropout on all layers except the last one so that I can capture model uncertainty (Bayesian approximation via MC dropout). The problem is that the network performs poorly when predicting 4 outputs, but gets adequate results when there is only one output.
I have done sanity checks, e.g. verifying that the network is complex enough to overfit a small set of observations, and normalizing both the inputs and the outputs. The standard deviations of the outputs differ greatly.
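Since the target standard deviations differ a lot, a plain MSE will be dominated by the high-variance outputs. One common remedy is to standardize each target dimension separately before training and invert the transform at prediction time. A minimal sketch (the helper names here are hypothetical, not from the original code):

```python
import torch

def fit_target_scaler(y):
    # y: (N, 4) tensor of training targets
    mean = y.mean(dim=0)
    std = y.std(dim=0).clamp_min(1e-8)  # guard against zero std
    return mean, std

def standardize(y, mean, std):
    return (y - mean) / std

def unstandardize(y_s, mean, std):
    return y_s * std + mean

# usage: targets whose scales differ by orders of magnitude
y = torch.randn(100, 4) * torch.tensor([1.0, 10.0, 0.1, 100.0])
mean, std = fit_target_scaler(y)
y_s = standardize(y, mean, std)
# each column of y_s now has roughly zero mean and unit std,
# so all 4 outputs contribute comparably to the MSE loss
```

Train on `y_s`, then apply `unstandardize` to the network outputs to get predictions on the original scale.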
Has anyone faced a similar problem? I know this is not a direct question about the torch framework, but I hope I can get some answers. The network is trained with MSE loss and Adam, and is given by
class NETWORK(nn.Module):
    def __init__(self, layers=[40, 2048, 2048, 1024, 1024, 4], droprate=0.1):
        super(NETWORK, self).__init__()
        self.p = droprate
        modules, n = [], len(layers) - 1
        for i in range(n):
            modules.append(nn.Linear(layers[i], layers[i + 1]))
            if i + 1 != n:  # no dropout/activation after the output layer
                modules.append(nn.Dropout(self.p))
                modules.append(nn.ReLU())
        self.net = nn.Sequential(*modules)

    def forward(self, x):
        return self.net(x)