I’ve been having some trouble with my neural net: it’s outputting negative values for every one of my 200k test-set data points. I thought this shouldn’t be possible with ReLU activations, which is why I’m concerned.
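For context, here’s roughly how I’m checking the outputs (simplified sketch; model and test_features here stand in for my actual objects):

import torch

model.eval()  # eval mode so BatchNorm uses running statistics
with torch.no_grad():
    preds = model(test_features)   # test_features: (200000, input_dim) float tensor
print((preds < 0).all().item())    # prints True -- every single output is negative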
Here’s my neural net class:
import torch.nn as nn
from torch.nn.init import xavier_normal_

class Net(nn.Module):
    def __init__(self, *, dims: dict):
        super().__init__()
        # Layer 1
        self.fc_1 = nn.Linear(in_features=dims['input_dim'], out_features=dims['layer_1'])
        self.actv_1 = nn.ReLU()
        # Layer 2
        self.fc_2 = nn.Linear(in_features=dims['layer_1'], out_features=dims['layer_2'])
        self.actv_2 = nn.ReLU()
        self.bn_2 = nn.BatchNorm1d(num_features=dims['layer_2'])
        # Layer 3
        self.fc_3 = nn.Linear(in_features=dims['layer_2'], out_features=dims['layer_3'])
        self.actv_3 = nn.ReLU()
        self.bn_3 = nn.BatchNorm1d(num_features=dims['layer_3'])
        # Output layer (no activation applied here)
        self.fc_output = nn.Linear(in_features=dims['layer_3'], out_features=dims['output_dim'])
        # Initialize hidden-layer weights with Xavier
        xavier_normal_(self.fc_1.weight)
        xavier_normal_(self.fc_2.weight)
        xavier_normal_(self.fc_3.weight)

    def forward(self, x):
        x = self.actv_1(self.fc_1(x))
        x = self.actv_2(self.bn_2(self.fc_2(x)))
        x = self.actv_3(self.bn_3(self.fc_3(x)))
        return self.fc_output(x)
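For completeness, this is how I’m constructing the model (the dimensions below are placeholders, not my real ones):

dims = {'input_dim': 64, 'layer_1': 128, 'layer_2': 64, 'layer_3': 32, 'output_dim': 1}
model = Net(dims=dims)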
Is something wrong with my weight initialization? If so, what other options should I try?
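For example, I’ve read that Kaiming (He) initialization is designed for ReLU networks; would swapping it into __init__ like this be a reasonable thing to try?

from torch.nn.init import kaiming_normal_

# Alternative I'm considering: He initialization tuned for ReLU activations
kaiming_normal_(self.fc_1.weight, nonlinearity='relu')
kaiming_normal_(self.fc_2.weight, nonlinearity='relu')
kaiming_normal_(self.fc_3.weight, nonlinearity='relu')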
Thanks in advance!