FC layer (linear) output and train loss increasing

Hello All,
I am using ResNet3D architecture with some changes:

    x = self.lastConv2(x)
    x = self.relu(x)
    x = self.lastConv1(x)
    x = self.relu(x)
    x = self.lastConv0(x)
    x = self.relu(x)
    x = x.view(x.size(0), -1)  # flatten
    x = self.dropout(x)  # p = 0.2
    x = self.fc(x)
    x = torch.tanh(x)
    return x

I am facing a strange issue: the output of the FC layer grows with every epoch, which in turn increases the train MSE loss.

The output of tanh ultimately becomes clipped at +1 and -1, which I understand happens because tanh saturates once its inputs are very large.
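For illustration, a quick check shows how fast tanh saturates once the inputs grow:

    import torch

    # inputs with large magnitude map to ~±1, where the gradient is near zero
    x = torch.tensor([-100.0, -5.0, 0.0, 5.0, 100.0])
    print(torch.tanh(x))
    # tensor([-1.0000, -0.9999,  0.0000,  0.9999,  1.0000])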

I am using Xavier weight initialization, and all inputs are normalized to the [-1, +1] range.

import numpy as np
import torch.nn as nn
import torch.nn.init as init

def weights_init(m):
    # Xavier init for all conv layers (_ConvNd covers Conv1d/2d/3d)
    if isinstance(m, nn.modules.conv._ConvNd):
        init.xavier_uniform_(m.weight.data, gain=np.sqrt(2.0))
        # m.bias.data.fill_(0)  # enable if the conv layers use bias
    elif isinstance(m, nn.modules.batchnorm._BatchNorm):
        m.weight.data.normal_(mean=1.0, std=0.02)
        # m.bias.data.fill_(0)  # enable if needed
    elif isinstance(m, nn.Linear):
        # uniform init scaled by the fan-in of the linear layer
        y = 1 / np.sqrt(m.in_features)
        m.weight.data.uniform_(-y, y)
        m.bias.data.fill_(0)
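As a side note, the gain=np.sqrt(2.0) used above is the standard gain for ReLU nonlinearities, which PyTorch can also compute directly:

    import torch.nn as nn

    print(nn.init.calculate_gain('relu'))  # 1.4142... == sqrt(2)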

And for the network and loss:

    Net = ResNet(Bottleneck, [1, 1, 1, 1]).to(device)
    Net.apply(weights_init)
    Optimizer = optim.Adam(Net.parameters(), lr=0.0005)
    Criterion = nn.MSELoss()  # MSELoss has no parameters, so .cuda() is unnecessary
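To confirm that the FC output really grows over epochs, a forward hook can log its magnitude. This is just a debugging sketch, assuming the final linear layer is named fc as in the code above:

    fc_stats = []

    def log_fc_output(module, inputs, output):
        # record the mean absolute activation of the FC output per forward pass
        fc_stats.append(output.detach().abs().mean().item())

    hook = Net.fc.register_forward_hook(log_fc_output)
    # ... run training as usual, then inspect fc_stats ...
    # hook.remove()  # detach the hook once done debugging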

And about normalizing:

from skimage.exposure import rescale_intensity

# min-max scale both the input volume and the target to [-1, 1]
MRimage = rescale_intensity(MRimage, in_range=(MRimage.min(), MRimage.max()), out_range=(-1, 1))
target = rescale_intensity(target, in_range=(target.min(), target.max()), out_range=(-1, 1))
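For reference, the same [-1, 1] min-max mapping can be written directly in PyTorch. A minimal sketch (it assumes the tensor is not constant, i.e. max > min):

    import torch

    def minmax_to_unit(x: torch.Tensor) -> torch.Tensor:
        # map [x.min(), x.max()] linearly onto [-1, 1]
        x_min, x_max = x.min(), x.max()
        return (x - x_min) / (x_max - x_min) * 2.0 - 1.0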

What could be happening here?

What range are your targets in?
If the targets are “classification” targets, i.e. zeros and ones, I would expect the model to try to output very high activations to saturate the tanh and thus match the target values.

Hi @ptrblck
The targets are in the range -1 to +1:
target = rescale_intensity(target, in_range=(target.min(), target.max()), out_range=(-1, 1))
So the targets are not simply 0 or 1; rather, they are continuous values in that range.

Hi @ptrblck
I think I have figured out the mistake I was making.
I did not use BatchNorm after the lastConv layers, and without that normalization I guess that is why the outputs kept growing.

        x = self.bnl2(self.lastConv2(x))
        x = self.relu(x)
        x = self.bnl1(self.lastConv1(x))
        x = self.relu(x)
        x = self.bnl0(self.lastConv0(x))
        x = self.relu(x)
        x = x.view(x.size(0), -1)  # flatten
        x = self.dropout(x)
        x = self.fc1(x)
        x = self.dropout(x)  # p = 0.4
        x = self.fc2(x)
        return x
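For completeness, here is a minimal sketch of matching layer definitions; all channel counts, the flattened size, and the dropout probability below are hypothetical placeholders, not my actual model:

    import torch.nn as nn

    class HeadSketch(nn.Module):
        # hypothetical head matching the forward pass above; sizes are placeholders
        def __init__(self):
            super().__init__()
            self.lastConv2 = nn.Conv3d(64, 32, kernel_size=3, padding=1)
            self.bnl2 = nn.BatchNorm3d(32)   # num_features = conv out_channels
            self.lastConv1 = nn.Conv3d(32, 16, kernel_size=3, padding=1)
            self.bnl1 = nn.BatchNorm3d(16)
            self.lastConv0 = nn.Conv3d(16, 16, kernel_size=3, padding=1)
            self.bnl0 = nn.BatchNorm3d(16)
            self.relu = nn.ReLU(inplace=True)
            self.dropout = nn.Dropout(p=0.4)
            self.fc1 = nn.Linear(16 * 4 * 4 * 4, 128)  # assumes 4x4x4 spatial output
            self.fc2 = nn.Linear(128, 1)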

[image: training/validation loss curves]

Though I guess the network starts overfitting right after epoch 2, that is a fairly common problem.

Let me know your thoughts… I will provide more details if you need them.
Thanks