Loss is NaN | vanishing/exploding gradients?

Hello, I'm trying to use the custom dataset mentioned here in a model that detects facial landmarks.

The problem: during training the loss is NaN, and I don't know what to do.

Code:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import transforms


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # two small conv layers followed by three fully connected layers
        self.conv1 = nn.Conv2d(3, 6, 3)
        self.conv2 = nn.Conv2d(6, 9, 3)
        self.fc1 = nn.Linear(9 * 171 * 171, 1500)
        self.fc2 = nn.Linear(1500, 544)
        self.fc3 = nn.Linear(544, 136)  # 68 landmarks * 2 coordinates

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


transformer = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

batchsize = 4
trainset = projectdata(csv_file='face_landmarks.csv', root_dir='faces', transform=transformer)
train_loader = DataLoader(trainset, batch_size=batchsize, shuffle=True, num_workers=0)

net = Net()
optimizer = optim.SGD(net.parameters(), lr=0.0001, momentum=0.9)

def cross_entropy(input, target):
    # mean over the batch of -sum(target * log(input)) per sample
    return torch.mean(-torch.sum(target * torch.log(input), 1))

for epoch in range(2):
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        input, labels = data

        optimizer.zero_grad()
        outputs = net(input)  # 136 predicted annotations per sample (x4, the batch size)
        outputs = outputs.reshape((batchsize, 68, 2))  # reshaped to match the labels' shape

        loss = cross_entropy(outputs, labels)  # calculating the loss
        loss.backward()
        optimizer.step()

        running_loss += loss.item()


I've tried using leaky_relu, but without success.
I also tried switching the optimizer to Adam.

Thanks!

Your custom cross_entropy function looks wrong, as it applies torch.log directly to the model outputs, which are logits and can therefore be negative as well as positive.
Negative values will create NaNs, which would then explain your invalid loss values.
What's the reason you are not using the correct and numerically stable nn.CrossEntropyLoss criterion?
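
For illustration (this snippet is not from the original post; the logit values are made up), torch.log of a negative logit already yields NaN, while nn.CrossEntropyLoss applies log_softmax internally and stays finite on the same raw logits:

import torch
import torch.nn as nn

logits = torch.tensor([[2.0, -1.5, 0.3]])  # raw model outputs can be negative
target = torch.tensor([[0.0, 1.0, 0.0]])   # one-hot target, for the custom loss

# the custom loss takes torch.log of the raw logits -> NaN for negative entries
naive = torch.mean(-torch.sum(target * torch.log(logits), 1))
print(naive)  # tensor(nan)

# nn.CrossEntropyLoss expects raw logits plus class indices and applies
# log_softmax internally, so it stays numerically stable
criterion = nn.CrossEntropyLoss()
stable = criterion(logits, torch.tensor([1]))
print(stable)  # finite loss value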

Hi, thank you and sorry for the late response.
I thought I needed a custom cross_entropy in order to handle the two-dimensional targets.

Anyway, I switched it to nn.CrossEntropyLoss, but the loss is NaN again.
It seems that the gradients often explode: sometimes the loss is 27000, then 50000, then NaN…
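
For reference, a minimal way to confirm that the gradients really are exploding (a sketch, assuming the net and training loop from the first post) is to print the total gradient norm right after loss.backward():

# inside the training loop, after loss.backward()
total_norm = 0.0
for p in net.parameters():
    if p.grad is not None:
        total_norm += p.grad.detach().norm(2).item() ** 2
total_norm = total_norm ** 0.5
print(f'step {i}: loss={loss.item():.1f}, grad norm={total_norm:.1f}')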