Custom loss function not decreasing

Sorry about the previous (deleted) post; I accidentally created it before writing anything (I still don’t know how, but whatever).

  • My model:
import torch.nn as nn
import torch.nn.functional as F

class Module1(nn.Module):

  def __init__(self):
    super(Module1, self).__init__()
    # kernel 5, stride 1, padding 2 keeps the spatial size unchanged
    self.conv1 = nn.Conv2d(3, 32, 5, 1, 2)
    self.conv2 = nn.Conv2d(32, 64, 5, 1, 2)
    self.conv3 = nn.Conv2d(64, 128, 5, 1, 2)
    self.conv4 = nn.Conv2d(128, 128, 5, 1, 2)
    self.pool = nn.MaxPool2d(2, 2)  # halves the spatial size
    self.fc = nn.Linear(128*8*8, 10)  # defined but not used in forward

  def forward(self, x):
    x = self.pool(F.relu(self.conv2(F.relu(self.conv1(x)))))
    x = self.pool(F.relu(self.conv4(F.relu(self.conv3(x)))))
    x = x.view(x.size(0), -1)  # flatten to (batch, 128*8*8) feature vectors
    return x
  • My loss (see the note after this post for a more stable reformulation):
import numpy as np
import torch

def infoNCE(zt, ztk):
  # diagonal entries of exp(ztk @ zt^T) are the positive-pair scores
  ind = np.diag_indices(zt.size(0))  # zt.size(0) is the batch size (b_size)
  aux = torch.exp(ztk @ torch.t(zt))[ind[0], ind[1]]
  # normalize each positive score by its column sum (positive + negatives)
  aux = aux / torch.sum(torch.exp(ztk @ torch.t(zt)), dim=0)
  val = -torch.sum(torch.log(aux))
  return val
  • My code for the training:
import torch.optim as optim
from tqdm import trange

# device, lr, epochs, and trainloader are defined elsewhere
mod1 = Module1().to(device)
optimizer = optim.Adam(mod1.parameters(), lr=lr)

for epoch in trange(epochs):
  for i, data in enumerate(trainloader):
    xt = data[0][0].to(device)   # current patch
    xtk = data[0][1].to(device)  # 'future' patch of the same image
    yt = data[1].to(device)      # labels, only used later for evaluation
    optimizer.zero_grad()

    zt = mod1(xt)
    ztk = mod1(xtk)
    loss = infoNCE(zt, ztk)
    loss.backward()
    optimizer.step()
  • My loss over time:
Epoch 50	Loss = 44.24047088623047
Epoch 100	Loss = 44.24409484863281
Epoch 150	Loss = 44.193824768066406
Epoch 200	Loss = 44.24012756347656
Epoch 250	Loss = 44.26813507080078
Epoch 300	Loss = 44.28200149536133

[image: plot of the loss over training]
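A side note on the infoNCE loss above: exponentiating the raw similarity matrix can overflow and produce NaN values, and dividing before taking the log loses precision. The sketch below is a mathematically equivalent reformulation that lets F.cross_entropy apply the log-softmax internally; like the original, it assumes the positive pair for row i of zt is row i of ztk.

import torch
import torch.nn.functional as F

def infoNCE_stable(zt, ztk):
  # similarity matrix: entry (i, j) compares zt[i] against ztk[j]
  logits = zt @ ztk.t()
  # the positive for sample i sits on the diagonal, so the target "class" is i
  target = torch.arange(zt.size(0), device=zt.device)
  # cross_entropy applies log-softmax internally, avoiding explicit exp/log
  return F.cross_entropy(logits, target, reduction='sum')

With reduction='sum' this matches the sum over the batch in the original function; reduction='mean' would make the loss scale independent of the batch size.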

It is important to first localize the error. You could try one of two things.

  • First, try different learning rates while turning off all regularization.
  • Try a standard loss function such as MSE (for regression) or cross entropy (if classes are present) and see whether it decreases for some learning rate; a rough sketch is given after this list. If these losses do not decrease either, it may indicate an underlying problem with the data or the way it was pre-processed.
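For example, a quick sanity check along those lines could look like the sketch below. It reuses the otherwise unused fc head of Module1 and the labels yt from the loader to train with a plain cross-entropy loss over a few candidate learning rates; device and trainloader are assumed to be defined as in the original post, and the learning-rate values are only illustrative.

import torch.nn.functional as F
import torch.optim as optim

for lr_try in (1e-4, 1e-3, 1e-2):  # illustrative values
  mod = Module1().to(device)
  opt = optim.Adam(mod.parameters(), lr=lr_try)
  for i, data in enumerate(trainloader):
    xt = data[0][0].to(device)
    yt = data[1].to(device)
    opt.zero_grad()
    logits = mod.fc(mod(xt))            # classification head on top of the features
    loss = F.cross_entropy(logits, yt)  # standard classification loss
    loss.backward()
    opt.step()
    if i % 100 == 0:
      print(f'lr={lr_try}  step={i}  loss={loss.item():.4f}')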

I think there might be a bug, as you are not using yt anywhere after defining it.

That’s not a bug, sorry, I should have explained. My model uses xt, which are patches of images, and xtk, which are ‘future’ patches or predictions, that is, a different random patch of the same image but with some displacement. The variable yt holds the labels and is only meant for evaluating accuracy at test time; the labels are not used in this custom loss, so yt can be ignored in this post.
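To make that setup concrete, one way such a pair could be produced is sketched below. This is only an illustration of the idea described above, not the actual data pipeline; the function name, patch size, and displacement range are made up for the example.

import random

def sample_patch_pair(img, patch=32, max_shift=8):
  # img: tensor of shape (C, H, W), assumed larger than patch + max_shift per side
  _, H, W = img.shape
  y = random.randint(0, H - patch - max_shift)
  x = random.randint(0, W - patch - max_shift)
  dy, dx = random.randint(1, max_shift), random.randint(1, max_shift)
  xt = img[:, y:y + patch, x:x + patch]                        # current patch
  xtk = img[:, y + dy:y + dy + patch, x + dx:x + dx + patch]   # displaced 'future' patch
  return xt, xtk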

Try lowering lr and check if that helps.

I can’t believe it was as simple as that, thank you

Yeah, sometimes Adam doesn’t work well with a moderately high learning rate, so in that case I usually use SGD with a low learning rate (0.001 or 0.01 is low enough for SGD) to avoid NaNs, combined with a high momentum like 0.8 or 0.9 for faster convergence. You could try this as well.
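A sketch of that suggestion, using the values mentioned above (not tuned for this particular model):

import torch.optim as optim

# SGD with a small learning rate and high momentum instead of Adam
optimizer = optim.SGD(mod1.parameters(), lr=0.01, momentum=0.9)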

Oh, that’s sweet, because I was about to say that I was getting NaNs after several iterations. Thanks again!