Custom Loss doesn't update during training

I am using PyTorch to train with a custom loss function in an unsupervised setting. However, the loss doesn't go down and stays the same over many epochs during the training phase. Please see the training code snippet below:

import numpy as np
import torch
from torch.autograd import Variable
from sklearn.mixture import GaussianMixture

# num_classes, dtype (e.g. torch.FloatTensor), max_epochs, and stepSize are defined elsewhere.
X = np.load(<data path>)  # Load the dataset: a NumPy array of N samples with some number of features each.
num_samples, num_features = X.shape

gmm = GaussianMixture(n_components=num_classes, covariance_type='spherical')
gmm.fit(X)
z_gmm = gmm.predict(X)

R_gmm = gmm.predict_proba(X)
pre_R = Variable(torch.log(torch.from_numpy(R_gmm + 1e-8)).type(dtype), requires_grad=True)  # optimize the logits so that R stays a valid distribution
R = torch.nn.functional.softmax(pre_R, dim=1)

F = Variable(torch.from_numpy(X).type(dtype), requires_grad=True)
U = Variable(torch.from_numpy(gmm.means_).type(dtype), requires_grad=False)

z_pred = torch.max(R, 1)[1]  # hard cluster assignments from the soft responsibilities

distances = torch.sum((F.unsqueeze(1) - U) ** 2, dim=2)  # squared distance of every point to every mean (N x K)
custom_loss = torch.sum(R * distances) / num_samples     # responsibility-weighted mean squared distance

learning_rate = 1e-3
opt_train = torch.optim.Adam([pre_R], lr=learning_rate)
U = torch.div(torch.mm(torch.t(R), F), torch.sum(R, dim=0).unsqueeze(1))  # re-estimate the means from R and F; U is derived from other variables, so no gradient update is needed for it.

for epoch in range(max_epochs+1):
    running_loss = 0.0
    for i in range(stepSize):

        # zero the parameter gradients
        opt_train.zero_grad()

        # forward + backward + optimize
        loss = custom_loss
        loss.backward(retain_graph=True)
        opt_train.step()
        running_loss += loss.data[0]

    if epoch % 25 == 0:
        print(epoch, loss.data[0])  # printing running_loss gives the same values
        running_loss = 0.0

Output:
0 5.8993988037109375
25 5.8993988037109375
50 5.8993988037109375
75 5.8993988037109375
100 5.8993988037109375

Am I missing something in the training? I followed this example/tutorial.
Any help and pointers in this regard will be much appreciated.

PS: This question was also posted on Stack Overflow.

What's your train_var? It's the only thing being optimized, and I don't see it involved anywhere in this optimization step.


Thanks for the prompt reply and for pointing that out, Simon. train_var is pre_R; I have edited the code snippet above. It was a small editing mistake introduced while simplifying the question for readability.
Please note that the behavior of the loss is still the same.

You only calculated the loss once, so essentially you only did one real training step. PyTorch uses dynamic graphs, unlike the static graphs in TensorFlow or Theano: it computes on actual values rather than on symbols. That means you need to compute the loss inside each iteration; every iteration should build a fresh graph from the current value of pre_R and optimize through it. With that change, retain_graph shouldn't be there in your case. In fact, you usually don't need to set it at all.
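
For concreteness, here is a minimal sketch of what the corrected loop could look like. It reuses pre_R, F, num_samples, max_epochs, and stepSize from your snippet, and keeps the old Variable / .data[0] style to match your code:

opt_train = torch.optim.Adam([pre_R], lr=1e-3)

for epoch in range(max_epochs + 1):
    running_loss = 0.0
    for i in range(stepSize):
        opt_train.zero_grad()

        # Rebuild everything that depends on pre_R, so every step gets a fresh graph.
        R = torch.nn.functional.softmax(pre_R, dim=1)
        U = torch.div(torch.mm(torch.t(R), F), torch.sum(R, dim=0).unsqueeze(1))
        distances = torch.sum((F.unsqueeze(1) - U) ** 2, dim=2)
        loss = torch.sum(R * distances) / num_samples

        loss.backward()  # no retain_graph: the graph is recreated every iteration
        opt_train.step()
        running_loss += loss.data[0]

    if epoch % 25 == 0:
        print(epoch, running_loss / stepSize)

Note that I also recompute U inside the loop here, so the means track the updated responsibilities; if you would rather keep the means fixed at the GMM estimates, compute U once before the loop, and gradients will then reach pre_R only through R. Hope this helps!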
