Custom Loss doesn't update during training

I am using PyTorch to train with a custom loss function in an unsupervised setting. However, the loss doesn't go down and stays the same over many epochs during the training phase. Please see the training code snippet below:

import numpy as np
import torch
from torch.autograd import Variable
from sklearn.mixture import GaussianMixture

# num_classes, dtype (e.g. torch.FloatTensor), max_epochs, and stepSize are defined elsewhere.
X = np.load(<data path>)  # Load the dataset: a NumPy array of N samples with some number of features each.
num_samples, num_features = X.shape

gmm = GaussianMixture(n_components=num_classes, covariance_type='spherical')
gmm.fit(X)
z_gmm = gmm.predict(X)

R_gmm = gmm.predict_proba(X)
pre_R = Variable(torch.log(torch.from_numpy(R_gmm + 1e-8)).type(dtype), requires_grad=True)  # optimize the logits so that R stays a valid distribution
R = torch.nn.functional.softmax(pre_R, dim=1)

F = Variable(torch.from_numpy(X).type(dtype), requires_grad=True)
U = Variable(torch.from_numpy(gmm.means_).type(dtype), requires_grad=False)

z_pred = torch.max(R, 1)[1]  # hard cluster assignments from the soft responsibilities

distances = torch.sum((F.unsqueeze(1) - U) ** 2, dim=2)  # squared distance of every point to every mean (N x K)
custom_loss = torch.sum(R * distances) / num_samples     # responsibility-weighted mean squared distance

learning_rate = 1e-3
opt_train = torch.optim.Adam([pre_R], lr=learning_rate)
U = torch.div(torch.mm(torch.t(R), F), torch.sum(R, dim=0).unsqueeze(1))  # re-estimate the means from R and F; U is derived from other variables, so no gradient update is needed for it.

for epoch in range(max_epochs+1):
    running_loss = 0.0
    for i in range(stepSize):

        # zero the parameter gradients
        opt_train.zero_grad()

        # forward + backward + optimize
        loss = custom_loss
        loss.backward(retain_graph=True)
        opt_train.step()
        running_loss += loss.data[0]

    if epoch % 25 == 0:
        print(epoch, loss.data[0])  # printing running_loss gives the same values
        running_loss = 0.0

Output:
0 5.8993988037109375
25 5.8993988037109375
50 5.8993988037109375
75 5.8993988037109375
100 5.8993988037109375

Am I missing something in the training? I followed this example/tutorial.
Any help and pointers in this regard will be much appreciated.

PS: This question was also posted on Stack Overflow.

What's your train_var? It's the only thing being optimized, and I don't see it involved anywhere in this optimization step.


Thanks for the prompt reply and for pointing that out, Simon. train_var is pre_R; I have edited the code snippet above. It was a small editing mistake introduced while simplifying the question for readability.
Please note that the behavior of the loss is still the same.

You only calculated the loss once, so essentially you only did one real training step. PyTorch uses dynamic graphs, unlike the static graphs in TensorFlow or Theano: it computes on actual values rather than on symbols. That means you need to compute the loss inside each iteration; every iteration should build a fresh graph from the current value of pre_R and optimize through it. With that change, retain_graph shouldn't be there in your case. In fact, you usually don't need to set it at all.
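
For concreteness, here is a minimal sketch of what the corrected loop could look like. It reuses pre_R, F, num_samples, max_epochs, and stepSize from your snippet, and keeps the old Variable / .data[0] style to match your code:

opt_train = torch.optim.Adam([pre_R], lr=1e-3)

for epoch in range(max_epochs + 1):
    running_loss = 0.0
    for i in range(stepSize):
        opt_train.zero_grad()

        # Rebuild everything that depends on pre_R, so every step gets a fresh graph.
        R = torch.nn.functional.softmax(pre_R, dim=1)
        U = torch.div(torch.mm(torch.t(R), F), torch.sum(R, dim=0).unsqueeze(1))
        distances = torch.sum((F.unsqueeze(1) - U) ** 2, dim=2)
        loss = torch.sum(R * distances) / num_samples

        loss.backward()  # no retain_graph: the graph is recreated every iteration
        opt_train.step()
        running_loss += loss.data[0]

    if epoch % 25 == 0:
        print(epoch, running_loss / stepSize)

Note that I also recompute U inside the loop here, so the means track the updated responsibilities; if you would rather keep the means fixed at the GMM estimates, compute U once before the loop, and gradients will then reach pre_R only through R. Hope this helps!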
