I am using PyTorch code to train with a custom loss function in an unsupervised setting. However, the loss doesn't go down; it stays the same over many epochs during the training phase. Please see the training code snippet below:

```python
import numpy as np
import torch
from torch.autograd import Variable
from sklearn.mixture import GaussianMixture

X = np.load(<data path>)  # Load dataset: a numpy array of N points, each with some dimension.
num_samples, num_features = X.shape
gmm = GaussianMixture(n_components=num_classes, covariance_type='spherical')
gmm.fit(X)
z_gmm = gmm.predict(X)
R_gmm = gmm.predict_proba(X)
pre_R = Variable(torch.log(torch.from_numpy(R_gmm + 1e-8)).type(dtype), requires_grad=True)
R = torch.nn.functional.softmax(pre_R)
F = torch.stack(Variable(torch.from_numpy(X).type(dtype), requires_grad=True))
U = Variable(torch.from_numpy(gmm.means_).type(dtype), requires_grad=False)
z_pred = torch.max(R, 1)[1]
distances = torch.sum(((F.unsqueeze(1) - U) ** 2), dim=2)
custom_loss = torch.sum(R * distances) / num_samples
learning_rate = 1e-3
opt_train = torch.optim.Adam([pre_R], lr=learning_rate)
U = torch.div(torch.mm(torch.t(R), F), torch.sum(R, dim=0).unsqueeze(1))  # In-place assignment with a formula over variables, hence no gradient update is needed.
for epoch in range(max_epochs + 1):
    running_loss = 0.0
    for i in range(stepSize):
        # zero the parameter gradients
        opt_train.zero_grad()
        # forward + backward + optimize
        loss = custom_loss
        loss.backward(retain_graph=True)
        opt_train.step()
        running_loss += loss.data[0]
    if epoch % 25 == 0:
        print(epoch, loss.data[0])  # OR running_loss also gives the same values.
        running_loss = 0.0
```

Output:

```
0 5.8993988037109375
25 5.8993988037109375
50 5.8993988037109375
75 5.8993988037109375
100 5.8993988037109375
```

Am I missing something in the training? I followed this example/tutorial.

Any help and pointers in this regard will be much appreciated.

PS: This question was also posted on Stack Overflow.
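A likely cause, offered as a hedged sketch rather than a confirmed diagnosis: `R`, `distances` and `custom_loss` are computed once, before the loop, and `loss = custom_loss` just reuses that same tensor. Adam does update `pre_R`, but nothing recomputes the forward pass, so the printed value is the stored result of that single forward pass and never changes. For the same reason, reassigning `U` after `custom_loss` is built has no effect on the loss. The sketch below rebuilds the forward pass inside the loop. Assumptions: synthetic two-cluster data stands in for `<data path>`; `num_classes`, `max_epochs` and `step_size` are hypothetical values; `U` is treated as a closed-form constant under `torch.no_grad()` (one reading of the original comment); plain tensors replace the deprecated `Variable`, and `loss.item()` replaces `loss.data[0]`.

```python
import numpy as np
import torch
from sklearn.mixture import GaussianMixture

# Hypothetical stand-ins for <data path>, num_classes, max_epochs and stepSize.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
num_classes, max_epochs, step_size = 2, 100, 10
num_samples, num_features = X.shape

gmm = GaussianMixture(n_components=num_classes, covariance_type='spherical', random_state=0)
gmm.fit(X)
R_gmm = gmm.predict_proba(X)

F = torch.from_numpy(X).float()  # data points; no gradient needed
pre_R = torch.log(torch.from_numpy(R_gmm + 1e-8).float()).requires_grad_()  # trainable logits
opt_train = torch.optim.Adam([pre_R], lr=1e-3)


def compute_loss():
    # Rebuild the graph from the *current* pre_R on every call.
    R = torch.softmax(pre_R, dim=1)
    with torch.no_grad():
        # Closed-form weighted-mean centroids, treated as constants for the gradient step.
        U = (R.t() @ F) / R.sum(dim=0).unsqueeze(1)
    distances = ((F.unsqueeze(1) - U) ** 2).sum(dim=2)  # shape (N, K)
    return (R * distances).sum() / num_samples


history = []
for epoch in range(max_epochs + 1):
    for _ in range(step_size):
        opt_train.zero_grad()
        loss = compute_loss()  # forward pass inside the loop
        loss.backward()        # fresh graph each step, so retain_graph is unnecessary
        opt_train.step()
    history.append(loss.item())
    if epoch % 25 == 0:
        print(epoch, loss.item())
```

Because each iteration builds a new graph, `retain_graph=True` is no longer needed, and the printed loss reflects the updated `pre_R`.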