Gradient Doesn't Compute Backward

I want to make a custom MSE loss function that first runs a GMM computation on the input.
The MSE loss is then computed between the GMM result and the target value.

Here is my code

class StyleLoss(nn.Module):

    def __init__(self, target_feature, ncomp, initial_mus, initial_covs, initial_priors):
        super(StyleLoss, self).__init__()
        self.gmm = GMM(ncomp, initial_mus, initial_covs, initial_priors)
        self.target, self.log_likelihood_target = self.gmm.inference(target_feature)
        self.target = torch.tensor(self.target, dtype=torch.float).to(device)

    def forward(self, input_style):
        input_style = input_style.detach().cpu().numpy()
        image_batch, image_channels, image_height, image_width = input_style.shape
        input_style_reshaped = np.reshape(input_style, (-1, image_channels))

        self.input_gmm, self.log_likelihood_input = self.gmm.inference(input_style_reshaped)
        self.input_gmm = torch.tensor(self.input_gmm, dtype=torch.float).to(device)
        self.loss = F.mse_loss(self.input_gmm, self.target)

        self.loss = torch.tensor(self.loss, requires_grad=True, dtype=torch.float).to(device)
        input_style = torch.tensor(input_style, dtype=torch.float).to(device)

        return input_style

I have read many discussion forum posts but still haven't found the proper way to write this custom loss function, because when I do it this way, the gradient is not calculated. I hope someone can help me. Thanks a bunch in advance!

You are detaching the tensors from the computation graph by recreating new tensors, in particular here:

input_style = input_style.detach().cpu().numpy()
...
self.loss = torch.tensor(self.loss, requires_grad=True, dtype=torch.float).to(device)
input_style = torch.tensor(input_style, dtype=torch.float).to(device)

Try to use the output of the functions directly without recreating tensors.
If you need to change the data type or device, use .float() or .to().
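
For example (a toy sketch with random tensors standing in for your features, not your GMM pipeline), using only differentiable PyTorch ops plus .float()/.to() keeps the graph intact, while rewrapping the value with torch.tensor creates a new leaf with no history:

import torch
import torch.nn.functional as F

x = torch.randn(10, 3, requires_grad=True)
target = torch.randn(10, 3)

# keeps the graph: only differentiable ops and dtype/device conversions
loss = F.mse_loss(x.float().to(x.device), target)
print(loss.grad_fn)            # <MseLossBackward0 object at ...>

# breaks the graph: the value is copied into a brand-new leaf tensor
loss_copy = torch.tensor(loss.item(), requires_grad=True)
print(loss_copy.grad_fn)       # None -> backward() will never reach x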

I have used the direct output of the F.mse_loss function, but the result doesn't have a grad_fn when I print it. Will that be a problem?

def forward(self, input_style):

    # GMM = GMM(input)

    input_style = input_style.detach().numpy()
    image_batch, image_channels, image_height, image_width = input_style.shape
    input_style_reshaped = np.reshape(input_style, (-1, image_channels))
    
    self.input_gmm, self.log_likelihood_input = self.gmm.inference(input_style_reshaped)
    self.input_gmm = torch.tensor(self.input_gmm, dtype=torch.float).to(device)
    self.loss = F.mse_loss(self.input_gmm, self.target).to(device)
    input_style = torch.tensor(input_style, dtype=torch.float).to(device)

    print('self.loss : ', self.loss)
    return input_style

Printed output:
self.loss : tensor(0.0224)
self.loss : tensor(0.0195)

self.input_gmm was also detached in the line before.
The general rule is: as long as you use PyTorch functions and don't detach the tensors (by recreating new tensors or calling .detach() or .item()), Autograd will be able to track the computation graph and calculate the gradients.
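
A quick way to check this is to look at grad_fn after each step: a tensor produced by differentiable PyTorch ops carries a grad_fn, while a recreated copy does not (again a toy example, not your GMM code):

import torch

w = torch.randn(3, requires_grad=True)
out = (w ** 2).mean()                        # pure PyTorch ops, graph is tracked
print(out.grad_fn)                           # <MeanBackward0 object at ...>
out.backward()
print(w.grad)                                # gradient arrives at the leaf

broken = torch.tensor(out.detach().numpy()) # round trip through numpy drops the history
print(broken.grad_fn)                        # None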

Sorry, I haven't mentioned this before. The result of

input_gmm, self.log_likelihood_input = self.gmm.inference(input_style_reshaped)

is a numpy array for both input_gmm and self.log_likelihood_input.

Therefore, when I call

F.mse_loss(self.input_gmm, self.target).to(device)

the loss can't be computed, because one of the inputs is a numpy array. How can I work around that?

You could try to use PyTorch methods, if equivalents for the numpy ops are available, or you would have to write a custom autograd.Function and implement the forward and backward pass manually.

Could you post the method, so that we can have a look at which numpy ops you are using?

Do you mean what the
self.gmm.inference() function looks like? It's basically code that only uses pure numpy:

import numpy as np
import torch
from scipy.stats import multivariate_normal

class GMM:

    def __init__(self, ncomp, initial_mus, initial_covs, initial_priors):
        self.ncomp = ncomp
        self.mus = np.asarray(initial_mus)
        self.covs = np.asarray(initial_covs)
        self.priors = np.asarray(initial_priors)

    def inference(self, datas):  # E-step
        unnormalized_probs = []
        if type(datas) == torch.Tensor:
            datas = datas.cpu().numpy()

        for i in range(self.ncomp):
            mu, cov, prior = self.mus[i, :], self.covs[i, :, :], self.priors[i]
            # multivariate_normal.pdf is a SciPy function for the multivariate
            # normal distribution, because our data is 3-dimensional
            unnormalized_prob = prior * multivariate_normal.pdf(datas, mean=mu, cov=cov, allow_singular=True)
            unnormalized_probs.append(np.expand_dims(unnormalized_prob, -1))

        preds = np.concatenate(unnormalized_probs, axis=1)
        log_likelihood = np.sum(preds, axis=1)
        log_likelihood = np.sum(np.log(log_likelihood))
        preds = preds / np.sum(preds, axis=1, keepdims=True)

        return np.asarray(preds), log_likelihood

I also tried to follow the tutorial on writing the forward and backward pass manually, but I'm confused about which parts I should save with ctx.save_for_backward and how to write the backward function in my case.

You would have to derive the backward formula manually and save all tensors needed in the backward calculations in ctx.

However, just by skimming through the code, it seems you could swap the numpy calls for their PyTorch equivalent.
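
If you do go the manual route, the structure would look roughly like the following skeleton. This is only a sketch: gmm_inference_np and gmm_inference_grad_np are hypothetical placeholders for your numpy E-step and its hand-derived gradient, which you would still have to work out yourself.

import torch

class GMMInference(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input_style):
        input_np = input_style.detach().cpu().numpy()
        preds_np = gmm_inference_np(input_np)          # your numpy E-step (hypothetical helper)
        ctx.save_for_backward(input_style)             # tensors needed for the backward pass
        ctx.preds_np = preds_np                        # non-tensor data can be stored on ctx directly
        return torch.as_tensor(preds_np, dtype=input_style.dtype, device=input_style.device)

    @staticmethod
    def backward(ctx, grad_output):
        (input_style,) = ctx.saved_tensors
        # The Jacobian of the E-step w.r.t. the input has to be derived by hand and
        # contracted with grad_output; this placeholder only shows the plumbing.
        grad_np = gmm_inference_grad_np(
            input_style.detach().cpu().numpy(),
            ctx.preds_np,
            grad_output.detach().cpu().numpy(),
        )
        return torch.as_tensor(grad_np, dtype=input_style.dtype, device=input_style.device)

# usage inside StyleLoss.forward:
# preds = GMMInference.apply(input_style_reshaped)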

Thank you for your explanation. I'm now thinking of rewriting it in PyTorch.

That would be the easiest way.
Let us know if you get stuck somewhere or have trouble finding the matching PyTorch functions for your numpy code.
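
For reference, here is a rough sketch of what the E-step could look like in pure PyTorch, so Autograd can track it end to end. It assumes mus, covs and priors have already been converted to tensors (e.g. with torch.as_tensor); note that torch.distributions.MultivariateNormal has no allow_singular option, so the covariances must be positive definite (adding a small jitter to the diagonal is a common workaround):

import torch
from torch.distributions import MultivariateNormal

def gmm_inference_torch(datas, mus, covs, priors):
    # datas: (N, D), mus: (K, D), covs: (K, D, D), priors: (K,) -- all torch tensors
    unnormalized_probs = []
    for i in range(priors.shape[0]):
        mvn = MultivariateNormal(loc=mus[i], covariance_matrix=covs[i])
        unnormalized_probs.append((priors[i] * mvn.log_prob(datas).exp()).unsqueeze(-1))

    preds = torch.cat(unnormalized_probs, dim=1)            # (N, K)
    log_likelihood = torch.log(preds.sum(dim=1)).sum()      # same reduction as the numpy version
    preds = preds / preds.sum(dim=1, keepdim=True)          # responsibilities
    return preds, log_likelihood

With something like this, F.mse_loss(preds, self.target) keeps its grad_fn as long as datas comes straight from the network without .detach() or .cpu().numpy() in between.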