Custom loss function outside the realm of torch

Hello,

I’ve only used PyTorch in a very vanilla way so far, so please bear with me :slight_smile:

I’m building a model to accept a one-hot encoded string and generate a corresponding string.
To score, I do the following:

  1. Convert the generated output into a string (pred_str); to do this I need to detach it from the graph.
  2. Calculate the alignment score of the real string (real_str) against itself to get a float (let's call it yy).
  3. Calculate the alignment score of pred_str against real_str to get a float (let's call it yh).
  4. Calculate the MSE between yy and yh.
  5. Take the gradient and backpropagate, starting from the last layer of my model.

This is what I have so far, but it appears that the computation graph is broken, as the weights in my model are not updating:

My custom loss:

import numpy as np
import torch

class MyLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, seq_pred, seq_real, encoder):
        # STEP 1: decode the one-hot output back into a string (requires detach)
        seq_pred = encoder.inverse_transform(seq_pred.detach().numpy())
        seq_pred = ''.join(seq_pred[0])

        # STEP 2: alignment score of the real string against itself
        y_real = np.mean([al.score for al in aligner.align(seq_real, seq_real)])

        # STEP 3: alignment score of the predicted string against the real string
        y_pred = np.mean([al.score for al in aligner.align(seq_pred, seq_real)])

        y_pred = torch.tensor(y_pred)
        y_real = torch.tensor(y_real)

        ctx.save_for_backward(y_real, y_pred)
        # STEP 4: MSE between the two scores
        return (y_pred - y_real).pow(2)

    @staticmethod
    def backward(ctx, grad_output):
        yy, yy_pred = ctx.saved_tensors
        grad_input = grad_output.clone()
        # STEP 5: hand-written gradient, repeated over the 489 outputs
        grad_input = -(2.0 * (yy_pred - yy)) / 489
        return grad_input.repeat(489).view(1, 489), None, None

Other stuff:

class MyModel(nn.Module):
    
    def __init__(self, X_dim, h1_dim, h2_dim):
        super(MyModel, self).__init__()
        self.fc1 = nn.Linear(X_dim, h1_dim)
        self.fc2 = nn.Linear(h1_dim, h2_dim)
        self.sigmoid = nn.Sigmoid()
        
    def forward(self, x):
        x = x.view(x.shape[0],-1)
        x = F.relu(self.fc1(x))
        x = self.sigmoid(self.fc2(x))
        return x
    
    
# training parameters
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
X_dim = X_train.shape[1]
h1_dim = 200
h2_dim = 489
model = MyModel(X_dim, h1_dim, h2_dim).to(device)

n_eps = 5
criterion = MyLoss.apply
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
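
The training step is essentially the standard loop (simplified sketch; x and seq_real are placeholders for one encoded input and its target string):

for ep in range(n_eps):
    output = model(x)
    loss = criterion(output, seq_real, encoder)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()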

Hi,

First, given that this is the last piece of your network, you don’t have to wrap it in a custom autograd.Function; you can just compute the loss and the gradient in a regular Python function and then do:

output = model(input)
loss, grad = my_loss(output, target)
opt.zero_grad()
output.backward(grad)
opt.step()
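
Here my_loss is a plain function that returns both the scalar loss and the gradient you want to feed into backward(). A minimal sketch, assuming the aligner and encoder objects from your post are in scope, and reusing the same hand-written gradient you have in backward():

def my_loss(output, target):
    # decode the prediction to a string (not differentiable, hence the manual gradient)
    seq_pred = ''.join(encoder.inverse_transform(output.detach().numpy())[0])

    y_real = np.mean([al.score for al in aligner.align(target, target)])
    y_pred = np.mean([al.score for al in aligner.align(seq_pred, target)])

    loss = torch.tensor((y_pred - y_real) ** 2)
    # gradient w.r.t. each of the 489 outputs, same formula as in your backward()
    grad = torch.full_like(output, float(-2.0 * (y_pred - y_real) / 489))
    return loss, grad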

If you really want to use a custom Function because it is cleaner, I can’t see anything obviously wrong with this one.
Are you sure that the gradient it computes as grad_input is actually non-zero?
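
One quick way to check is to run a single forward/backward pass and print the parameter gradients (a rough sketch, using the model and criterion from your post; x and seq_real stand for one input sample and its target string):

output = model(x)
loss = criterion(output, seq_real, encoder)
model.zero_grad()
loss.backward()

for name, p in model.named_parameters():
    print(name, "no grad" if p.grad is None else p.grad.abs().max().item())

In particular, if the predicted and real strings happen to get the same alignment score, the gradient you return is exactly zero and the weights won’t move.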