torch.autograd.grad returns `None` when computing the derivative of an LSTM output with respect to time

I have an LSTM model that takes sequences of 3 temperature values and predicts the next value in the series.

input  => [array([0.20408163, 0.40816327, 0.6122449 ]),
           array([0.40816327, 0.6122449 , 0.81632653])]
output => [tensor(0.81632653, dtype=torch.float64),
           tensor(0.91667510, dtype=torch.float64)]

Now I want to combine this LSTM model with a Physics-Informed Neural Network (PINN) approach based on Newton's law of cooling. The idea is to predict the temperature with the LSTM, then take the derivative of the predicted temperature with respect to time so that the physics law can be incorporated into the loss function.

However, when I try to compute the gradient of the LSTM output with respect to time (t), the gradient returned is None. I’m not sure if I’m using torch.autograd correctly for this purpose.
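For comparison, torch.autograd.grad behaves as I expect when the tensor I differentiate with respect to requires gradients and actually appears in the computation:

import torch

x = torch.linspace(0.0, 1.0, 5, requires_grad=True)
y = x ** 2
# d(x^2)/dx = 2x, so this prints tensor([0.0000, 0.5000, 1.0000, 1.5000, 2.0000])
dy_dx = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y))[0]
print(dy_dx)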

Here is a simplified version of my code:

import torch
import torch.nn as nn

def create_lstm_model(input_size, hidden_size, num_layers, output_size):
    class LSTMModel(nn.Module):
        def __init__(self, input_size, hidden_size, num_layers, output_size):
            super(LSTMModel, self).__init__()
            self.hidden_size = hidden_size
            self.num_layers = num_layers
            self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
            self.fc = nn.Linear(hidden_size, output_size)

        def forward(self, x):
            h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
            c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
            out, _ = self.lstm(x, (h0, c0))
            out = self.fc(out[:, -1, :])
            return out


    return LSTMModel(input_size, hidden_size, num_layers, output_size)

def physics_loss_autograd(outputs, time_step):
    """
    Compute the physics-informed loss using autograd to get dT/dt.
    """
    # Compute dT/dt using autograd (this is the call that comes back as None)
    dT_dt = torch.autograd.grad(outputs, time_step, grad_outputs=torch.ones_like(outputs), create_graph=True)[0]

    # Newton's law of cooling: dT/dt = -k(T - T_ambient)
    # k (cooling constant) and T_ambient (ambient temperature) are defined elsewhere in my code
    residual = dT_dt + k * (outputs - T_ambient)

    # Physics loss is the L2 norm of the residual
    physics_loss = torch.mean(residual**2)

    return physics_loss


t = torch.arange(0, 100)  # timestep indices; not passed through the model
input_size =  1
hidden_size = 64
num_layers = 1
output_size = 1

# Create an instance of the LSTMModel using the function
model = create_lstm_model(input_size, hidden_size, num_layers, output_size)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)


num_epochs = 100
for epoch in range(num_epochs):
    model.train()
    total_loss = 0

    for inputs, targets in train_loader:  # DataLoader over (sequence, target) pairs, built elsewhere
        inputs, targets = inputs.float(), targets.float()  # Convert to float
        # print(inputs.shape)
        optimizer.zero_grad()
        outputs = model(inputs)

        data_loss = criterion(outputs, targets)

        # This is where I want to add the physics-informed term
        phys_loss = physics_loss_autograd(outputs, t)
        loss = data_loss + phys_loss

        loss.backward()
        optimizer.step()
        total_loss += loss.item()

    if (epoch+1) % 20 == 0:
        print(f'Epoch {epoch+1}/{num_epochs}, Loss: {total_loss/len(train_loader)}')

Has anyone worked on a similar problem? Any guidance on how to compute the temporal derivative of the LSTM output would be really helpful!

Additional Information:

  • The time t is just the timestep index; it is not explicitly included as an input to the LSTM. I also tried adding the timestep as an extra input feature, but the derivative still isn't computed.
  • I'm still new to working with PINNs and to combining a physics loss with an LSTM.

The problem arises when I try to compute the physics-informed loss via torch.autograd.grad:

dT_dt = torch.autograd.grad(outputs, t, grad_outputs=torch.ones_like(outputs), create_graph=True)[0]

This returns None for dT_dt. I suspect there’s an issue with how I’m handling the time_step or the autograd setup, but I’m not sure what exactly is going wrong.

If t is never used in the forward pass, it has no effect on the model's output, and therefore there is no valid gradient of the output with respect to it.
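A minimal sketch of how t can be made part of the forward pass so that the gradient is no longer None, assuming the time values are appended as an extra input feature (toy shapes and layer sizes, not the model above):

import torch
import torch.nn as nn

# Toy LSTM whose input has 2 features per step: temperature and time
lstm = nn.LSTM(input_size=2, hidden_size=16, batch_first=True)
fc = nn.Linear(16, 1)

# Time values: float tensor, requires_grad=True, shape (batch=1, seq_len=3, 1)
t = torch.tensor([[[0.0], [0.5], [1.0]]], requires_grad=True)
temps = torch.rand(1, 3, 1)            # dummy temperature feature
x = torch.cat([temps, t], dim=-1)      # t now flows through the network

out, _ = lstm(x)
T_pred = fc(out[:, -1, :])             # predicted next temperature

# Because T_pred depends on t, the gradient is no longer None
dT_dt = torch.autograd.grad(T_pred, t,
                            grad_outputs=torch.ones_like(T_pred),
                            create_graph=True)[0]
print(dT_dt.shape)                     # torch.Size([1, 3, 1])

From there, the residual dT/dt + k * (T - T_ambient) can be formed just as in physics_loss_autograd above.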