Saliency analysis for RNN predictions

I am working with a time-series dataset in which I want to detect whether a specific event occurred (in each sequence the event happens exactly once). I trained a recurrent neural network (with LSTM layers) that outputs, for each input frame (sample), a binary prediction indicating whether the RNN thinks the event has happened.

Now I want to run a saliency analysis to find out how far in advance of the event the trained model uses input frames to arrive at a positive prediction.

I came up with the following code, but I am not sure whether my implementation is correct:

import numpy as np
import torch
import torch.nn.functional as F

data, pred_label = get_data_and_model_prediction()
event_happened_at = np.argmax(pred_label)  # index of the first frame with a positive prediction

# Clear any accumulated gradients
for param in model.parameters():
    param.grad = None

# Start from zero state for each sequence
h = model.get_new_empty_state(requires_grad=True)

# Step 1: Forward propagation
subsequence = data[None, 0:event_happened_at + 1, :]  # Example shape: 1 x 100 x 2 (N x T x input features)
subsequence = torch.from_numpy(subsequence).to(device=device).float()
subsequence.requires_grad = True
prediction, new_state = model(subsequence, h)
probability = F.softmax(prediction, dim=2)

# Step 2: Backward propagation (the output layer has 2 units, where the unit at index 1 indicates if the event happened)
probability[0, :, 1].backward(torch.ones(subsequence.size(1), device=device))
gradients = subsequence.grad.detach().cpu().numpy()

The tricky part for me is the first line of step 2. I want to backpropagate through time to measure the influence of every feature in every prior input frame on that particular prediction that the event happened. As far as I understand, passing a vector of ones as the gradient argument sums the contributions of all output timesteps rather than isolating the prediction at the event frame, so I have the feeling that I am not using the backward function correctly.
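For reference, here is a minimal toy version of what I am trying (with a stand-in LSTM and random data, not my actual model), where I backpropagate only the scalar probability at the final frame instead of passing a vector of ones. I believe this isolates the single event prediction, since the subsequence ends at the event frame:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for the model: an LSTM plus a per-frame linear head with
# 2 output units (index 1 = "event happened"). Shapes follow the question:
# batch x time x features.
lstm = nn.LSTM(input_size=2, hidden_size=8, batch_first=True)
head = nn.Linear(8, 2)

T = 100
subsequence = torch.randn(1, T, 2, requires_grad=True)

out, _ = lstm(subsequence)      # 1 x T x 8
logits = head(out)              # 1 x T x 2
probability = F.softmax(logits, dim=2)

# Backpropagate only the scalar probability at the final (event) frame.
# A scalar output needs no explicit gradient argument to backward().
probability[0, -1, 1].backward()

# Aggregate per-frame saliency: sum of absolute input gradients per frame
saliency = subsequence.grad.abs().sum(dim=2).squeeze(0)
print(saliency.shape)  # torch.Size([100])
```

Plotting `saliency` over the frame index should then show how far back before the event the input still influences that prediction.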