How to compute RNN Hidden-to-Hidden Jacobian in Pytorch?

RylanSchaeffer · May 5, 2020, 3:28pm

I want to compute the hidden state to hidden state Jacobian of a RNN with a single layer, evaluated at a number of points (i.e. over a batch), and I want to confirm that I’m doing this correctly.

The hidden state has shape (batch size, num layers=1, hidden state size). I first compute the unit basis vectors as

    num_basis_vectors = num_layers * hidden_size
    unit_basis_vectors = torch.eye(n=num_basis_vectors).reshape(
        num_basis_vectors, num_layers, hidden_size)

I then loop over the unit basis vectors (as above):

        jacobian_components = []
        for unit_vector in unit_basis_vectors:
            jacobian_component = torch.autograd.grad(
                outputs=rnn_output_hidden_states,  # shape (batch size, num layers=1, rnn hidden state size) 
                inputs=rnn_input_hidden_states,  # shape (batch size, num layers=1, rnn hidden state size)
                grad_outputs=torch.stack([unit_vector] * batch_size),  # repeat batch_size times
                retain_graph=True)[0]
            jacobian_component = torch.mean(jacobian_component, dim=(1,))
            jacobian_components.append(jacobian_component)
        jacobian = torch.stack(jacobian_components, dim=1)

Is this correct? How would I go about testing the correctness?