I want to compute the hidden state to hidden state Jacobian of a RNN with a single layer, evaluated at a number of points (i.e. over a batch), and I want to confirm that I’m doing this correctly.
The hidden state has shape (batch size, num layers=1, hidden state size)
. I first compute the unit basis vectors as
num_basis_vectors = num_layers * hidden_size
unit_basis_vectors = torch.eye(n=num_basis_vectors).reshape(
num_basis_vectors, num_layers, hidden_size)
I then loop over the unit basis vectors (as above):
jacobian_components = []
for unit_vector in unit_basis_vectors:
jacobian_component = torch.autograd.grad(
outputs=rnn_output_hidden_states, # shape (batch size, num layers=1, rnn hidden state size)
inputs=rnn_input_hidden_states, # shape (batch size, num layers=1, rnn hidden state size)
grad_outputs=torch.stack([unit_vector] * batch_size), # repeat batch_size times
retain_graph=True)[0]
jacobian_component = torch.mean(jacobian_component, dim=(1,))
jacobian_components.append(jacobian_component)
jacobian = torch.stack(jacobian_components, dim=1)
Is this correct? How would I go about testing the correctness?