Understanding load_state_dict() effect on computational graph

I am trying to replicate the algo outlined in the MAML paper:

Pseudo code:

The idea is to compute a second order gradient. While computing the first order gradient is straightforward, computing the second order gradient as mentioned in this paper is a little bit tricky. The pseudo code should give a good understanding.