### System explanation

What I’m trying to accomplish is a system with two models: a classifier, and what we can call a “gradient_model”.

The goal of the classifier is to learn features from a training dataset as normal.

The goal of the gradient_model is to learn to filter/aggregate/transform the gradients from the classifier so that the classifier performs well on the validation dataset, explicitly without having any feature information from the validation set.
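To make the shapes concrete, here is a minimal sketch of what such a gradient_model could look like: a module that aggregates per-environment gradients `[num_envs, …]` into a single update `[1, …]` via a learned weighting. This is an illustrative assumption, not the post's actual architecture; the class name and the softmax-mixing design are hypothetical.

```python
import torch
import torch.nn as nn

class GradientModel(nn.Module):
    """Hypothetical sketch: learn a per-environment mixing weight and
    collapse [num_envs, ...] gradients into a single [1, ...] update."""

    def __init__(self, num_envs: int):
        super().__init__()
        # One learnable mixing weight per environment.
        self.mix = nn.Parameter(torch.ones(num_envs) / num_envs)

    def forward(self, grads: torch.Tensor) -> torch.Tensor:
        # grads: [num_envs, ...] -> [1, ...]
        w = torch.softmax(self.mix, dim=0)
        # Broadcast the mixing weights over the trailing gradient dims.
        w = w.view(-1, *([1] * (grads.dim() - 1)))
        return (w * grads).sum(dim=0, keepdim=True)

gm = GradientModel(num_envs=4)
out = gm(torch.randn(4, 10, 5))
assert out.shape == (1, 10, 5)
```

Any module that maps `[num_envs, …]` to `[1, …]` while staying differentiable in its own parameters would fit the same slot.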

### Colab

### Algorithm explanation

- Get train_batch
- Compute loss for classifier on train_batch
- Compute gradients w.r.t. classifier weights and the loss for every environment. This should result in a tensor of size [num_envs, …]
- Update the classifier’s weights by applying the gradient model:

  `new_weight = old_weight - gradient_model(gradients).squeeze()`

  The gradient_model converts the [num_envs, …] tensor to [1, …], which we then squeeze.
- Get val_batch
- Compute val_loss for new_classifier on val_batch
- To update the gradient_model, *I think* we need to:

  a) compute the gradient of val_loss w.r.t. the new model’s weights

  b) since the new model depends on the gradient_model, backpropagate a) through to the gradient_model’s parameters
- Update the gradient model
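The steps above can be sketched end-to-end in PyTorch. This is a minimal assumption-laden version, not the post's actual code: it assumes PyTorch ≥ 2.0 (for `torch.func.functional_call`), toy random batches, a linear classifier, and a linear gradient_model that mixes environments; the learning rate and all names are illustrative. The two load-bearing pieces are `create_graph=True` on the inner gradients and the functional forward pass with the updated weights.

```python
import torch
import torch.nn as nn
from torch.func import functional_call

torch.manual_seed(0)
num_envs, in_dim = 3, 4

classifier = nn.Linear(in_dim, 2)
# gradient_model mixes the per-environment axis: [num_envs] -> [1]
gradient_model = nn.Linear(num_envs, 1, bias=False)
loss_fn = nn.CrossEntropyLoss()

# --- inner step: per-environment gradients on the train batch ---
env_grads = []
for _ in range(num_envs):
    x, y = torch.randn(8, in_dim), torch.randint(0, 2, (8,))
    loss = loss_fn(classifier(x), y)
    # create_graph=True keeps these gradients differentiable, so the
    # outer step can backprop through them into gradient_model.
    g = torch.autograd.grad(loss, classifier.parameters(), create_graph=True)
    env_grads.append(torch.cat([t.reshape(-1) for t in g]))
G = torch.stack(env_grads)                  # [num_envs, P]

# aggregate: [num_envs, P] -> [P]; gradient_model collapses environments
update = gradient_model(G.T).T.squeeze(0)   # [P]

# build new weights as a *function of* gradient_model's output
lr, offset, new_params = 0.1, 0, {}
for name, p in classifier.named_parameters():
    n = p.numel()
    new_params[name] = p - lr * update[offset:offset + n].view_as(p)
    offset += n

# --- outer step: validation loss through the updated weights ---
xv, yv = torch.randn(8, in_dim), torch.randint(0, 2, (8,))
val_loss = loss_fn(functional_call(classifier, new_params, (xv,)), yv)
val_loss.backward()
assert gradient_model.weight.grad is not None  # graph stayed connected
```

Because `new_params` are plain tensors in the autograd graph rather than in-place-updated `nn.Parameter`s, `val_loss.backward()` reaches `gradient_model`'s parameters, which an optimizer on `gradient_model` could then step.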

### The problem

In order to train the gradient model, I think I need the gradient that flows back through the updated weights of the classifier.

Using torchviz, I can see that the computation graph breaks between step 4, where we create the new classifier weights, and step 5, where we want to use the new classifier weights in a new forward pass.

If possible, I would like the autograd graph (the “gradient tape”) to continue through a new forward pass of the model.
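A common cause of this break is writing the updated weights back into the module (e.g. via `.data` or an optimizer step), which detaches them. A hedged sketch of the contrast, assuming PyTorch ≥ 2.0; the model and learning rate are illustrative:

```python
import torch
import torch.nn as nn
from torch.func import functional_call

model = nn.Linear(3, 1)
x = torch.randn(5, 3)

loss = model(x).pow(2).mean()
grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)

# Breaks the graph: assigning through .data detaches the new weights,
# so a later forward pass cannot backprop to whatever produced `grads`.
# model.weight.data = model.weight.data - 0.1 * grads[0]  # <- don't

# Keeps the graph: pass the updated tensors functionally into a fresh
# forward pass instead of mutating the module's parameters.
new_params = {name: p - 0.1 * g
              for (name, p), g in zip(model.named_parameters(), grads)}
out = functional_call(model, new_params, (x,))
assert out.grad_fn is not None  # the second forward is still on the tape
```

With `functional_call` (or the older `higher` library, which wraps the same idea), the second forward pass stays connected to the first set of gradients, which is exactly what the outer update needs.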