First, I initialize the model parameters and run a forward pass on some input data.
pred = learner(adapt_data)
loss = loss_fn(pred, adapt_labels)
grad = torch.autograd.grad(loss, learner.parameters())
new_weight = list(map(lambda p: p[1] - 0.4*p[0], zip(grad, learner.parameters()))) # theta' = theta - alpha*grads
Now, I want to use new_weight to predict on the evaluation data without writing these values into the model learner itself.
pred_val = learner(eval_data, new_weight)
But this obviously won’t work, since learner() takes only one argument (two, counting self). How should I work around this problem? One way I could think of was to use deepcopy(learner) to replicate the model and update the parameters of the copy, say new_learner. The problem is that I can’t use loss.backward() or opt.step(), as that will update the original model.
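One possible workaround, sketched here under assumptions rather than given as the definitive answer: with PyTorch 2.0 or later, torch.func.functional_call runs a module's forward pass with a supplied dictionary of parameters instead of its own. The nn.Linear model, tensor shapes and MSE loss below are hypothetical stand-ins for the learner, data and loss_fn in the question, and create_graph=True is an added assumption that only matters if the outer loss should differentiate through the inner update.

```python
import torch
import torch.nn as nn
from torch.func import functional_call

torch.manual_seed(0)

# Hypothetical stand-ins for the learner, data and loss in the question.
learner = nn.Linear(4, 2)
loss_fn = nn.MSELoss()
adapt_data, adapt_labels = torch.randn(8, 4), torch.randn(8, 2)
eval_data, eval_labels = torch.randn(8, 4), torch.randn(8, 2)

names = [n for n, _ in learner.named_parameters()]
params = list(learner.parameters())
before = [p.detach().clone() for p in params]

# Inner step: theta' = theta - alpha * grads.
# create_graph=True (an assumption) keeps the graph of the update itself.
loss = loss_fn(learner(adapt_data), adapt_labels)
grads = torch.autograd.grad(loss, params, create_graph=True)
new_weight = {n: p - 0.4 * g for n, p, g in zip(names, params, grads)}

# Forward pass with the adapted weights; learner's own parameters stay untouched.
pred_val = functional_call(learner, new_weight, (eval_data,))
eval_loss = loss_fn(pred_val, eval_labels)

# new_weight is built from learner's parameters, so backward() reaches them.
eval_loss.backward()
```

Because new_weight stays connected to the original parameters, calling opt.step() on an optimizer over learner.parameters() after this would update the original model from the evaluation loss, with no copy of the model needed.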
So, I tried something like this. I am not sure if this is correct, so do feel free to correct me in case it is a wrong practice.
I made a deepcopy of the original model. Let’s call it new_learner. As above, I computed the new weights in a variable new_weight. Then I used it to update the copy's parameters without calling opt.step():
with torch.no_grad():
    for i, (name, params) in enumerate(new_learner.named_parameters()):
        params.copy_(new_weight[i])
Once I have evaluated new_learner on eval_data, I delete it using del new_learner.
Next, I use the loss generated from that prediction to update the original model’s weights.
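One point worth checking with the deepcopy route: values copied under torch.no_grad() into a deep copy have no graph connection to the original model, so calling backward() on the evaluation loss fills the .grad fields of new_learner only. A minimal sketch of one way around this, assuming a first-order approximation is acceptable, is to copy those gradients onto the original parameters before stepping. The nn.Linear model, SGD optimizer, tensor shapes and learning rate are hypothetical stand-ins, not the setup from the question.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the learner, optimizer, data and loss.
learner = nn.Linear(4, 2)
opt = torch.optim.SGD(learner.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
adapt_data, adapt_labels = torch.randn(8, 4), torch.randn(8, 2)
eval_data, eval_labels = torch.randn(8, 4), torch.randn(8, 2)

# Inner step computed on the original model: theta' = theta - alpha * grads.
loss = loss_fn(learner(adapt_data), adapt_labels)
grad = torch.autograd.grad(loss, list(learner.parameters()))
new_weight = [p - 0.4 * g for p, g in zip(learner.parameters(), grad)]

# Write the adapted weights into a copy, leaving learner unchanged.
new_learner = copy.deepcopy(learner)
with torch.no_grad():
    for param, w in zip(new_learner.parameters(), new_weight):
        param.copy_(w)

# Evaluate the copy; backward() populates new_learner's gradients only.
eval_loss = loss_fn(new_learner(eval_data), eval_labels)
eval_loss.backward()
untouched = all(p.grad is None for p in learner.parameters())

# First-order approximation (an assumption): reuse the copy's gradients
# as the gradients of the original parameters, then step the optimizer.
before = [p.detach().clone() for p in learner.parameters()]
for p, q in zip(learner.parameters(), new_learner.parameters()):
    p.grad = q.grad.clone()
opt.step()
opt.zero_grad()
del new_learner
```

This ignores the dependence of new_weight on the original parameters, so it is not equivalent to differentiating through the inner update.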