Hi, I am trying out a new incremental learning method, which requires the following steps:

The paper is here: ActiveLink: Deep Active Learning for Link Prediction in Knowledge Graphs

(There are i windows of data during each iteration i.)

- Temporarily update the model with the loss on window i, starting from the current model parameters.
- Use these temporary parameters to accumulate the loss on windows 0 to i.
- Use the accumulated loss to update the original parameters.
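The three steps can be sketched without PyTorch on a toy problem. Everything here is a hypothetical illustration: a one-parameter model `y_hat = w * x` with squared loss and hand-written gradients, and `window_loss`, `window_grad` and `incremental_step` are made-up names. This is the first-order reading of the steps: the gradient in step 2 is evaluated at the temporary parameters but applied to the original ones, ignoring how the temporary parameters depend on the originals.

```python
from typing import List, Tuple

Window = List[Tuple[float, float]]  # hypothetical window format: (x, y) pairs


def window_loss(w: float, window: Window) -> float:
    """Mean squared error of the toy model y_hat = w * x on one window."""
    return sum((w * x - y) ** 2 for x, y in window) / len(window)


def window_grad(w: float, window: Window) -> float:
    """Analytic d(loss)/dw of window_loss for the toy model."""
    return sum(2 * (w * x - y) * x for x, y in window) / len(window)


def incremental_step(w: float, windows: List[Window], i: int,
                     inner_lr: float = 0.1, outer_lr: float = 0.1) -> float:
    # Step 1: temporary update on window i, starting from the current w
    w_temp = w - inner_lr * window_grad(w, windows[i])
    # Step 2: accumulate the gradient of the losses on windows 0..i at w_temp
    g = sum(window_grad(w_temp, windows[k]) for k in range(i + 1))
    # Step 3: apply that gradient to the ORIGINAL w (first-order approximation)
    return w - outer_lr * g


windows = [[(1.0, 2.0)], [(2.0, 4.0)]]  # toy data generated by y = 2x
w_new = incremental_step(0.0, windows, i=1)
print(w_new)  # approximately 0.4
```

On this toy data the total loss over both windows drops from 20.0 at `w = 0` to about 12.8 at the new `w`.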

My code for these steps is below:

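```
import copy

model.train()
model.zero_grad()
feed_dict = data_iterator.windows[i]
# deepcopy: state_dict() tensors share storage with the parameters, so a
# shallow dict.copy() would be overwritten by the in-place temporary update
previous_param = copy.deepcopy(model.state_dict())
# Step 1: temporarily update the parameters on window i
_temporary_update(model, feed_dict, inner_optimizer)

def _closure():
    """Uses the current parameters of the model on the current and previous
    windows within the range, and returns the total loss on these windows.
    """
    model.zero_grad()
    total_loss = 0
    # Step 2: accumulate gradients of the losses on the windows,
    # evaluated at the temporary parameters
    for window_dict in data_iterator.iter_from_list(i, window_limit):
        loss = model.loss(window_dict)
        loss.backward()
        total_loss += loss.item()
    # Restore the original parameter values; the .grad fields are kept
    model.load_state_dict(previous_param)
    return total_loss

# Step 3: update the restored parameters with the accumulated gradients
optimizer.step(closure=_closure)
```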
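A note on `previous_param`: `state_dict()` returns tensors that share storage with the model's parameters, so a shallow `dict.copy()` of it does not protect the saved values from an in-place update such as the one in `_temporary_update`, while `copy.deepcopy` does. A plain-Python sketch of the difference, with lists standing in for tensors (an illustrative assumption, not PyTorch code):

```python
import copy

# Stand-in for a state_dict: the value is a mutable object shared with the
# "model", much as state_dict() tensors share storage with the parameters.
model_params = {"weight": [1.0, 2.0]}

shallow = model_params.copy()       # new dict, SAME inner list
deep = copy.deepcopy(model_params)  # new dict AND a new inner list

# An in-place update, like the one optimizer.step() applies to parameters
model_params["weight"][0] -= 0.5

print(shallow["weight"])  # [0.5, 2.0]: the "snapshot" changed too
print(deep["weight"])     # [1.0, 2.0]: the snapshot is preserved
```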

My question is about the `_closure()` function. In Step 3, the model is updated after `previous_param` has been restored, using `total_loss`, which has been accumulated in the graph. Is this the correct way to do it? Will the optimizer apply the update to the `previous_param` values?
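The ordering this question hinges on can be modeled in plain Python. This is a sketch under stated assumptions: `ToyParam` and `ToySGD` are hypothetical stand-ins, not PyTorch classes. They mirror two PyTorch behaviors: `torch.optim.SGD.step(closure)` calls the closure before reading `.grad`, and `load_state_dict()` copies values into the existing parameters without touching their `.grad`. Under those semantics, and assuming `previous_param` holds a real copy of the original values, the gradients computed at the temporary parameters end up applied to the restored values, i.e. a first-order update.

```python
class ToyParam:
    """Hypothetical stand-in for a parameter: a value plus an accumulated grad."""

    def __init__(self, value: float):
        self.value = value
        self.grad = 0.0


class ToySGD:
    """Hypothetical optimizer mimicking plain SGD's step(closure) ordering:
    run the closure first, then apply value -= lr * grad."""

    def __init__(self, params, lr: float):
        self.params = params
        self.lr = lr

    def step(self, closure):
        loss = closure()                  # gradients are produced here...
        for p in self.params:
            p.value -= self.lr * p.grad   # ...and consumed here
        return loss


x, y = 1.0, 2.0                           # one toy sample for y_hat = w * x
w = ToyParam(0.0)
saved = w.value                           # like previous_param
w.value = 1.6                             # like _temporary_update


def closure():
    w.grad = 0.0                          # like model.zero_grad()
    w.grad += 2 * (w.value * x - y) * x   # like loss.backward() at the temporary value
    loss = (w.value * x - y) ** 2
    w.value = saved                       # like load_state_dict: value restored, grad kept
    return loss


opt = ToySGD([w], lr=0.1)
loss = opt.step(closure)
print(w.value)  # about 0.08: the original 0.0 moved by the gradient taken at 1.6
```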

Thank you in advance.