Change model parameters before back propagation

Hi, I am trying out a new incremental learning method, which requires the following steps.
The paper is here: ActiveLink: Deep Active Learning for Link Prediction in Knowledge Graphs

(At each iteration i, the data is split into windows 0 through i.)

  1. Temporarily update the model with the loss on window i, starting from the current model parameters.
  2. Use these temporary parameters to accumulate the loss over windows 0 to i.
  3. Use the accumulated loss to update the original parameters.

To achieve this, my code is below:

        model.train()
        model.zero_grad()

        feed_dict = data_iterator.windows[i]

        # state_dict() holds references to the parameter tensors, so a deep copy
        # is needed; a shallow .copy() would be mutated by the in-place update below
        previous_param = copy.deepcopy(model.state_dict())
        _temporary_update(model, feed_dict, inner_optimizer)

        def _closure():
            """ Uses the current parameters of the model on the current and previous windows within the range, and
            returns the total loss on these windows.
            """
            model.zero_grad()
            total_loss = 0

            for window_dict in data_iterator.iter_from_list(i, window_limit):
                loss = model.loss(window_dict)
                loss.backward()
                total_loss += loss.item()

            model.load_state_dict(previous_param)
            return total_loss

        optimizer.step(closure=_closure)

My question is about the _closure() function. In step 3, it actually updates the model, after loading previous_param, using total_loss, which has been accumulated in the graph. Is this the correct way to do it? Will the optimizer update the model with previous_param?

Thank you in advance.

I think your approach might work and should be similar to this small code snippet, if I understand it correctly:

# setup
import copy
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.)
sd = copy.deepcopy(model.state_dict())

# first update
out = model(torch.randn(1, 1))
out.mean().backward()
optimizer.step()

# use new gradients on old state_dict
optimizer.zero_grad()
out = model(torch.randn(1, 1))
out.mean().backward()
model.load_state_dict(sd)
optimizer.step()

I’m not familiar with your use case, but note that you are updating the “old” parameters with the gradients calculated using the forward pass (and loss) from the “new” parameters.
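To make that concrete, here is a minimal, self-contained sketch (a hypothetical bias-free `nn.Linear`, not your actual model) showing that after `load_state_dict`, `step()` starts from the restored "old" weights and applies the gradients that were computed with the "new" weights:

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(1, 1, bias=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
sd = copy.deepcopy(model.state_dict())  # snapshot of the "old" parameters

# first update moves the parameters away from the snapshot
model(torch.randn(1, 1)).mean().backward()
optimizer.step()

# compute gradients using the "new" parameters
optimizer.zero_grad()
model(torch.randn(1, 1)).mean().backward()
grad = model.weight.grad.detach().clone()

# restore the "old" parameters; .grad is untouched by load_state_dict
model.load_state_dict(sd)
optimizer.step()

# the final weight is: old_weight - lr * grad_from_new_weights
expected = sd["weight"] - 1.0 * grad
assert torch.allclose(model.weight.detach(), expected)
```

The assertion only holds for plain SGD without momentum; a stateful optimizer would mix in its internal buffers as well.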
Is this indeed your use case?

Thank you for the reply, and sorry for my late response.
Yes, this is my use case: the “old” parameters are updated based on the gradients computed w.r.t. another set of parameters.

I also checked other threads on the forum, where some people said the intermediate results won’t change when the model parameters change. Is this the case?

Do you mean the intermediate activations, which were created during the forward pass?
If so, then no, they won’t be changed by just updating the parameters, since a new forward pass would be needed.
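As a small check (again a toy `nn.Linear`, not your model): the output tensor of an earlier forward pass keeps its value after the parameters are replaced; only a fresh forward pass reflects the new parameters:

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
x = torch.ones(1, 1)

out_old = model(x)                 # computed with the current parameters
saved = out_old.detach().clone()

new_sd = copy.deepcopy(model.state_dict())
new_sd["weight"] += 1.0            # overwrite the parameters
model.load_state_dict(new_sd)

# the tensor from the earlier forward pass is unchanged ...
assert torch.equal(out_old.detach(), saved)
# ... only a new forward pass sees the updated parameters
out_new = model(x)
assert not torch.allclose(out_old, out_new)
```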