Caching parameters, and randomly using one of them for computing gradients

(Jayanth Reddy Regatti) #1

Hi, I am working on a problem where I need to cache the model parameters (weights) for the last k iterations. In the next iteration, my model needs to use the parameters (randomly picked from the cached values) to compute the gradients.

I tried the following.

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1000, 100),
    torch.nn.Linear(100, 10),
)
# queue holds the parameters cached over the last k iterations
delayed_params = queue.pop()

However, I am unable to make the model use delayed_params for computing the gradients. Is there any way to solve this?

(Simon Wang) #2

Here’s a hacky class definition I put together to solve a similar issue:

How to switch parameters between two model
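
A minimal sketch of one alternative that does not rely on the linked class, assuming PyTorch 2.0 or later, where `torch.func.functional_call` is available. It keeps detached snapshots of the last k parameter sets, picks one at random, computes the gradients at those delayed weights, and applies them to the live model. The names `k`, `cache`, `snapshot`, and `delayed_step`, the SGD optimizer, and the MSE loss are illustrative assumptions, not something from the posts above.

```python
import random
from collections import deque

import torch
from torch.func import functional_call

torch.manual_seed(0)

model = torch.nn.Sequential(
    torch.nn.Linear(1000, 100),
    torch.nn.Linear(100, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

k = 3  # hypothetical cache size
cache = deque(maxlen=k)  # parameter snapshots from the last k iterations


def snapshot(module):
    # Detached copies, so later optimizer steps do not alter cached values.
    return {name: p.detach().clone() for name, p in module.named_parameters()}


def delayed_step(x, y):
    # Cache the current parameters first, so the cache is never empty.
    cache.append(snapshot(model))
    # Pick one snapshot at random and turn it into gradient-tracking leaves.
    picked = random.choice(cache)
    delayed = {n: w.clone().requires_grad_(True) for n, w in picked.items()}
    # Forward pass through the same architecture with the delayed weights.
    out = functional_call(model, delayed, (x,))
    loss = torch.nn.functional.mse_loss(out, y)
    grads = torch.autograd.grad(loss, list(delayed.values()))
    # Hand the delayed gradients to the live parameters, then update them.
    for p, g in zip(model.parameters(), grads):
        p.grad = g
    optimizer.step()
    return loss.item()


x, y = torch.randn(8, 1000), torch.randn(8, 10)
losses = [delayed_step(x, y) for _ in range(5)]
```

Because `deque(maxlen=k)` drops the oldest snapshot automatically, the cache never grows beyond k entries. The gradients are evaluated at the delayed weights but the update is applied to the current ones, which is one reading of the setup described in the question.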

(Jayanth Reddy Regatti) #3

Thanks a lot for the link. I had trouble understanding some lines in your class definition. Specifically, I did not understand why these lines had to be written in forward_with_weights:

        for (m, n), w in zip(self._module_names, old_ws):
            super(nn.Module, m).__setattr__(n, w)

From my understanding, you are using the new weights and computing the output, but why do we need to set those weights back to the old ones? And in backward, how will the new weights be used if you set them back to the old ones?

(Simon Wang) #4

The output is computed using the new weights before the lines you referenced. I set them back to the old ones there because I didn’t want the module’s weights to differ before and after forward_with_weights.
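
On the second part of the question: the autograd graph built during the forward pass keeps references to the tensors that were actually used, not to the attribute names on the module. Restoring the old weights afterwards therefore does not change what backward differentiates. The following is a simplified sketch of that behaviour, not the linked class: it swaps `nn.Parameter` objects through ordinary attribute assignment, and the names `lin`, `old_w`, and `new_w` are illustrative.

```python
import torch

torch.manual_seed(0)

lin = torch.nn.Linear(3, 1, bias=False)
old_w = lin.weight                               # the module's own weight
new_w = torch.nn.Parameter(torch.ones(1, 3))     # stand-in for swapped-in weights

x = torch.tensor([[1.0, 2.0, 3.0]])

lin.weight = new_w        # swap in the new weights
out = lin(x).sum()        # the graph now references new_w
lin.weight = old_w        # restore, so the module looks unchanged afterwards

out.backward()            # gradients flow to new_w, not to old_w

print(new_w.grad)         # gradient of sum(x @ new_w.T) w.r.t. new_w, i.e. x
print(old_w.grad)         # None: old_w took no part in the forward pass
```

After `backward()`, the module still holds `old_w`, while the gradient sits on `new_w`.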