import torch.nn as nn
from torch import optim
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.linear1 = nn.Linear(in_features, hidden_features)
        self.linear2 = nn.Linear(hidden_features, out_features)

    def forward(self, x):
        x = self.linear1(x)
        x = F.relu(x)
        x = self.linear2(x)
        return x
In that case you could create a “fixed” and a “trainable” tensor in a custom linear layer and concatenate them in each forward pass. This would make sure that only the trainable part gets valid gradients and parameter updates. This post gives you an example of such a layer and replaces it via torch.fx in another model.
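For reference, here is a minimal sketch of what such a partially frozen linear layer could look like; the class name, the row-wise split, and the `n_frozen` argument are assumptions for illustration, not the exact code from the linked post:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PartiallyFrozenLinear(nn.Module):
    # linear layer whose first n_frozen output rows are fixed, the rest trainable
    def __init__(self, in_features, out_features, n_frozen):
        super().__init__()
        # frozen rows live in a buffer, so they get no gradients and no optimizer updates
        self.register_buffer("weight_frozen", torch.randn(n_frozen, in_features))
        # trainable rows are a regular parameter
        self.weight_trainable = nn.Parameter(torch.randn(out_features - n_frozen, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # rebuild the full weight matrix in every forward pass;
        # autograd only flows back into weight_trainable and bias
        weight = torch.cat([self.weight_frozen, self.weight_trainable], dim=0)
        return F.linear(x, weight, self.bias)
```

A quick check that only the trainable part receives gradients:

```python
layer = PartiallyFrozenLinear(16, 32, n_frozen=8)
layer(torch.randn(4, 16)).sum().backward()
print(layer.weight_trainable.grad.shape)  # torch.Size([24, 16])
print(layer.weight_frozen.requires_grad)  # False
```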
Do you think it is feasible to improve the code of the post so that training the model with fixed tensors will be faster than training the original model without fixed ones (since I want to use it for optimization reasons)? It seems to me that the code is already optimal…
I know my question is a bit general, but I want an expert opinion on whether it’s worth digging deeper into this.
I think the fastest approach would depend on your actual use case and I would suggest profiling the discussed methods.
In particular, using the torch.cat approach with the out argument could be beneficial if only a small part of the actual gradients should be calculated. On the other hand, you might not see huge benefits from avoiding the computation of the “frozen” gradients if the actual workload is tiny. Since you are concerned about optimal performance, a profile would be the right approach to see how much overhead each method adds.
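As a concrete starting point, something like torch.utils.benchmark can compare the two variants; the shapes below are placeholders and PartiallyFrozenLinear refers to the sketch earlier in the thread:

```python
import torch
import torch.nn as nn
import torch.utils.benchmark as benchmark

x = torch.randn(64, 16)
modules = {
    "plain_linear": nn.Linear(16, 32),
    "partially_frozen": PartiallyFrozenLinear(16, 32, n_frozen=8),  # from the sketch above
}

for name, module in modules.items():
    timer = benchmark.Timer(
        stmt="module(x).sum().backward()",
        globals={"module": module, "x": x},
    )
    # times forward + backward over 100 iterations; a torch.profiler trace
    # would give a more detailed per-op breakdown
    print(name, timer.timeit(100))
```

For tiny layers like these the numbers will mostly reflect framework overhead, so the comparison is only meaningful with shapes close to your real workload.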
Is it possible to simply replace the selected layer with the customized one, which will freeze the desired parameters (using setattr(model, module_name, new_module)), without losing the weights in the transition?
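In case it helps, a minimal sketch of how such a swap could carry the old weights over, again assuming the PartiallyFrozenLinear layer and n_frozen split from the sketch above (the helper name is made up):

```python
import torch
import torch.nn as nn


def replace_with_partially_frozen(model, module_name, n_frozen):
    old = getattr(model, module_name)  # existing nn.Linear
    new = PartiallyFrozenLinear(old.in_features, old.out_features, n_frozen)
    with torch.no_grad():
        # copy the trained weights into the frozen and trainable halves
        new.weight_frozen.copy_(old.weight[:n_frozen])
        new.weight_trainable.copy_(old.weight[n_frozen:])
        new.bias.copy_(old.bias)
    setattr(model, module_name, new)
```

This only works for direct child modules; for nested submodules you would have to walk to the parent module first (or use the torch.fx replacement from the linked post).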