I use an embedding layer to project one-hot indices into a continuous space. However, during training I don't want to update its weights. How can I do that?
You can set the weight of the embedding layer to not require gradients:
m = nn.Embedding(...)
m.weight.requires_grad = False
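If you are loading pretrained vectors, a convenient alternative (assuming a reasonably recent PyTorch that provides nn.Embedding.from_pretrained) is to build the frozen layer directly. A minimal sketch, where the pretrained tensor is made up for illustration:

import torch
import torch.nn as nn

# Hypothetical pretrained vectors: 1000 words, 300 dimensions
pretrained = torch.randn(1000, 300)

# Option 1: freeze an existing layer
m = nn.Embedding(1000, 300)
m.weight.requires_grad = False

# Option 2 (assumes nn.Embedding.from_pretrained is available): build a frozen layer directly
m = nn.Embedding.from_pretrained(pretrained, freeze=True)
print(m.weight.requires_grad)  # False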
Oh, I see. Thank you very much.
Oh, sorry. After setting Embedding.weight.requires_grad = False, an error was raised:
ValueError: optimizing a parameter that doesn't require gradients
The optimizer is used in the following way:
self.optimizer = optim.Adadelta(self.model.parameters(), lr=args.learning_rate)
And the model is defined as follows:
class DecomposableModel(nn.Module):
    def __init__(self, word_embedding, config):
        super(DecomposableModel, self).__init__()
        self.name = 'DecomposableModel'
        self.drop_p = config['drop_p']

        self.word_dim = word_embedding.embeddings.size(1)
        self.embedding = nn.Embedding(word_embedding.embeddings.size(0), self.word_dim)
        self.embedding.weight = nn.Parameter(word_embedding.embeddings)
        self.embedding.weight.requires_grad = False
        # self.embedding_normalize()

        self.F = nn.Linear(self.word_dim, config['F_dim'])
        self.G = nn.Linear(2 * self.word_dim, config['G_dim'])
        self.H = nn.Linear(2 * config['G_dim'], config['relation_num'])

        self.cuda_flag = config['cuda_flag']

    def forward(self, p_ids, h_ids):
        ......
Hi,
Please see this post, Freeze the learnable parameters of resnet and attach it to a new network, which solves exactly your problem.
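In short, the idea in that post is to pass the optimizer only the parameters that still require gradients, so the frozen embedding weight is skipped. A minimal sketch of that fix, reusing the names from the snippet above (self.model, args.learning_rate):

import torch.optim as optim

# Keep only parameters that still require gradients,
# so the frozen embedding weight is not handed to the optimizer.
trainable = [p for p in self.model.parameters() if p.requires_grad]
self.optimizer = optim.Adadelta(trainable, lr=args.learning_rate)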