Need help! The embedding layer is not updated!

I am trying to implement matrix multiplication using embedding layers. Basically, in the forward function I would like to get:

a user factor with shape [1, N],
a time matrix with shape [N, N], and
an item factor with shape [N, 1].

Then, I would like to compute (user_factor * time_matrix * item_factor) and output a scalar value.
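For concreteness, with plain tensors (a hypothetical N = 2) the intended product would be:

import torch

N = 2
user_factor = torch.randn(1, N)   # [1, N]
time_matrix = torch.randn(N, N)   # [N, N]
item_factor = torch.randn(N, 1)   # [N, 1]
# (1, N) @ (N, N) @ (N, 1) -> (1, 1), i.e. a single scalar value
scalar = user_factor @ time_matrix @ item_factor
print(scalar.shape)  # torch.Size([1, 1])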
However, when I check time_factors during training, it is not updated at all, while user_factors.weight is updated as expected. I am not sure if the reshaping affects autograd, and I have no idea which step is wrong.
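I tried a minimal check and reshape by itself seems to keep gradients flowing:

import torch

w = torch.randn(4, requires_grad=True)
m = w.reshape(2, 2)   # reshape is differentiable
m.sum().backward()
print(w.grad)         # gradients flow back through the reshape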

The following is my implementation. Thank you for your help.

import torch
import torch.nn as nn

class MF(torch.nn.Module):
    def __init__(self, n_users, n_attempts, n_items, n_factors=2, seed=1024):
        super().__init__()
        torch.random.manual_seed(seed)
        self.n_users = n_users
        self.n_items = n_items
        self.n_factors = n_factors
        # latent factors: user [n_factors], time [n_factors * n_factors], item [n_factors]
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.time_factors = nn.Embedding(n_attempts, n_factors * n_factors)
        self.item_factors = nn.Embedding(n_items, n_factors)
        self.stress_item_factor = nn.Embedding(1, n_factors)

        self.user_biases = nn.Embedding(n_users, 1)
        self.time_biases = nn.Embedding(n_attempts, 1)
        self.item_biases = nn.Embedding(n_items, 1)

    def forward(self, user, attempt, item):
        # note: `item`, item_factors, and item_biases are not used here yet
        u_factor = self.user_factors(user)                               # [1, n_factors]
        t_factor = self.time_factors(attempt)                            # [1, n_factors * n_factors]
        t_matrix = t_factor.reshape(-1, self.n_factors, self.n_factors)  # [1, n_factors, n_factors]
        stress = self.user_biases(user) + self.time_biases(attempt)      # [1, 1]
        tmp = torch.matmul(u_factor, t_matrix).squeeze(dim=1)            # [1, n_factors]
        stress += torch.matmul(tmp, self.stress_item_factor(torch.tensor(0)))
        return stress.squeeze(dim=-1)                                    # [1]
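For reference, I exercise the model like this (toy sizes, continuing from the imports above):

model = MF(n_users=10, n_attempts=5, n_items=8, n_factors=2)
user = torch.tensor([0])
attempt = torch.tensor([1])
item = torch.tensor([2])
pred = model(user, attempt, item)
print(pred.shape)  # torch.Size([1])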

And I train it with the following loop:

        for idx, (u, t, i, v) in enumerate(self.train_data):
            # wrap each sample as a 1-element batch
            user = torch.tensor([u], dtype=torch.long)
            attempt = torch.tensor([t], dtype=torch.long)
            item = torch.tensor([i], dtype=torch.long)
            value = torch.tensor([v], dtype=torch.float)
            self.optimizer.zero_grad()
            pred = self.model(user, attempt, item)
            loss = self.mse_loss(pred, value)
            loss.backward()
            self.optimizer.step()

When I print out the values of self.model.time_factors.weight, they do not change over time. However, when I print self.model.time_factors.weight.grad, the values do change.
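To confirm, I added a quick ad-hoc check around the optimizer step in the loop above:

before = self.model.time_factors.weight.detach().clone()
self.optimizer.step()
# prints tensor(False): the step never moves this weight,
# even though .grad is populated
print((before != self.model.time_factors.weight).any())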

It is very strange. I hope someone can help me with this. Thank you very much.

You don’t show how you instantiate the optimizer or which one you use; maybe the parameter is not in there for some reason, or maybe the gradients are too small? SGD runs into trouble when the things you are optimizing work at different scales, or when there is a scale mismatch between the optimized quantity and the gradients (Adam and LARS/LAMB suffer from this less).
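One quick way to check is whether the parameter is registered with the optimizer at all, something like this (adapt model / optimizer to your own names):

registered = any(
    p is model.time_factors.weight
    for group in optimizer.param_groups
    for p in group["params"]
)
print(registered)  # False here would explain a weight that never moves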

Best regards

Thomas


Thank you, Thomas. I realize I made a silly mistake in the optimizer: I never included time_factors, so no learning rate was ever set for it:

    self.optimizer = torch.optim.SGD([
        {"params": self.model.user_factors.weight},
        # note: self.model.time_factors.weight was never added here
        {"params": self.model.item_factors.weight},
        {"params": self.model.user_biases.weight, "lr": config.bias_learning_rate},
        {"params": self.model.time_biases.weight, "lr": config.bias_learning_rate},
        {"params": self.model.item_biases.weight, "lr": config.bias_learning_rate}
    ], lr=config.learning_rate, weight_decay=config.weight_decay)
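For completeness, the fixed version just adds the missing parameter group:

    self.optimizer = torch.optim.SGD([
        {"params": self.model.user_factors.weight},
        {"params": self.model.time_factors.weight},  # the previously missing group
        {"params": self.model.item_factors.weight},
        {"params": self.model.user_biases.weight, "lr": config.bias_learning_rate},
        {"params": self.model.time_biases.weight, "lr": config.bias_learning_rate},
        {"params": self.model.item_biases.weight, "lr": config.bias_learning_rate}
    ], lr=config.learning_rate, weight_decay=config.weight_decay)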

Thank you for your help.