Why does code like Fairseq override the default initialization of the nn.Embedding layer?

From https://github.com/pytorch/fairseq/blob/master/fairseq/models/lstm.py#L448

def Embedding(num_embeddings, embedding_dim, padding_idx):
    m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx)
    nn.init.uniform_(m.weight, -0.1, 0.1)
    nn.init.constant_(m.weight[padding_idx], 0)
    return m

What are the nn.init.* lines of the function doing?

On the surface it looks like the Embedding function in Fairseq is trying to override the default nn.Embedding initialization, but Pythonically it doesn’t look like the m object is being changed, since the return values of the nn.init calls are never assigned back to it.

And why is there a need to first initialize with a uniform distribution and then with a constant?

For reference, here is the PyTorch source for these two functions: https://github.com/pytorch/pytorch/blob/master/torch/nn/init.py#L50

def uniform_(tensor, a=0, b=1):
    r"""Fills the input Tensor with values drawn from the uniform
    distribution :math:`\mathcal{U}(a, b)`.
    Args:
        tensor: an n-dimensional `torch.Tensor`
        a: the lower bound of the uniform distribution
        b: the upper bound of the uniform distribution
    Examples:
        >>> w = torch.empty(3, 5)
        >>> nn.init.uniform_(w)
    """
    with torch.no_grad():
        return tensor.uniform_(a, b)

def constant_(tensor, val):
    r"""Fills the input Tensor with the value :math:`\text{val}`.
    Args:
        tensor: an n-dimensional `torch.Tensor`
        val: the value to fill the tensor with
    Examples:
        >>> w = torch.empty(3, 5)
        >>> nn.init.constant_(w, 0.3)
    """
    with torch.no_grad():
        return tensor.fill_(val)

It sounds crazy that I’m answering my own question, but I’m just going to write this down so that it’s documented here and I can come back to check when in doubt =)
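First, the m object is changed: both init functions end with a trailing underscore, which is PyTorch’s convention for in-place operations. They mutate the tensor they are given and return that same object, and the torch.no_grad() block keeps the initialization out of autograd. A quick sketch of my own (not from Fairseq) to confirm:

import torch
import torch.nn as nn

w = torch.empty(3, 5)
out = nn.init.uniform_(w, -0.1, 0.1)
print(out is w)                                        # True: same object, mutated in place
print(w.min().item() >= -0.1, w.max().item() <= 0.1)   # True True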

As for why there are two calls: the second nn.init is not re-initializing all of m.weight but only the row at padding_idx (row 0 here), so the padding token ends up embedded as an all-zero vector.


>>> m = nn.Embedding(10, 128, padding_idx=0)
>>> nn.init.uniform_(m.weight, -0.1, 0.1)
>>> print(m.weight)
Parameter containing:
tensor([[ 0.0270, -0.0801, -0.0744,  ...,  0.0148, -0.0218,  0.0371],
        [-0.0279,  0.0053, -0.0801,  ..., -0.0808, -0.0775, -0.0291],
        [-0.0704, -0.0537, -0.0336,  ...,  0.0730,  0.0389,  0.0482],
        ...,
        [-0.0324, -0.0685, -0.0629,  ..., -0.0472, -0.0990, -0.0958],
        [ 0.0852,  0.0468,  0.0605,  ..., -0.0059,  0.0389, -0.0629],
        [-0.0258,  0.0022,  0.0793,  ...,  0.0473,  0.0635, -0.0056]],
       requires_grad=True)

>>> nn.init.constant_(m.weight[0], 0)
>>> print(m.weight)
Parameter containing:
tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [-0.0279,  0.0053, -0.0801,  ..., -0.0808, -0.0775, -0.0291],
        [-0.0704, -0.0537, -0.0336,  ...,  0.0730,  0.0389,  0.0482],
        ...,
        [-0.0324, -0.0685, -0.0629,  ..., -0.0472, -0.0990, -0.0958],
        [ 0.0852,  0.0468,  0.0605,  ..., -0.0059,  0.0389, -0.0629],
        [-0.0258,  0.0022,  0.0793,  ...,  0.0473,  0.0635, -0.0056]],
       requires_grad=True)
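And the reason that particular row gets zeroed: padding_idx tells nn.Embedding that this index is the pad token, so its row does not receive gradient updates, and zeroing it on top of that means every padding position maps to an all-zero vector that contributes nothing downstream. A small sketch of my own:

import torch
import torch.nn as nn

m = nn.Embedding(10, 4, padding_idx=0)
nn.init.uniform_(m.weight, -0.1, 0.1)
nn.init.constant_(m.weight[0], 0)

batch = torch.tensor([[2, 5, 0, 0]])   # the trailing 0s are padding positions
out = m(batch)
print(out[0, 2])   # all zeros: the padding index maps to the zero vector
print(out[0, 3])   # all zeros as well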

It’s sort of weird (non-Pythonic) to call a function that overrides a class’s attribute values. Is it possible to just overwrite the weights inside the nn.Embedding object?

IMHO, this is more Pythonic =)

import torch.nn as nn

class Embedding(nn.Embedding):
    def __init__(self, num_embeddings, embedding_dim, padding_idx):
        super().__init__(num_embeddings, embedding_dim, padding_idx=padding_idx)
        # Re-initialize all weights uniformly in [-0.1, 0.1], ...
        nn.init.uniform_(self.weight, -0.1, 0.1)
        # ... then zero out the row that embeds the padding token.
        nn.init.constant_(self.weight[padding_idx], 0)
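A quick check (again my own sketch) that the subclass ends up in the same state as the Fairseq helper:

emb = Embedding(num_embeddings=10, embedding_dim=8, padding_idx=0)
print(emb.weight[0])                          # the padding row is all zeros
print(emb.weight.abs().max().item() <= 0.1)   # True: everything else is uniform in [-0.1, 0.1]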