Why does code like Fairseq override the default initialization of the nn.Embedding layer?

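def Embedding(num_embeddings, embedding_dim, padding_idx):
    m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx)
    nn.init.uniform_(m.weight, -0.1, 0.1)
    nn.init.constant_(m.weight[padding_idx], 0)
    return m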


What are the nn.init.* lines of the function doing?

On the surface it looks like the Embedding function in Fairseq is trying to override the default nn.Embedding, but Pythonically it doesn’t look like the m object is changed.

But why is there a need to first initialize with uniform distribution then constant?

Looking at the source of nn.init, both functions just fill the tensor they are given in place: https://github.com/pytorch/pytorch/blob/master/torch/nn/init.py#L50

def uniform_(tensor, a=0, b=1):
    r"""Fills the input Tensor with values drawn from the uniform
    distribution :math:`\mathcal{U}(a, b)`.
    Args:
        tensor: an n-dimensional torch.Tensor
        a: the lower bound of the uniform distribution
        b: the upper bound of the uniform distribution
    Examples:
        >>> w = torch.empty(3, 5)
        >>> nn.init.uniform_(w)
    """
    return tensor.uniform_(a, b)

def constant_(tensor, val):
    r"""Fills the input Tensor with the value :math:`\text{val}`.
    Args:
        tensor: an n-dimensional torch.Tensor
        val: the value to fill the tensor with
    Examples:
        >>> w = torch.empty(3, 5)
        >>> nn.init.constant_(w, 0.3)
    """
    return tensor.fill_(val)
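For what it's worth, the trailing underscore is PyTorch's convention for in-place operations, so these calls mutate the tensor that is passed in rather than building a new one. Here is a small sketch to check that, assuming a reasonably recent PyTorch install; the variable names are only for illustration.

```python
import torch
import torch.nn as nn

w = torch.empty(3, 5)
out = nn.init.uniform_(w, -0.1, 0.1)

# The returned tensor shares its memory with the tensor that was passed in.
same_storage = out.data_ptr() == w.data_ptr()

# Every value now lies inside the requested range.
in_range = bool(((w >= -0.1) & (w <= 0.1)).all())

# w[0] is a view into w, so filling it writes into w itself.
nn.init.constant_(w[0], 0)
first_row_zero = bool((w[0] == 0).all())

print(same_storage, in_range, first_row_zero)
```

So even though the return value is never assigned back in the Fairseq function, the weights held by m are the ones being changed.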


It sounds crazy that I’m answering my own question but I’m just going to write these down so that it’s somehow documented here and I can come back to check when in doubt =)

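It’s because the second nn.init call does not re-initialize all of m.weight, only the row at padding_idx (the 0th row in the example below). nn.init.uniform_ fills every row, including the padding row, so nn.init.constant_ is then needed to reset just that row to zero.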


>>> m = nn.Embedding(10, 128, padding_idx=0)
>>> nn.init.uniform_(m.weight, -0.1, 0.1)
>>> print(m.weight)
Parameter containing:
tensor([[ 0.0270, -0.0801, -0.0744,  ...,  0.0148, -0.0218,  0.0371],
        [-0.0279,  0.0053, -0.0801,  ..., -0.0808, -0.0775, -0.0291],
        [-0.0704, -0.0537, -0.0336,  ...,  0.0730,  0.0389,  0.0482],
        ...,
        [-0.0324, -0.0685, -0.0629,  ..., -0.0472, -0.0990, -0.0958],
        [ 0.0852,  0.0468,  0.0605,  ..., -0.0059,  0.0389, -0.0629],
        [-0.0258,  0.0022,  0.0793,  ...,  0.0473,  0.0635, -0.0056]],
       requires_grad=True)

>>> nn.init.constant_(m.weight[0], 0)
>>> print(m.weight)
Parameter containing:
tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [-0.0279,  0.0053, -0.0801,  ..., -0.0808, -0.0775, -0.0291],
        [-0.0704, -0.0537, -0.0336,  ...,  0.0730,  0.0389,  0.0482],
        ...,
        [-0.0324, -0.0685, -0.0629,  ..., -0.0472, -0.0990, -0.0958],
        [ 0.0852,  0.0468,  0.0605,  ..., -0.0059,  0.0389, -0.0629],
        [-0.0258,  0.0022,  0.0793,  ...,  0.0473,  0.0635, -0.0056]],
       requires_grad=True)
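The same thing can be checked without eyeballing the printed tensors. This is a minimal sketch, assuming a recent PyTorch release in which nn.Embedding's default initialization already zeroes the padding row; the flag names are just for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
padding_idx = 0
m = nn.Embedding(10, 128, padding_idx=padding_idx)

# Assumption: the default initialization leaves the padding row at zero.
zero_by_default = bool((m.weight[padding_idx] == 0).all())

# uniform_ fills every row, so the padding row gets overwritten as well.
nn.init.uniform_(m.weight, -0.1, 0.1)
zero_after_uniform = bool((m.weight[padding_idx] == 0).all())

# constant_ on that single row zeroes it again and leaves the rest alone.
other_rows_before = m.weight[1:].detach().clone()
nn.init.constant_(m.weight[padding_idx], 0)
zero_after_constant = bool((m.weight[padding_idx] == 0).all())
others_unchanged = torch.equal(m.weight[1:].detach(), other_rows_before)

print(zero_by_default, zero_after_uniform, zero_after_constant, others_unchanged)
```

That seems to be the whole reason for the two-step initialization: the uniform fill wipes out the zero padding row, and the constant fill puts it back.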


It’s sort of weird (non-Pythonic) to call a function that overrides a class’s attribute values. Is it possible to just overwrite the weights inside the nn.Embedding object?

IMHO, this is more Pythonic =)

import torch.nn as nn

class Embedding(nn.Embedding):