What does register_buffer do?

I’m working through a tutorial on transformers (Tutorial 6: Transformers and Multi-Head Attention — UvA DL Notebooks v1.0 documentation) and I came across this block of code about positional encoding.

class PositionalEncoding(nn.Module):

def __init__(self, d_model, max_len=5000):
    """
    Inputs
        d_model - Hidden dimensionality of the input.
        max_len - Maximum length of a sequence to expect.
    """
    super().__init__()

    # Create matrix of [SeqLen, HiddenDim] representing the positional encoding for max_len inputs
    pe = torch.zeros(max_len, d_model)
    position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    pe = pe.unsqueeze(0)

    # register_buffer => Tensor which is not a parameter, but should be part of the modules state.
    # Used for tensors that need to be on the same device as the module.
    # persistent=False tells PyTorch to not add the buffer to the state dict (e.g. when we save the model)
    self.register_buffer('pe', pe, persistent=False)

def forward(self, x):
    x = x + self.pe[:, :x.size(1)]
    return x

I haven’t understood what the register_buffer does, would someone be able to explain it in an easier way? I get that it’s saved alongside model parameters but isn’t included in gradient calculations - so what’s the difference between this and just setting the requires_grad to be False?

I think this answer can help you. Also, another one

1 Like