How to implement a custom init scheme?

Hello, I’d like to know how to properly implement a custom initialization scheme. At the moment I’m doing it as shown below, but from PRs I gather that using the .data attribute isn’t recommended.

import torch
import torch.nn as nn


class MyModule(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.my_param = nn.Parameter(torch.zeros(hidden_dim))
        self.reset_parameters()

    def reset_parameters(self) -> None:
        # Sample u uniformly in [1/h, 1 - 1/h], then apply the inverse sigmoid
        u = torch.rand(self.hidden_dim) * (1 - 2 / self.hidden_dim) + 1 / self.hidden_dim
        self.my_param.data = -(1 / u - 1).log()  # assignment through .data

As you can see, the scheme is a bit involved (it pushes a uniformly distributed sample through the inverse of the sigmoid function), so there is no built-in that can directly be used to modify the tensor in place, unlike e.g. this for a uniform initialization:

with torch.no_grad():
    my_tensor.uniform_(a, b)
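
To be explicit, the transform is the logit, log(u / (1 - u)); here is a quick check, assuming a PyTorch version recent enough to provide torch.logit:

import torch

u = torch.rand(8) * 0.9 + 0.05   # uniform sample kept away from 0 and 1
manual = -(1 / u - 1).log()      # the expression from reset_parameters
assert torch.allclose(manual, torch.logit(u))  # logit(u) = log(u / (1 - u))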

Could you tell me how to do it?

You could wrap the code in a no_grad() guard and use the .copy_() operation to fill the parameter. If you are initializing the plain tensor before wrapping it in an nn.Parameter, you could also skip the no_grad() guard.
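
For example, here is a minimal sketch of both options using the scheme from your question (param and hidden_dim stand in for your actual names):

import torch
import torch.nn as nn

hidden_dim = 16
param = nn.Parameter(torch.zeros(hidden_dim))

# Option 1: re-initialize the existing parameter in place
with torch.no_grad():
    u = torch.rand(hidden_dim) * (1 - 2 / hidden_dim) + 1 / hidden_dim
    param.copy_(-(1 / u - 1).log())

# Option 2: build the initial values first, then wrap them;
# no guard is needed since the plain tensor does not require grad yet
u = torch.rand(hidden_dim) * (1 - 2 / hidden_dim) + 1 / hidden_dim
param = nn.Parameter(-(1 / u - 1).log())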


I see, so since I want reset_parameters to be callable even after the module has been constructed, I should simply do:

with torch.no_grad():
    self.my_param.copy_(-(1 / u - 1).log())
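
i.e., keeping the sampling of u unchanged, the whole method becomes:

def reset_parameters(self) -> None:
    u = torch.rand(self.hidden_dim) * (1 - 2 / self.hidden_dim) + 1 / self.hidden_dim
    with torch.no_grad():
        self.my_param.copy_(-(1 / u - 1).log())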

Great, thanks a lot! 🙂