Best way to define a scalar using nn.Parameter in Pytorch

In my CNN, at some stage I want to multiply a feature map by a scalar that should be learnt by the network. The scalar has to be initialised to 5, so I think of the following solution:

def __init__(self):
    super(..., self).__init__()
    ...
    ...
    self.alpha = nn.Parameter(torch.ones(1) * 5)
    ...

def forward(self, x):
    ...
    x = x * self.alpha
    return x

This should work because of the broadcasting semantics of PyTorch. But I wonder whether the following could be used instead:

def __init__(self):
    super(..., self).__init__()
    ...
    ...
    self.alpha = nn.Parameter(torch.tensor(5.))
    ...

def forward(self, x):
    ...
    x = x * self.alpha
    return x

Surely this should work too, but is there any pitfall to the latter approach compared to the first one, which uses torch.ones(1)*5? Generally I do not see people using the latter approach; they only use the first one.
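
For context, the only difference I can spot is the shape (a quick check; x is just a stand-in for my feature map):

import torch

a = torch.ones(1) * 5    # shape (1,)
b = torch.tensor(5.)     # 0-dim scalar
x = torch.randn(2, 8, 16, 16)
print(a.shape, b.shape)             # torch.Size([1]) torch.Size([])
print(torch.equal(x * a, x * b))    # True, both broadcast to the same result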

Thank you
Mohit

I agree with you that torch.tensor(5.) would seem to be even better here for what you’re trying to achieve.

  • It seems most concise.
  • The tensor will truly be a scalar (0-dim) tensor (with torch.ones you’d need torch.ones(()) for that).
  • There isn’t any performance aspect. (E.g. when you’re in the middle of a computation, it’s always good to create the tensor on the right device rather than starting on the CPU and moving it, but that isn’t applicable here.)
  • As an aside, torch.full would probably be more natural than torch.ones (see the sketch below).
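
For example, a minimal sketch of the constructions mentioned above (the names are just illustrative):

import torch
import torch.nn as nn

alpha = nn.Parameter(torch.full((), 5.))  # 0-dim scalar, initialised to 5
beta = nn.Parameter(torch.ones(()) * 5)   # equivalent 0-dim alternative with torch.ones
print(alpha.shape, beta.shape)            # torch.Size([]) torch.Size([])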

One likely reason this isn’t used as much is that in the very olden days (before 0.4?), PyTorch didn’t have 0-dim tensor support, and these things are very hard to get out of people’s heads (like Variable, or using torch.Tensor, or using .data left and right). It seems people find old code with that and then adapt it so it runs and does what they want (which is, of course, OK, and I’m happy when people can make new stuff from old code). I’m sure I’m guilty of not moving all my code to best practices.

Best regards

Thomas


Thank you for the explanation.

Just one more thing: I checked the documentation of torch.full and torch.ones, and by default they have requires_grad=False. I hope that when I wrap them inside nn.Parameter it is automatically switched to True?

Thank you

Yes, nn.Parameter sets requires_grad to True unless you explicitly ask it not to.
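
As a quick sanity check (just a minimal sketch, nothing model-specific):

import torch
import torch.nn as nn

t = torch.full((), 5.)
print(t.requires_grad)        # False: factory functions default to requires_grad=False

alpha = nn.Parameter(t)
print(alpha.requires_grad)    # True: nn.Parameter defaults to requires_grad=True

frozen = nn.Parameter(t, requires_grad=False)
print(frozen.requires_grad)   # False: only if you explicitly opt out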