A different usage of torch.nn.Linear()

I initialize a module M = nn.Linear(a, b, bias=True).
However, I don't use it as a fully connected layer that accepts an input and projects it to an output.
Instead, I treat the module M as a standalone container for the parameters M.weight and M.bias, like a "weight factory".
I then use M.weight directly in my own operations, just as if I had initialized the weight with nn.Parameter() instead.
What is the difference between nn.Linear() and nn.Parameter() when both are treated as weight factories in this way?
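A minimal sketch of what I mean (the shapes and names are just placeholders):

```python
import torch
import torch.nn as nn

a, b = 4, 3
x = torch.randn(2, a)

# "Weight factory" usage: create the module only to obtain its parameters
M = nn.Linear(a, b, bias=True)
out_linear = x @ M.weight.t()  # use M.weight directly, never call M(x)

# The seemingly equivalent nn.Parameter version
W = nn.Parameter(torch.randn(b, a))
out_param = x @ W.t()
```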

There is no difference, since internally nn.Linear uses its parameters in the same way by calling into the functional API via F.linear(input, self.weight, self.bias), as seen here. Creating the module allows you to call the forward pass directly, but won't change any other behavior.
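For illustration, a quick check of this equivalence (placeholder shapes; torch.equal should print True because the module's forward pass is exactly the functional call):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

M = nn.Linear(4, 3, bias=True)
x = torch.randn(2, 4)

# The module's forward pass is just a call into the functional API
out_module = M(x)
out_functional = F.linear(x, M.weight, M.bias)
print(torch.equal(out_module, out_functional))  # True: identical results
```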

Thank you very much! A follow-up question: if I only use the weight of nn.Linear() and ignore the bias (with bias=True set), how is the parameter update affected? I ask because in my experiments, initializing with nn.Linear() and with nn.Parameter() leads to different performance. Thank you again!

Only the weight parameter will receive a gradient in its .grad attribute and will thus be updated. bias.grad will be None (assuming no gradient was ever computed for it before), so this parameter will not be updated (unless the optimizer has updated it before and is using internal running stats).
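A small sketch of this behavior (placeholder shapes; the bias exists but never enters the computation graph):

```python
import torch
import torch.nn as nn

M = nn.Linear(4, 3, bias=True)
x = torch.randn(2, 4)

# Only the weight participates in the computation
out = x @ M.weight.t()
out.sum().backward()

print(M.weight.grad is None)  # False: the weight received a gradient
print(M.bias.grad is None)    # True: the bias never entered the graph
```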

Thank you sir! It really helps me a lot!