Are torch.nn.functional layers learnable?

I generally define my networks in the __init__ function using torch.nn layers. However, if I want my convolution to depend on the size of the input, it might make more sense to use the torch.nn.functional version. Are those parameters learnable too? If so, how does that affect the state dict, particularly when saving and loading models?


When using torch.nn.Linear, for example, the nn.Linear class looks after initialising and using the parameters that it requires.
When using the torch.nn.functional.linear variant, it is up to you to provide the parameters on each forward pass.

Basically, instead of

import torch.nn as nn

class Model(nn.Module):
    def __init__(self, in_features, out_features):
        super(Model, self).__init__()
        # nn.Linear creates and registers its own weight and bias parameters
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, input):
        return self.linear(input)

you would do this:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self, in_features, out_features):
        super(Model, self).__init__()
        # wrapping the tensors in nn.Parameter registers them with the
        # module, which is what makes them learnable
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.randn(out_features))

    def forward(self, input):
        return F.linear(input, self.weight, self.bias)

In the first case, Model knows that it has an nn.Linear submodule; in the second case, Model knows that it has two parameter tensors.

So in the first case Model.parameters() will list the weight and bias parameters of the nn.Linear submodule; in the second case it will list the weight and bias parameters defined in __init__.
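
For example, a quick check (using hypothetical feature sizes of 4 and 2) shows both variants expose the same two parameters, just under different names:

m = Model(in_features=4, out_features=2)
for name, p in m.named_parameters():
    print(name, tuple(p.shape))
# first variant prints:  linear.weight (2, 4) and linear.bias (2,)
# second variant prints: weight (2, 4) and bias (2,)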

Training, saving and loading can all be done in exactly the same way in both cases.
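
As a minimal sketch (assuming the Model class above with the same hypothetical sizes), a single SGD step and a state-dict round trip look identical for both variants:

import torch
import torch.nn.functional as F

model = Model(in_features=4, out_features=2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# one training step
x = torch.randn(8, 4)
target = torch.randn(8, 2)
loss = F.mse_loss(model(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# saving and loading via the state dict
torch.save(model.state_dict(), 'model.pt')
model.load_state_dict(torch.load('model.pt'))

The only visible difference is the key names in the state dict: 'linear.weight' and 'linear.bias' in the first case versus 'weight' and 'bias' in the second.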

import torch
import torch.nn as nn
import torch.nn.functional as F

in_features, out_features = 4, 2  # example sizes
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()

    def forward(self, input):
        # weight tensor created inline on each forward pass
        return F.linear(input, torch.randn(out_features, in_features))

In this case, Model.parameters() contains no parameters, since

m = Model()
len(list(m.parameters()))  # returns 0

So, in this case, is the weight tensor of size (out_features, in_features) in the linear function learnable?

During backpropagation, can I use SGD to update the parameters in the linear function?

No. In the way you do it here, a random tensor is indeed created, but it is not part of your model's learnable parameters because of the way you initialized it: the tensor is never registered with the module (for example via nn.Parameter), so it is regenerated on every forward pass and autograd never updates it. See the earlier answer in this thread for clarification.
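
Concretely (a minimal check, reusing the Model above): because parameters() is empty, SGD has nothing to update and will refuse to construct at all:

import torch

m = Model()
# raises ValueError: optimizer got an empty parameter list
optimizer = torch.optim.SGD(m.parameters(), lr=0.1)

To make the weight learnable, it has to be registered with the module, e.g. by wrapping it in nn.Parameter in __init__ as in the earlier example.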