Function declared outside class in PyTorch

I have defined a function outside a class in PyTorch, but the function is called inside the constructor and forward() of the class, something like:


import torch
import torch.nn as nn

def func1(a):
    ## Body of the func
    return 

def func2(b,c):
    ## Body of the func
    return

class Something(nn.Module):
    def __init__(self,a):
        super().__init__()
        self.something = func1(a)

    def forward(self,a,b):
        x = func2(a,b)
        return x

Is the above code valid? I mean will the weights be updated during training?

Your snippet is valid with respect to the forward pass, BUT your class Something won’t register any parameters that live inside func2 (or func1). So you will have to save and load those parameters manually, and you’ll have to add them to the optimizer explicitly (model.parameters() won’t return them).
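
To illustrate the point, here is a minimal sketch. It assumes func2 owns a weight tensor of its own; hidden_weight and the nn.Linear layer are made up for the example and are not taken from your code:

```python
import torch
import torch.nn as nn

# Hypothetical: a weight that lives outside the module and is used by func2.
hidden_weight = torch.randn(3, 3, requires_grad=True)

def func2(a, b):
    return (a + b) @ hidden_weight

class Something(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 3)

    def forward(self, a, b):
        return self.linear(func2(a, b))

model = Something()

# Only the parameters of registered sub-modules show up here.
registered_names = [name for name, _ in model.named_parameters()]
print(registered_names)  # ['linear.weight', 'linear.bias']

# The outside weight has to be handed to the optimizer explicitly.
optimizer = torch.optim.SGD(list(model.parameters()) + [hidden_weight], lr=0.1)

# The forward/backward pass itself works fine and reaches hidden_weight.
model(torch.randn(2, 3), torch.randn(2, 3)).sum().backward()
print(hidden_weight.grad is not None)  # True
```

So autograd still reaches the outside weight, but state_dict() and model.parameters() know nothing about it.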

Couldn’t you use modules instead of functions? They would not have those drawbacks, e.g.:

import torch
import torch.nn as nn

def func1(a):
    ## Body of the func
    return 

class Module2(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self,a,b):
        ## Body of the module
        return 

class Something(nn.Module):
    def __init__(self,a):
        super().__init__()
        self.something = func1(a)
        self.module2 = Module2()

    def forward(self,a,b):
        x = self.module2(a,b)
        return x

Note that I did not change func1 as I’m not sure which behavior you want regarding gradients here.
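
For what it’s worth, a small sketch of what the module version buys you. The body of Module2 is hypothetical (a single nn.Linear standing in for whatever func2 computed), and I dropped the constructor argument to keep it short:

```python
import torch
import torch.nn as nn

class Module2(nn.Module):
    def __init__(self):
        super().__init__()
        # Hypothetical body: one linear layer in place of the real computation.
        self.linear = nn.Linear(3, 3)

    def forward(self, a, b):
        return self.linear(a + b)

class Something(nn.Module):
    def __init__(self):
        super().__init__()
        # Assigning a module as an attribute registers it as a sub-module.
        self.module2 = Module2()

    def forward(self, a, b):
        return self.module2(a, b)

model = Something()
param_names = [name for name, _ in model.named_parameters()]
print(param_names)  # ['module2.linear.weight', 'module2.linear.bias']

out = model(torch.randn(2, 3), torch.randn(2, 3))
```

Here model.parameters() picks up the sub-module’s weights automatically, so the optimizer and state_dict() see them without extra work.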

Okay, I was thinking the same: I can wrap those functions in classes and then instantiate those classes in the primary class. Something like:

import torch
import torch.nn as nn

class Module1(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self,a,b):
        ## Body of the module
        return 

class Module2(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self,a,b):
        ## Body of the module
        return 

class Something(nn.Module):
    def __init__(self,a):
        super().__init__()
        self.something = Module1(a)
        self.module2 = Module2()

    def forward(self,a,b):
        x = self.module2(a,b)
        return x

One more thing: do I need to inherit the secondary classes in the primary class Something?

No, that’s not necessary; inheriting from nn.Module is enough.

A small remark: self.something = Module1(a) won’t work, because Module1.__init__ takes no arguments. I think you want something like:

module1 = Module1()
self.something = module1(a)
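
A quick sketch of the difference between constructing a module and calling it. The forward body here is invented just so the example runs, and it takes a single argument for simplicity:

```python
import torch
import torch.nn as nn

class Module1(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, a):
        # Hypothetical body, for illustration only.
        return a * 2

a = torch.ones(3)

# Passing data to the constructor fails: __init__ accepts no extra arguments.
try:
    Module1(a)
    constructor_failed = False
except TypeError:
    constructor_failed = True
print(constructor_failed)  # True

# Construct first, then call the instance, which dispatches to forward().
module1 = Module1()
result = module1(a)
print(result)  # tensor([2., 2., 2.])
```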

Thanks, I got my answer

You’re welcome :)

One last thing: the first function can stay a function. Since you only store its result in self.something, you won’t get any benefit from using Module1 instead of func1.

OK, one last question, since I’m new to PyTorch: which variables’ gradients are calculated during training, the ones defined in the class constructor or the ones in the forward function? There are cases where a class has no constructor and only performs some operations on tensors in forward().

The module is the class made to hold your NN model, including its parameters. But only the parameters used during the forward pass will accumulate some gradients.

Usually you’ll need to define the __init__ method where you’ll register the parameters, buffers and define the sub-modules, which can be accessed during the forward pass through attributes.

After a forward call producing an output tensor, you’ll compute a loss from the output, and finally perform the backward pass. Gradients are computed during the backward pass by the autograd mechanism, based on the computational graph associated with the loss (see this tutorial for more information on this mechanism). Every tensor that requires gradients (such as the model parameters) and that was used during the forward pass will accumulate gradients. When you call the step method of your optimizer, the parameters tracked by the optimizer are updated based on the accumulated gradients.
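
To make the "accumulate" part concrete, here is a tiny sketch with a single hand-made tensor. The tensor w, the toy loss and the learning rate are all made up for illustration:

```python
import torch

w = torch.tensor([2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# First backward pass: d(3 * w)/dw = 3
(w * 3).sum().backward()
first_grad = w.grad.clone()
print(first_grad)   # tensor([3.])

# Second backward pass without zero_grad: gradients add up, 3 + 3 = 6
(w * 3).sum().backward()
second_grad = w.grad.clone()
print(second_grad)  # tensor([6.])

# The optimizer uses whatever has accumulated: w <- 2.0 - 0.1 * 6 = 1.4
optimizer.step()

# Reset the gradients before the next iteration.
optimizer.zero_grad()
```

This is why zero_grad() is called once per training iteration: otherwise the gradients of earlier iterations would leak into the next update.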

Here is a little example:

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Tensor not registered, you should avoid doing that.
        self.tensor = torch.randn(1)
        # Tensor registered (aka Buffer)
        self.register_buffer('registered_buffer', torch.randn(1))
        # Parameters registered (parameters are automatically registered)
        self.parameter = nn.Parameter(torch.randn(1))
        # Sub-module (convolution layer)
        self.conv = torch.nn.Conv1d(1,1,1)

        # Parameters registered but unused in forward
        self.parameter_unused = nn.Parameter(torch.randn(1))

    def forward(self, x):
        # registered tensor used in forward
        x = x + self.registered_buffer
        # registered parameter used in forward
        x = x + self.parameter
        # sub-module used in forward
        x = self.conv(x)
        return x

Now, if you do a training iteration as follows:

#### INITIALIZATION
# Model init:
model = MyModule()

# Create optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

#### TRAINING ITERATION

# Zero the gradients (to perform at the beginning of every training iteration)
optimizer.zero_grad()

# Get input data and target
dummy_input = torch.randn(1,1,1)
dummy_target = torch.randn(1,1,1)

# get model prediction
model_out = model(dummy_input)

# loss computation
loss = (model_out - dummy_target)**2
loss = loss.sum()

# Perform backward pass
loss.backward()

# Update parameters
optimizer.step()

Then only the tensors (including those of the sub-modules) that were used during the forward pass and that require gradients receive a gradient during the backward pass and are updated during the step call. So, in this example:

  • model.tensor won’t get any gradient: it is a plain tensor that does not require gradients
  • model.registered_buffer will not get any gradient, because a buffer doesn’t require gradients (by default)
  • model.parameter will get a gradient and be updated, as parameters require gradients (by default) and this one is used during the forward pass
  • model.parameter_unused won’t get any gradient, because it was not used during the forward pass
  • model.conv will get gradients and be updated, because this sub-module is used; the parameters updated are model.conv.weight and model.conv.bias
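
If you want to check this yourself, a sketch along these lines should do. It repeats a condensed version of MyModule so that it runs on its own, and simply inspects .grad after one iteration:

```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.tensor = torch.randn(1)
        self.register_buffer('registered_buffer', torch.randn(1))
        self.parameter = nn.Parameter(torch.randn(1))
        self.conv = nn.Conv1d(1, 1, 1)
        self.parameter_unused = nn.Parameter(torch.randn(1))

    def forward(self, x):
        return self.conv(x + self.registered_buffer + self.parameter)

model = MyModule()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

# One training iteration on dummy data.
optimizer.zero_grad()
loss = ((model(torch.randn(1, 1, 1)) - torch.randn(1, 1, 1)) ** 2).sum()
loss.backward()
optimizer.step()

# Which parameters received a gradient?
got_grad = {name: p.grad is not None for name, p in model.named_parameters()}
print(got_grad)

# Neither the buffer nor the plain tensor requires gradients.
print(model.registered_buffer.requires_grad, model.tensor.requires_grad)  # False False
```

Only parameter_unused should map to False in got_grad; its .grad stays None because it never entered the computational graph.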

Last thing: you can see I commented that you should not use an unregistered tensor, as I did in this example with self.tensor = torch.randn(1) in the model constructor. If you use unregistered tensors you will run into trouble later, for instance when saving/loading your parameters, or when changing the dtype or device of your module, because only the registered buffers, parameters, and sub-modules are tracked.
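
A short sketch of the kind of trouble I mean, using a stripped-down module (no forward, since it is never called here):

```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.tensor = torch.randn(1)                               # not registered
        self.register_buffer('registered_buffer', torch.randn(1))  # registered buffer
        self.parameter = nn.Parameter(torch.randn(1))              # registered parameter

model = MyModule()

# The unregistered tensor is missing from what gets saved.
saved_keys = sorted(model.state_dict().keys())
print(saved_keys)  # ['parameter', 'registered_buffer']

# Changing the dtype of the module leaves the unregistered tensor behind.
model.double()
print(model.tensor.dtype)             # torch.float32
print(model.registered_buffer.dtype)  # torch.float64
print(model.parameter.dtype)          # torch.float64
```

The same thing happens with model.to(device): the plain tensor stays where it was created, which typically shows up later as a device mismatch error.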

That was a long answer, hope you don’t get more confused ^^

Thanks, I get your point. :)