Why don't we need to set requires_grad = True for learnable parameters when using torch.nn.Sequential() or torch.nn.Module subclasses?

Hello, I’m new to PyTorch, and I have seen the power of autograd. I know that for tensors in PyTorch the default requires_grad value is False, so if you want to use autograd you have to explicitly set requires_grad to True. But when I use a model built with torch.nn.Sequential or a torch.nn.Module subclass, I find that the optimizer works fine even though I never set it! Can anyone help me? Thanks!
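To show what I mean, here is a minimal sketch with plain tensors (the shapes and names are just made up by me), where I have to opt in to autograd myself:

import torch

# Plain tensors default to requires_grad=False, so we must opt in explicitly
w = torch.randn(3, 2, requires_grad=True)
b = torch.zeros(2, requires_grad=True)

x = torch.randn(5, 3)
y = torch.randn(5, 2)

y_pred = x @ w + b                   # a tiny linear model written by hand
loss = ((y_pred - y) ** 2).sum()
loss.backward()                      # fills w.grad and b.grad
print(w.grad.shape, b.grad.shape)    # torch.Size([3, 2]) torch.Size([2])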
But with a torch.nn.Module subclass, I never touch requires_grad at all:

# -*- coding: utf-8 -*-
import random
import torch


# An example to show the dynamic features of PyTorch
class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        '''
        For the forward pass of the model, we randomly choose 0, 1, 2, or 3
        and reuse the middle_linear Module that many times to compute the
        hidden layer representation.
        :param x: input tensor of shape (N, D_in)
        :return: predicted output of shape (N, D_out)
        '''
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred


if __name__ == '__main__':
    N, D_in, H, D_out = 64, 1000, 100, 10
    x = torch.randn(N, D_in)
    y = torch.randn(N, D_out)

    model = DynamicNet(D_in, H, D_out)

    loss_fn = torch.nn.MSELoss(reduction='sum')
    # TODO: why does this work fine without setting requires_grad = True?
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
    EPOCH = 500
    for t in range(EPOCH):
        y_pred = model(x)
        loss = loss_fn(y_pred, y)
        print(t, loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Hi,

This is because all the parameters in the nn world are defined as nn.Parameter. These are basically tensors that have requires_grad=True by default and that work nicely with the nn tools. For example, for the nn.Linear layer, they are defined here.
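You can check this for yourself; a quick sketch (the layer sizes here are arbitrary):

import torch

# Every parameter registered on an nn.Module already requires grad
layer = torch.nn.Linear(4, 2)
for name, p in layer.named_parameters():
    print(name, p.requires_grad)   # weight True, bias True

# nn.Parameter wraps a tensor and sets requires_grad=True by default
w = torch.nn.Parameter(torch.randn(2, 4))
print(w.requires_grad)             # True

So optimizer.zero_grad(), loss.backward() and optimizer.step() work on model.parameters() without any extra flags.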

I see! Thanks for your answer, it helps a lot. It seems I'd better go read some of the source code!