Calling .eval() on a torch.jit.ScriptModule leaves parameters' requires_grad True

I was having trouble with the memory usage of my traced model in C++, and I discovered that .eval() doesn't change requires_grad for the parameters of my ScriptModule. Is this intended behaviour? As a user it was not the behaviour I expected; I would like it to either work the way I expect, warn, or raise. Given that I can set requires_grad manually, my expected behaviour seems possible?

I think the underlying cause is that my_script_module.layer is a RecursiveScriptModule and has no .children().

PyTorch 1.5

import torch


class MyScriptModule(torch.jit.ScriptModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(1, 1, bias=False)


my_script_module = MyScriptModule()

# [True]
print([p.requires_grad for p in my_script_module.parameters()])

my_script_module.eval()

# [True] :(
print([p.requires_grad for p in my_script_module.parameters()])

for p in my_script_module.parameters():
    p.requires_grad = False

# [False] :)
print([p.requires_grad for p in my_script_module.parameters()])
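
To show what I mean about the wrapped submodule, this is roughly how I inspected it (just a quick sketch; I may be wrong about the cause):

# What did the Linear layer become once it was attached to a ScriptModule?
print(type(my_script_module.layer))

# Which submodules would .eval() / .train() recurse into?
print(list(my_script_module.children()))

# The parameters themselves are still reachable.
print(list(my_script_module.named_parameters()))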

I know this isn't a totally normal thing to be doing. For what it's worth, I am subclassing ScriptModule in this way so that I can do the following. Maybe I should do something differently?

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.inner = MyScriptModule()

    def forward(self, x):
        # stuff
        x = self.inner(x)
        # more stuff
        return x


class MyScriptModule(torch.jit.ScriptModule):
    """ Psuedo-code """
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(1, 1, bias=False)

    @torch.jit.script_method
    def forward(self, x):
        out = torch.zeros_like(x)
        for i in range(x.size()[0]):
            out[i] = self.layer(x[i])  # placeholder body; the real loop does more
        return out


my_module = MyModule()
my_module.eval()

sample = torch.randn(4, 1)  # example input; shape matches the Linear(1, 1) layer
torch.jit.trace(my_module, sample)

Yes, this is expected, and .eval() won’t change the requires_grad attributes of an “eager” model either.
model.eval() and model.train() change the internal self.training flag of all modules recursively starting from the parent module. By doing so, the behavior of some layers will be changed.
E.g. dropout will be disabled and batchnorm layers will use their running statistics to normalize the incoming data instead of calculating the batch statistics.
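
As a quick illustration (a toy model, just to show which attributes the call touches):

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(4, 4),
    torch.nn.BatchNorm1d(4),
    torch.nn.Dropout(p=0.5),
)

model.eval()
# The training flag was flipped recursively on every submodule...
print([m.training for m in model.modules()])          # all False
# ...but the parameters still require gradients.
print([p.requires_grad for p in model.parameters()])  # all True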

If you want to freeze the parameters, you would have to set their .requires_grad attribute to False.
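
For example (requires_grad_ is just the in-place setter; using a plain Linear here for brevity):

model = torch.nn.Linear(1, 1)
for param in model.parameters():
    param.requires_grad_(False)  # autograd will no longer track these parameters

print([p.requires_grad for p in model.parameters()])  # [False, False]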

You could of course use both in combination, e.g. freeze all parameters, but leave the dropout layers enabled, or let all parameters train, but use the running stats of batchnorm layers.
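
E.g., a rough sketch of the “freeze the parameters but keep dropout active” combination:

model = torch.nn.Sequential(
    torch.nn.Linear(4, 4),
    torch.nn.Dropout(p=0.5),
    torch.nn.Linear(4, 1),
)

# Freeze all parameters...
for param in model.parameters():
    param.requires_grad_(False)

# ...but keep the module in training mode so dropout stays enabled.
model.train()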