Module.zero_grad() with requires_grad=False for some Parameter?

It seems that Module.zero_grad() does not like parameters with no grad (the source below crashes). What is the proper way to have model parameters that should not be touched by backprop, but should still benefit from Module's conveniences (cuda(), etc.)?

import torch

from torch import Tensor
from torch.nn.parameter import Parameter
from torch.nn import Module

class Blah(Module):
    def __init__(self, dim):
        super(Blah, self).__init__()
        self.s = Parameter(torch.rand(1, dim), requires_grad=False)
        self.t = Parameter(torch.rand(1, dim))

blah = Blah(10)

blah.zero_grad()

Maybe you want to register a buffer instead?
Parameters are meant to be optimised. If you don’t want a parameter, simply use a Variable instead, which is roughly equivalent to your grad-less Parameter.

This should do the trick.

import torch

from torch import Tensor
from torch.nn.parameter import Parameter
from torch.autograd import Variable
from torch.nn import Module

class Blah(Module):
    def __init__(self, dim):
        super(Blah, self).__init__()
        self.s = Variable(torch.rand(1, dim))
        self.t = Parameter(torch.rand(1, dim))

blah = Blah(10)

blah.zero_grad()

Using a Variable was my first choice, but then Module.cuda() does not propagate to it.

With the Variable-based version you suggest, blah.cuda() will convert blah.t.data to torch.cuda.FloatTensor as expected, but will leave blah.s.data unchanged.
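For instance, with the Variable-based Blah above (a quick check, assuming a CUDA device is available):

if torch.cuda.is_available():
    blah.cuda()
    print(type(blah.t.data))  # torch.cuda.FloatTensor
    print(type(blah.s.data))  # still torch.FloatTensor, the Variable attribute was not moved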

Wouldn't it be more consistent if Module.zero_grad() dealt with requires_grad=False?

Edit: That would be in modules.py

def zero_grad(self):
    """Sets gradients of all model parameters to zero."""
    for p in self.parameters():
        if p.grad is not None:
            p.grad.data.zero_()

You can overload the cuda() function and call the default cuda() method inside it.

def cuda(self):
    super(Blah, self).cuda()
    # Variable.cuda() is not in-place, so the result has to be re-assigned
    self.s = self.s.cuda()
    return self

But aren't there other functionalities I would have to fix as well? Persistence in particular?

I really think you want to register a buffer…
When you call cuda(), you’ll have your buffer turned into a torch.cuda.FloatTensor (cuda() ref. and buffer’s apply() ref.).
Then, in your forward() method you can generate a Variable with the buffer content.
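For example, the forward() part could look like this (a minimal sketch, assuming the buffer is registered under the name 's'; the computation itself is made up just to illustrate the wrapping):

def forward(self, x):
    # wrap the buffer in a Variable at use time; no gradient will flow through it
    s = Variable(self.s)
    return x + s  # made-up computation, assuming x and s have compatible sizes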

I must be missing something; I thought register_buffer was to specify persistent fields, not the ones to be "cudaified". In any case, I presume it should be self.register_buffer('s', self.s)? The following, with s a Variable,

import torch

from torch import Tensor
from torch.nn.parameter import Parameter
from torch.nn import Module
from torch.autograd import Variable

class Blah(Module):
    def __init__(self, dim):
        super(Blah, self).__init__()
        # self.s = Parameter(torch.rand(dim), requires_grad = False)
        self.s = Variable(torch.rand(dim))
        self.register_buffer('s', self.s)
        self.t = Parameter(torch.rand(dim))

blah = Blah(10)

blah.zero_grad()

print('s', type(blah.s.data))
print('t', type(blah.t.data))

if torch.cuda.is_available():
    blah.cuda()
    print('s', type(blah.s.data))
    print('t', type(blah.t.data))

does print

s <class 'torch.FloatTensor'>
t <class 'torch.FloatTensor'>
s <class 'torch.FloatTensor'>
t <class 'torch.cuda.FloatTensor'>

And making s a Tensor instead of a Variable does not help.

What about this? Here s is registered directly as a buffer (a plain Tensor), rather than also being assigned as a regular attribute.

import torch

from torch import Tensor
from torch.nn.parameter import Parameter
from torch.nn import Module
from torch.autograd import Variable

class Blah(Module):
    def __init__(self, dim):
        super(Blah, self).__init__()
        self.register_buffer('s', torch.rand(dim))
        self.t = Parameter(torch.rand(dim))

blah = Blah(10)

blah.zero_grad()

print('s', type(blah.s))
print('t', type(blah.t.data))

if torch.cuda.is_available():
    blah.cuda()
    print('s', type(blah.s))
    print('t', type(blah.t.data))

Which outputs the following.

s <class 'torch.FloatTensor'>
t <class 'torch.FloatTensor'>
s <class 'torch.cuda.FloatTensor'>
t <class 'torch.cuda.FloatTensor'>


Here is a reference to the documentation of the register_buffer() method.


Shouldn't you use the following?

self.s = Parameter(torch.rand(1, dim), volatile=True)

I'd also recommend using a buffer for that. However, the zero_grad() crash is a bug, and it should be fixed anyway.

@csarofeen Parameters can't be volatile, and it's likely not what's wanted: volatile forcefully turns off graph construction. You can read more in these notes.
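For illustration, a small sketch of the old volatile semantics (not tied to the Blah module above):

import torch
from torch.autograd import Variable
from torch.nn.parameter import Parameter

x = Variable(torch.rand(1, 10), volatile=True)
w = Parameter(torch.rand(10, 10))
y = x.mm(w)

print(y.volatile)       # True: volatility propagates to every result
print(y.requires_grad)  # False: no graph is built, so there is nothing to backprop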