Module.zero_grad() with requires_grad=False for some Parameter?

It seems that Module.zero_grad() does not like parameters with no grad (the source below crashes). What is the proper way to have model parameters that should not be touched by backprop, but should still benefit from Module's conveniences (cuda(), etc.)?

import torch

from torch import Tensor
from torch.nn.parameter import Parameter
from torch.nn import Module

class Blah(Module):
    def __init__(self, dim):
        super(Blah, self).__init__()
        self.s = Parameter(torch.rand(1, dim), requires_grad=False)
        self.t = Parameter(torch.rand(1, dim))

blah = Blah(10)

blah.zero_grad()

Maybe you want to register a buffer instead?
Parameters are meant to be optimised. If you don’t want a parameter, simply use a Variable instead, which is roughly equivalent to your grad-less Parameter.

This should do the trick.

import torch

from torch import Tensor
from torch.nn.parameter import Parameter
from torch.autograd import Variable
from torch.nn import Module

class Blah(Module):
    def __init__(self, dim):
        super(Blah, self).__init__()
        self.s = Variable(torch.rand(1, dim))
        self.t = Parameter(torch.rand(1, dim))

blah = Blah(10)

blah.zero_grad()

Using a Variable was my first choice, but then Module.cuda() does not propagate to it.

With the Variable-based version you suggest, blah.cuda() will convert blah.t.data to torch.cuda.FloatTensor as expected, but will leave blah.s.data unchanged.
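For instance, with the Variable-based Blah above (a quick check, assuming a CUDA device is available):

if torch.cuda.is_available():
    blah.cuda()
    print(type(blah.t.data))  # torch.cuda.FloatTensor
    print(type(blah.s.data))  # still torch.FloatTensor, the Variable attribute was not moved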

Wouldn't it be more consistent if Module.zero_grad() dealt with requires_grad=False?

Edit: That would be in modules.py

def zero_grad(self):
    """Sets gradients of all model parameters to zero."""
    for p in self.parameters():
        if p.grad is not None:
            p.grad.data.zero_()

You can overload the cuda() function and call the default cuda() method inside it.

def cuda(self):
    super(Blah, self).cuda()
    # Variable.cuda() is not in-place, so the result has to be re-assigned
    self.s = self.s.cuda()
    return self

But aren't there other functionalities I would have to fix as well? Persistence in particular?

I really think you want to register a buffer…
When you call cuda(), you’ll have your buffer turned into a torch.cuda.FloatTensor (cuda() ref. and buffer’s apply() ref.).
Then, in your forward() method you can generate a Variable with the buffer content.
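For example, the forward() part could look like this (a minimal sketch, assuming the buffer is registered under the name 's'; the computation itself is made up just to illustrate the wrapping):

def forward(self, x):
    # wrap the buffer in a Variable at use time; no gradient will flow through it
    s = Variable(self.s)
    return x + s  # made-up computation, assuming x and s have compatible sizes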

I must be missing something; I thought register_buffer was to specify persistent fields, not the ones to be "cudaified". In any case, I presume it should be self.register_buffer('s', self.s)? The following, with s a Variable,

import torch

from torch import Tensor
from torch.nn.parameter import Parameter
from torch.nn import Module
from torch.autograd import Variable

class Blah(Module):
    def __init__(self, dim):
        super(Blah, self).__init__()
        # self.s = Parameter(torch.rand(dim), requires_grad = False)
        self.s = Variable(torch.rand(dim))
        self.register_buffer('s', self.s)
        self.t = Parameter(torch.rand(dim))

blah = Blah(10)

blah.zero_grad()

print('s', type(blah.s.data))
print('t', type(blah.t.data))

if torch.cuda.is_available():
    blah.cuda()
    print('s', type(blah.s.data))
    print('t', type(blah.t.data))

does print

s <class 'torch.FloatTensor'>
t <class 'torch.FloatTensor'>
s <class 'torch.FloatTensor'>
t <class 'torch.cuda.FloatTensor'>

And making s a Tensor instead of a Variable does not help.

What about this? Here s is registered directly as a buffer (a plain Tensor), rather than also being assigned as a regular attribute.

import torch

from torch import Tensor
from torch.nn.parameter import Parameter
from torch.nn import Module
from torch.autograd import Variable

class Blah(Module):
    def __init__(self, dim):
        super(Blah, self).__init__()
        self.register_buffer('s', torch.rand(dim))
        self.t = Parameter(torch.rand(dim))

blah = Blah(10)

blah.zero_grad()

print('s', type(blah.s))
print('t', type(blah.t.data))

if torch.cuda.is_available():
    blah.cuda()
    print('s', type(blah.s))
    print('t', type(blah.t.data))

Which outputs the following.

s <class 'torch.FloatTensor'>
t <class 'torch.FloatTensor'>
s <class 'torch.cuda.FloatTensor'>
t <class 'torch.cuda.FloatTensor'>


Here is a reference to the documentation of the register_buffer() method.


Shouldn't you use the following?

self.s = Parameter(torch.rand(1, dim), volatile=True)

I'd also recommend using a buffer for that. However, the zero_grad() crash is a bug, and it should be fixed anyway.

@csarofeen Parameters can't be volatile, and it's likely not what's wanted: volatile forcefully turns off graph construction. You can read more in these notes.
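For illustration, a small sketch of the old volatile semantics (not tied to the Blah module above):

import torch
from torch.autograd import Variable
from torch.nn.parameter import Parameter

x = Variable(torch.rand(1, 10), volatile=True)
w = Parameter(torch.rand(10, 10))
y = x.mm(w)

print(y.volatile)       # True: volatility propagates to every result
print(y.requires_grad)  # False: no graph is built, so there is nothing to backprop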