It seems that Module.zero_grad() does not like parameters with no grad (the source below crashes), but what is the proper way to have model parameters which should not be touched by the backprop, yet still benefit from Module’s comfort (cuda() etc.)?
import torch
from torch import Tensor
from torch.nn.parameter import Parameter
from torch.nn import Module
class Blah(Module):
    def __init__(self, dim):
        super(Blah, self).__init__()
        self.s = Parameter(torch.rand(1, dim), requires_grad=False)
        self.t = Parameter(torch.rand(1, dim))

blah = Blah(10)
blah.zero_grad()
Maybe you want to register a buffer instead? Parameters are meant to be optimised. If you don’t want a parameter, simply use a Variable instead, which is roughly equivalent to your grad-less Parameter.
This should do the trick.
import torch
from torch import Tensor
from torch.nn.parameter import Parameter
from torch.autograd import Variable
from torch.nn import Module
class Blah(Module):
    def __init__(self, dim):
        super(Blah, self).__init__()
        self.s = Variable(torch.rand(1, dim))
        self.t = Parameter(torch.rand(1, dim))

blah = Blah(10)
blah.zero_grad()
Using a Variable was my first choice, but then Module.cuda() does not propagate to it.
With the Variable you suggest, blah.cuda() will convert blah.t.data to torch.cuda.FloatTensor as expected, but will leave blah.s.data unchanged.
Wouldn’t it be more consistent that Module.zero_grad() deal with requires_grad=False?
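For what it’s worth, a tolerant version is easy to write by hand (and recent PyTorch behaves like this out of the box). This is only a sketch, and the helper name safe_zero_grad is made up; it simply skips any parameter whose .grad is absent:

```python
import torch
from torch.nn import Module, Parameter

# Hypothetical helper (not part of torch): zero only the gradients
# that actually exist, so grad-less parameters are simply skipped.
def safe_zero_grad(module):
    for p in module.parameters():
        if p.grad is not None:
            p.grad.detach_()
            p.grad.zero_()

class Blah(Module):
    def __init__(self, dim):
        super(Blah, self).__init__()
        self.s = Parameter(torch.rand(1, dim), requires_grad=False)
        self.t = Parameter(torch.rand(1, dim))

blah = Blah(10)
blah.t.sum().backward()   # populates blah.t.grad; blah.s.grad stays None
safe_zero_grad(blah)      # does not crash on the grad-less parameter
```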
I really think you want to register a buffer…
When you call cuda(), you’ll have your buffer turned into a torch.cuda.FloatTensor (cuda() ref. and buffer’s apply() ref.).
Then, in your forward() method you can generate a Variable with the buffer content.
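To make that concrete, here is a sketch (the forward body is invented for illustration) that keeps s as a buffer and wraps it in a Variable only inside forward():

```python
import torch
from torch.autograd import Variable
from torch.nn import Module, Parameter

class Blah(Module):
    def __init__(self, dim):
        super(Blah, self).__init__()
        # fixed values go in a buffer: cuda() and friends will move them,
        # but they are invisible to the optimiser and to zero_grad()
        self.register_buffer('s', torch.rand(1, dim))
        self.t = Parameter(torch.rand(1, dim))

    def forward(self, x):
        # wrap the buffer content in a Variable on the fly
        s = Variable(self.s)
        return x * s + self.t

blah = Blah(10)
y = blah(Variable(torch.rand(1, 10)))
```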
I must be missing something; I thought register_buffer was to specify persistent fields, not the ones to be “cudaified”. In any case, I presume it should be self.register_buffer('s', self.s)? The following, with s a Variable,
import torch
from torch import Tensor
from torch.nn.parameter import Parameter
from torch.nn import Module
from torch.autograd import Variable
class Blah(Module):
    def __init__(self, dim):
        super(Blah, self).__init__()
        # self.s = Parameter(torch.rand(dim), requires_grad=False)
        self.s = Variable(torch.rand(dim))
        self.register_buffer('s', self.s)
        self.t = Parameter(torch.rand(dim))

blah = Blah(10)
blah.zero_grad()

print('s', type(blah.s.data))
print('t', type(blah.t.data))

if torch.cuda.is_available():
    blah.cuda()
    print('s', type(blah.s.data))
    print('t', type(blah.t.data))
does print
s <class 'torch.FloatTensor'>
t <class 'torch.FloatTensor'>
s <class 'torch.FloatTensor'>
t <class 'torch.cuda.FloatTensor'>
Making s a plain Tensor instead of a Variable, however, does work. (In the snippet above, the Variable assigned to self.s in __init__ shadows the registered buffer during attribute lookup, so the converted copy held in _buffers is never seen.)
import torch
from torch import Tensor
from torch.nn.parameter import Parameter
from torch.nn import Module
from torch.autograd import Variable
class Blah(Module):
    def __init__(self, dim):
        super(Blah, self).__init__()
        self.register_buffer('s', torch.rand(dim))
        self.t = Parameter(torch.rand(dim))

blah = Blah(10)
blah.zero_grad()

print('s', type(blah.s))
print('t', type(blah.t.data))

if torch.cuda.is_available():
    blah.cuda()
    print('s', type(blah.s))
    print('t', type(blah.t.data))
Which outputs the following.
s <class 'torch.FloatTensor'>
t <class 'torch.FloatTensor'>
s <class 'torch.cuda.FloatTensor'>
t <class 'torch.cuda.FloatTensor'>
Here is a ref. to the documentation of the register_buffer() method.
I’d also recommend using a buffer for that. That said, the crash in zero_grad() is a bug, and it should be fixed anyway.
@csarofeen Parameters can’t be volatile, and it’s likely not what you want anyway: volatile forcefully turns off graph construction. You can read more in these notes.
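For later readers: volatile was removed in PyTorch 0.4, where Variable and Tensor were merged; the modern way to turn off graph construction for inference is the torch.no_grad() context, sketched below:

```python
import torch

x = torch.rand(3, requires_grad=True)

# inside no_grad(), no graph is recorded, much like volatile used to do
with torch.no_grad():
    y = x * 2

print(y.requires_grad)  # False
```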