In the current PyTorch examples, all parameters have to be pre-defined in the class's __init__ function, or come from an existing nn.Module such as nn.Linear. However, this requires us to compute the parameter sizes correctly for the whole graph, which can be quite tedious and error-prone when the model becomes complex.
Declaring the parameters in the forward function seems to be a solution, because the shapes of the intermediate results are already known at that point, but the parameters might then be re-declared every time we run the forward function.
TensorFlow solves this problem by providing a tf.get_variable function.
Can we also add something like Module.get_parameter, which creates a new parameter or returns the existing one if it already exists in Module.parameters?
def conv_relu(input):
    # Create a variable named "weights"; its shape can be inferred from input.
    # input is assumed to be (N, C, H, W), so dimension 1 is the channel count.
    kernel_shape = (64, input.size(1), 3, 3)
    weights = nn.Module.get_parameter("weights", kernel_shape,
                                      initializer='uniform')
    conv = F.conv2d(input, weights)
    return F.relu(conv)
You can create parameters in the forward function too. Just guard them with an if to prevent reassigning at every iteration:
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()  # must run before register_parameter
        # you need to register the parameter names earlier
        self.register_parameter('weight', None)

    def forward(self, input):
        if self.weight is None:
            self.weight = nn.Parameter(torch.randn(input.size()))
        return self.weight @ input
With this approach, model.cuda(), which is usually called before forward(), might not work properly. If one more if/else is added to check use_cuda, the code becomes unnecessarily long.
class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()  # must run before register_parameter
        # you need to register the parameter names earlier
        self.register_parameter('weight', None)

    def reset_parameters(self, input):
        # input.new keeps the type and device of input
        self.weight = nn.Parameter(input.new(input.size()).normal_(0, 1))

    def forward(self, input):
        if self.weight is None:
            self.reset_parameters(input)
        return self.weight @ input
input.new will create a new tensor of the same type as input, and it will be placed on the same GPU as input.
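For what it's worth, here is a small sketch of that behaviour. It uses a CPU tensor with a non-default dtype so it runs without a GPU. The torch.randn_like call is my own substitution rather than something from the posts above; I am assuming it is available in your PyTorch version, and it is shown only as a newer spelling of input.new(input.size()).normal_(0, 1).

```python
import torch

# A float64 input stands in for "a tensor that is not the default type".
inp = torch.zeros(3, 3, dtype=torch.float64)

# Legacy spelling from the post above: same type and device as inp.
w_legacy = inp.new(inp.size()).normal_(0, 1)

# Newer spelling (my substitution): also inherits dtype and device from inp.
w_modern = torch.randn_like(inp)

print(w_legacy.dtype, w_legacy.device)
print(w_modern.dtype, w_modern.device)
```

Either way the new tensor follows the input, so no explicit use_cuda check is needed in the module.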
Thanks for your reply, but I don’t see why this would help. It seems to me that register_parameter just registers a None parameter in the parameter list. How could model.cuda() affect a None parameter? I was thinking of something like this:
def cuda(self, device_id=None):
    """Moves all model parameters and buffers to the GPU.

    Arguments:
        device_id (int, optional): if specified, all parameters will be
            copied to that device
    """
    self._cuda = True
    self._device_id = device_id
    return self._apply(lambda t: t.cuda(device_id))

def cpu(self, device_id=None):
    """Moves all model parameters and buffers to the CPU."""
    self._cuda = False
    return self._apply(lambda t: t.cpu())

def get_parameter(self, name, shape):
    # Create the parameter on first use; later calls return the existing one.
    if name not in self._parameters:
        self._parameters[name] = nn.Parameter(torch.randn(shape))
    var = self._parameters[name]
    # Pick the function that moves a tensor to the module's current device.
    fn = (lambda t: t.cuda(self._device_id)) if self._cuda else (lambda t: t.cpu())
    var.data = fn(var.data)
    if var.grad is not None:
        var.grad.data = fn(var.grad.data)
    return var
model.cuda() won’t affect it unless it has been reassigned. However, if you call model.cuda() and then forward a CUDA input, input.new will allocate a CUDA tensor, so the types will always match. I find that solution simpler and more robust than what you proposed. Doesn’t it work for you?
I wonder how we are supposed to use parameters registered this way with optimizers (e.g. SGD). The .parameters() call does not return None parameters, so if you register a parameter in advance, can you only create the optimizer after the first forward pass?
I also have another question. If I use register_parameter to register a parameter, it will be updated automatically when the training step is run.
What if I only want a value that is dynamic and does not need to be updated? How can I implement that? For example:
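To make the ordering concrete, here is a minimal sketch of the "forward once, then build the optimizer" workflow. The class name LazyWeight, the 4x4 shapes and the learning rate are all hypothetical choices for illustration; only register_parameter, parameters() and torch.optim.SGD are real API.

```python
import torch
import torch.nn as nn

class LazyWeight(nn.Module):  # hypothetical name, mirrors MyModule above
    def __init__(self):
        super().__init__()
        self.register_parameter('weight', None)

    def forward(self, input):
        if self.weight is None:
            # Created on first call, with the dtype and device of the input.
            self.weight = nn.Parameter(torch.randn_like(input))
        return self.weight @ input

model = LazyWeight()
n_before = len(list(model.parameters()))  # the None entry is skipped

# Dry run so that the weight actually exists.
model(torch.ones(4, 4))
n_after = len(list(model.parameters()))

# Only now does the optimizer have something to track.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
before = model.weight.detach().clone()
loss = model(torch.ones(4, 4)).sum()
loss.backward()
optimizer.step()
print(n_before, n_after)
```

If the optimizer were built before the dry run, it would be handed an empty parameter list, which is why the forward pass has to come first in this pattern.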
self.title_conv = nn.Sequential(
    nn.Conv1d(),
    nn.ReLU(),
    # The kernel_size changes because the length of the conv layer's input is variable.
    # Therefore, kernel_size is computed in the forward function.
    # Then I want to pass it to the max-pool layer. How can I implement it?
    nn.MaxPool1d(kernel_size=)
)
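One possible approach, sketched under assumptions: since the kernel size is not a learned value, it does not need to be a parameter at all. The pooling step can be moved out of nn.Sequential and done with the functional F.max_pool1d inside forward, where the length is known. The class name TitleEncoder, the channel counts and the conv kernel size below are made up for the example, since the original call leaves them blank.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TitleEncoder(nn.Module):  # hypothetical name
    def __init__(self, in_channels=8, out_channels=16):  # assumed sizes
        super().__init__()
        self.title_conv = nn.Sequential(
            nn.Conv1d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):
        h = self.title_conv(x)  # (N, out_channels, L)
        # Pool over the whole length, whatever L is for this batch.
        pooled = F.max_pool1d(h, kernel_size=h.size(2))  # (N, out_channels, 1)
        return pooled.squeeze(2)

enc = TitleEncoder()
short = enc(torch.randn(2, 8, 5))
long = enc(torch.randn(2, 8, 40))
print(short.shape, long.shape)
```

Both inputs give an output of the same shape even though their lengths differ, and nothing extra ends up in parameters().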
Here’s another way that seems to work. Like your example with register_parameter, a forward pass is required before initializing the optimizer. Do you see any clear advantages or disadvantages?
class DynamicLinear(nn.Module):
    def __init__(self, output_dim):
        super(DynamicLinear, self).__init__()
        self.output_dim = output_dim

    def forward(self, inputs):
        if not hasattr(self, '_linear'):
            input_dim = inputs.shape[-1]
            self._linear = nn.Linear(input_dim, self.output_dim)
        return self._linear(inputs)
Apologies for reviving this. I’m looking to do precisely this, but it seems this no longer works. The current documentation for register_parameter() says of its second argument: "If None, the parameter is not included in the module’s state_dict."
I think the above therefore results in the parameter not being found in the .parameters() method when an object is instantiated from this class.
Instead of using None as the second argument to register_parameter() I could use nn.Parameter(), but then I am not sure how to check whether a parameter is uninitialised, so that I can write the correct if statement in the forward() method and avoid resetting the parameter at each iteration.
What is the up-to-date way to do this?
(Apologies if I am mistaken and have the wrong end of the stick here).
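One possible answer, offered as a sketch rather than the definitive way: newer PyTorch releases ship lazy modules such as nn.LazyLinear, which defer creating the weight until the first forward pass, and an UninitializedParameter class in torch.nn.parameter that can be tested with isinstance. I am assuming your installed version includes both; the helper name is_pending and the sizes are hypothetical.

```python
import torch
import torch.nn as nn
from torch.nn.parameter import UninitializedParameter

def is_pending(param):  # hypothetical helper
    # True while the parameter has no shape yet.
    return isinstance(param, UninitializedParameter)

layer = nn.LazyLinear(4)  # only the output size is given
pending_before = is_pending(layer.weight)

# The first forward pass infers in_features from the last input dimension.
out = layer(torch.randn(2, 3))
pending_after = is_pending(layer.weight)
weight_shape = tuple(layer.weight.shape)

# As with the earlier patterns, build the optimizer after the dry run.
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
print(pending_before, pending_after, weight_shape)
```

The same isinstance check could serve as the guard in a hand-written forward(), in place of comparing against None.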