Dynamic parameter declaration in forward function

In the current PyTorch examples, all parameters have to be predefined in the class's __init__ function or come from an existing nn.Module such as nn.Linear. However, this requires us to compute the parameter sizes correctly for the whole graph, which can be quite tedious and error-prone when the model becomes complex.

Declaring the parameters in the forward function seems to be a solution, because the shapes of the intermediate results are already known at that point. The problem is that this could cause the parameters to be redeclared every time we run the forward function.

TensorFlow solves this problem by providing a tf.get_variable function.
Can we also add something like Module.get_parameter, which creates a new parameter or returns the existing one if it already exists in Module.parameters?

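def conv_relu(input):
    # Create a variable named "weights"; its shape can be inferred from input.
    # input is (N, C, H, W), so the kernel's in_channels is input.size(1).
    kernel_shape = (64, input.size(1), 3, 3)
    # Proposed API: create the parameter, or return it if it already exists.
    weights = nn.Module.get_parameter("weights", kernel_shape)
    conv = F.conv2d(input, weights)
    return F.relu(conv)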
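
A minimal sketch of how such a get-or-create helper could be written today using only public nn.Module methods. The class name ConvRelu and the method get_or_create_parameter are hypothetical and not part of PyTorch; the name is chosen to avoid clashing with the nn.Module.get_parameter(target) lookup method that newer PyTorch versions define for a different purpose.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvRelu(nn.Module):
    """Hypothetical module that creates its conv kernel on first use."""

    def get_or_create_parameter(self, name, shape, like):
        # Hypothetical helper: return the parameter if it is already
        # registered, otherwise create and register it once.
        param = getattr(self, name, None)
        if param is None:
            # Allocate with the same dtype and device as the reference tensor.
            param = nn.Parameter(
                torch.randn(shape, dtype=like.dtype, device=like.device))
            self.register_parameter(name, param)
        return param

    def forward(self, input):
        # input is (N, C, H, W); in_channels is inferred from input.size(1).
        kernel_shape = (64, input.size(1), 3, 3)
        weights = self.get_or_create_parameter("weights", kernel_shape, input)
        return F.relu(F.conv2d(input, weights))


model = ConvRelu()
x = torch.randn(2, 5, 8, 8)
first = model(x)   # creates "weights"
second = model(x)  # reuses the same parameter
```

The parameter only exists after the first forward call, so anything that walks model.parameters() (an optimizer, for instance) has to run after that call.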



You can create parameters in the forward function too. Just guard them with an if to prevent reassigning at every iteration:

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # you need to register the parameter names earlier
        self.register_parameter('weight', None)

    def forward(self, input):
        # create the parameter only on the first call
        if self.weight is None:
            self.weight = nn.Parameter(torch.randn(input.size()))
        return self.weight @ input

With this approach, model.cuda(), which is usually called before forward(), might not work properly. If one more if/else is added to check use_cuda, the code becomes unnecessarily long.


Good point. This would be better then:

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # you need to register the parameter names earlier
        self.register_parameter('weight', None)

    def reset_parameters(self, input):
        # allocate the weight with the same type and device as input
        self.weight = nn.Parameter(input.new(input.size()).normal_(0, 1))

    def forward(self, input):
        if self.weight is None:
            self.reset_parameters(input)
        return self.weight @ input

input.new will create a new tensor of the same type as input, and it will be placed on the same GPU as input.
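
For readers on newer PyTorch versions, here is a small sketch of the same idea with Tensor.new_empty, or with explicit dtype and device arguments. The tensor names and shapes are only illustrative.

```python
import torch

# A reference tensor; in a module this would be the forward input.
x = torch.zeros(3, 4, dtype=torch.float64)

# new_empty keeps x's dtype and device but takes its own shape.
w = x.new_empty((5, 3)).normal_(0, 1)

# Equivalent spelling with explicit dtype and device.
w2 = torch.randn(5, 3, dtype=x.dtype, device=x.device)
```

If x lives on a GPU, both w and w2 are allocated on that same GPU.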


Thanks for your reply, but I don’t see why this would help. It seems to me that register_parameter just registers a None parameter in the parameter list. How could model.cuda() affect a None parameter?

How about the following?

def cuda(self, device_id=None):
    """Moves all model parameters and buffers to the GPU.

    Arguments:
        device_id (int, optional): if specified, all parameters will be
            copied to that device
    """
    self._cuda = Ture
    self._device_id = device_id
    return self._apply(lambda t: t.cuda(device_id))

def cpu(self):
    """Moves all model parameters and buffers to the CPU."""
    self._cuda = False
    return self._apply(lambda t: t.cpu())

def get_parameter(self, name, shape):
    # create the parameter on first use
    if name not in self._parameters:
        self._parameters[name] = nn.Parameter(torch.randn(shape))
    var = self._parameters[name]
    # move the parameter (and its gradient) to wherever the module lives
    fn = (lambda t: t.cuda(self._device_id)) if self._cuda else (lambda t: t.cpu())
    var.data = fn(var.data)
    if var.grad is not None:
        var.grad.data = fn(var.grad.data)
    return var

model.cuda() won’t affect it unless it has been reassigned. However, if you call model.cuda() and then forward a CUDA input, input.new will allocate a CUDA tensor, so the types will always match. I find that solution simpler and more robust than what you proposed. Doesn’t it work for you?

Also, don’t fiddle with internal fields. It’s not a good idea. They’re subject to change without notice.

I just realized that the shape passed to input.new(shape) is not necessarily equal to input.size(). That makes sense.

I wonder how we are supposed to use registered parameters with optimizers (e.g. SGD). The .parameters() call does not return None parameters, so if you register a parameter in advance, can you create the optimizer only after the first forward pass?
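
A small sketch that illustrates the ordering issue, assuming a hypothetical LazyWeight module with a placeholder registered as None. Before the first forward pass parameters() yields nothing, and torch.optim.SGD raises a ValueError on an empty parameter list, so the optimizer is built after a dummy forward.

```python
import torch
import torch.nn as nn


class LazyWeight(nn.Module):
    """Hypothetical module whose weight is created on first forward."""

    def __init__(self, output_dim=3):
        super().__init__()
        self.output_dim = output_dim
        self.register_parameter("weight", None)

    def forward(self, input):
        if self.weight is None:
            self.weight = nn.Parameter(torch.randn(
                input.size(-1), self.output_dim,
                dtype=input.dtype, device=input.device))
        return input @ self.weight


model = LazyWeight()
n_before = len(list(model.parameters()))  # None placeholders are skipped

model(torch.randn(2, 4))                  # dummy forward creates the weight
n_after = len(list(model.parameters()))

# Only now is there something for the optimizer to track.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
```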

@ypxie: Note that you have self._cuda = Ture instead of True.

You can do a forward pass in the last line of the __init__ function.

Hello, I am not very good at Python yet and I am confused. What is the meaning of @ input?
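
For reference, @ is Python's matrix multiplication operator; on tensors, a @ b gives the same result as torch.matmul(a, b). A tiny example with made-up values:

```python
import torch

a = torch.tensor([[1., 2.],
                  [3., 4.]])
b = torch.tensor([[5.],
                  [6.]])

c = a @ b                # matrix product, shape (2, 1)
d = torch.matmul(a, b)   # the same operation written as a function call
```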

I also have another question. If I use register_parameter to register a parameter, that parameter will be updated automatically when the training step runs.
What if I only want a parameter that is dynamic but does not need to be updated? How can I implement that? For example:

self.title_conv = nn.Sequential(
# The kernel_size changes because the length of the conv layer's input is variable.
# Therefore, kernel_size is computed in the forward function.
# I then want to pass it to the max-pool layer. How can I implement this?

I also have the same problem, since I would like to use different kernel sizes depending on the size of the input data. Did you solve your problem?
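
One possible approach, sketched under the assumption that the kernel size is just a number derived from the input length and not something to be learned: skip the parameter entirely and call the functional pooling op in forward. The TitleEncoder class and its sizes are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TitleEncoder(nn.Module):
    """Hypothetical encoder that max-pools over the whole sequence."""

    def __init__(self, in_channels=8, out_channels=16):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size=3)

    def forward(self, x):
        h = F.relu(self.conv(x))   # (N, out_channels, L - 2)
        k = h.size(-1)             # kernel size depends on the input length
        # F.max_pool1d takes the kernel size as a plain argument,
        # so nothing has to be registered on the module.
        return F.max_pool1d(h, kernel_size=k).squeeze(-1)


enc = TitleEncoder()
short = enc(torch.randn(2, 8, 10))
long = enc(torch.randn(2, 8, 25))
```

For tensor state that should travel with the module but never be trained, register_buffer, or a parameter created with requires_grad=False, is the usual alternative.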

Here’s another way that seems to work. Like your example with register_parameter, a forward pass is required before initializing the optimizer. Do you see any clear advantages or disadvantages?

class DynamicLinear(nn.Module):
    def __init__(self, output_dim):
        super(DynamicLinear, self).__init__()
        self.output_dim = output_dim

    def forward(self, inputs):
        if not hasattr(self, '_linear'):
            input_dim = inputs.shape[-1]
            self._linear = nn.Linear(input_dim, self.output_dim)
        return self._linear(inputs)

This won’t work properly; it has the same issue with model.cuda().
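
A sketch of one way around the device problem, assuming it is acceptable to place the lazily created layer on whatever device and dtype the first input uses. Newer PyTorch versions also ship nn.LazyLinear, which infers in_features on the first call; it is shown here for comparison.

```python
import torch
import torch.nn as nn


class DynamicLinear(nn.Module):
    def __init__(self, output_dim):
        super().__init__()
        self.output_dim = output_dim
        self._linear = None  # created on the first forward pass

    def forward(self, inputs):
        if self._linear is None:
            # Match the device and dtype of the first input, so a model
            # that was moved before the first call still works.
            self._linear = nn.Linear(inputs.shape[-1], self.output_dim).to(
                device=inputs.device, dtype=inputs.dtype)
        return self._linear(inputs)


layer = DynamicLinear(4)
out = layer(torch.randn(2, 7))

# Built-in alternative where available: in_features is inferred lazily.
lazy = nn.LazyLinear(4)
lazy_out = lazy(torch.randn(2, 7))
```

As with the other lazy approaches in this thread, a forward pass is still needed before the optimizer is created.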

How do I set the bias?

I mean, does F.conv2d use a default bias or not? How can I give nn.Conv2d dynamic weights in forward and get the same effect as calling F.conv2d without a bias?
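
A short sketch comparing the two. The bias argument of F.conv2d defaults to None, so no bias is added unless one is passed, and nn.Conv2d(..., bias=False) behaves the same way. The shapes are only illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)

# Module form without a bias term.
conv = nn.Conv2d(3, 4, kernel_size=3, bias=False)
out_module = conv(x)

# Functional form: bias defaults to None, so this matches the module above.
out_functional = F.conv2d(x, conv.weight)

# To use a bias with the functional form, pass it explicitly
# (one value per output channel).
bias = torch.ones(4)
out_biased = F.conv2d(x, conv.weight, bias)
```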