Dynamic parameter declaration in forward function

ypxie · February 10, 2017, 3:58pm

In the current PyTorch examples, all parameters have to be predefined in the class __init__ function or come from an existing nn.Module such as nn.Linear. However, this requires us to compute the parameter sizes correctly for the whole graph, which can be quite tedious and error-prone when the model becomes complex.

Declaring the parameters in the forward function seems to be a solution, because the sizes of the intermediate results are already known at that point. The problem is that the parameters might then be redeclared every time we run the forward function.

TensorFlow solves this problem by providing a tf.get_variable function.
Can we also add something like Module.get_parameter, which creates a new parameter or returns the existing one if it is already in Module.parameters?

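def conv_relu(input):
    # Create a variable named "weights"; its shape can be inferred from input.
    # input is (N, C, H, W), so the number of input channels is input.size(1).
    kernel_shape = (64, input.size(1), 3, 3)
    # get_parameter is the proposed API; it does not exist in PyTorch.
    weights = nn.Module.get_parameter("weights", kernel_shape,
        initializer='uniform')
    conv = F.conv2d(input, weights)
    return F.relu(conv)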

https://www.tensorflow.org/versions/r1.0/how_tos/variable_scope/

apaszke · February 10, 2017, 7:54pm

You can create parameters in the forward function too. Just guard them with an if to prevent reassigning at every iteration:

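import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        # nn.Module must be initialized before any parameter is registered
        super().__init__()
        # you need to register the parameter names earlier
        self.register_parameter('weight', None)

    def forward(self, input):
        # create the parameter only once, on the first forward pass
        if self.weight is None:
            self.weight = nn.Parameter(torch.randn(input.size()))
        return self.weight @ input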

ypxie · February 10, 2017, 8:09pm

In this way, model.cuda(), which is usually called before forward(), might not work properly. If one more if/else is added to check use_cuda, the code becomes unnecessarily long.

apaszke · February 11, 2017, 11:42am

Good point. This would be better then:

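import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        # nn.Module must be initialized before any parameter is registered
        super().__init__()
        # you need to register the parameter names earlier
        self.register_parameter('weight', None)

    def reset_parameters(self, input):
        # allocate with the same type and device as input, then fill from N(0, 1)
        self.weight = nn.Parameter(input.new(input.size()).normal_(0, 1))

    def forward(self, input):
        if self.weight is None:
            self.reset_parameters(input)
        return self.weight @ input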

input.new will create a new tensor of the same type as input, and it will be placed on the same GPU as input.
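As a minimal sketch of the same idea, the example below uses Tensor.new_empty, which also allocates with the dtype and device of the tensor it is called on. The module name LazyWeight and its out_features argument are made up for illustration, and the weight shape is derived from the input rather than copied from it.

```python
import torch
import torch.nn as nn


class LazyWeight(nn.Module):
    """Hypothetical module that creates its weight on the first forward pass."""

    def __init__(self, out_features):
        super().__init__()
        self.out_features = out_features
        # Register the name up front; the slot stays None until first use.
        self.register_parameter("weight", None)

    def forward(self, input):
        if self.weight is None:
            in_features = input.size(-1)
            # new_empty allocates with the dtype and device of `input`.
            data = input.new_empty((self.out_features, in_features)).normal_(0, 1)
            self.weight = nn.Parameter(data)
        return input @ self.weight.t()


module = LazyWeight(out_features=4)
x = torch.randn(2, 3, dtype=torch.float64)
y = module(x)
```

Because the weight is allocated from the input, a float64 input yields a float64 weight here, and a CUDA input would be expected to yield a CUDA weight in the same way.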

ypxie · February 11, 2017, 8:54pm

Thanks for your reply, but I don’t see why this would help. It seems to me that register_parameter just registers a None parameter in the parameter list. How could model.cuda() affect a None parameter?

ypxie · February 11, 2017, 9:02pm

How about the following?

def cuda(self, device_id=None):
        """Moves all model parameters and buffers to the GPU.

        Arguments:
            device_id (int, optional): if specified, all parameters will be
                copied to that device
        """
        self._cuda = Ture
        self._device_id = device_id
        return self._apply(lambda t: t.cuda(device_id))

def cpu(self, device_id=None):
        """Moves all model parameters and buffers to the CPU."""
        self._cuda = False
        return self._apply(lambda t: t.cpu())

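def get_parameter(self, name, shape):
        # Create the parameter on first use, otherwise reuse the stored one.
        if name not in self._parameters:
            self._parameters[name] = nn.Parameter(torch.randn(shape))
        var = self._parameters[name]
        # Pick the move function up front; a single conditional lambda
        # would return another lambda in the CPU case.
        if self._cuda:
            fn = lambda t: t.cuda(self._device_id)
        else:
            fn = lambda t: t.cpu()
        var.data = fn(var.data)
        if var.grad is not None:
            var.grad.data = fn(var.grad.data)
        return var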

apaszke · February 11, 2017, 9:36pm

model.cuda() won’t affect it, unless it has been reassigned. However, if you call model.cuda() and then forward a CUDA input, input.new will allocate a CUDA tensor, so the types will always match. I find that solution simpler and more robust than what you proposed. Doesn’t it work for you?

apaszke · February 11, 2017, 9:37pm

Also, don’t fiddle with internal fields. It’s not a good idea. They’re subject to change without notice.

ypxie · February 11, 2017, 9:52pm

I just realized that the shape passed to input.new(shape) does not have to equal input.size(). That makes sense.
Thanks.

Ben_Usman · August 9, 2017, 10:43pm

I wonder how we are supposed to use registered variables in optimizers (e.g. SGD). The .parameters() call does not return None parameters, so if you register a parameter in advance, can you create the optimizer only after the first forward pass?

Amir_Rosenfeld · August 17, 2017, 2:40pm

@ypxie: Note that you have self._cuda = Ture instead of True.

ypxie · August 26, 2017, 4:25am

You can do a forward pass in the last line of the init function.
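A rough sketch of that ordering is below. The module LazyScale and the dummy input shape are assumptions made for illustration; the point is only that the optimizer is built after a dummy forward pass has created the parameter, since before that .parameters() yields nothing.

```python
import torch
import torch.nn as nn


class LazyScale(nn.Module):
    """Hypothetical module with one lazily created parameter."""

    def __init__(self):
        super().__init__()
        self.register_parameter("weight", None)

    def forward(self, input):
        if self.weight is None:
            self.weight = nn.Parameter(torch.ones(input.size(-1)))
        return input * self.weight


model = LazyScale()
params_before = list(model.parameters())  # the None slot is skipped

# Dummy forward pass: creates the parameter without tracking gradients.
with torch.no_grad():
    model(torch.zeros(1, 5))

# Only now does the optimizer have something to optimize.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
params_after = list(model.parameters())
```

The dummy pass could equally be run at the end of __init__, as suggested above, provided the expected input shape is known at construction time.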

quoniammm · September 24, 2017, 11:36pm

Hello, I am not very good at Python yet and I am confused. What is the meaning of @ input?
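For reference, @ is Python's matrix multiplication operator (available since Python 3.5), and on tensors it behaves like torch.matmul. A small illustration with made-up values:

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[5.0], [6.0]])

# a @ b is the matrix product, the same as torch.matmul(a, b).
product = a @ b
```

So self.weight @ input in the snippets above multiplies the weight matrix by the input.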

quoniammm · September 24, 2017, 11:57pm

I also have another question. If I use register_parameter to register a parameter, it will be updated automatically when the training step is run.
What if I only want a value that is dynamic and does not need to be updated? How can I implement it? For example:

self.title_conv = nn.Sequential(
            nn.Conv1d(),
            nn.ReLU(),
            # The kernel_size changes because the length of the conv layer's input is variable.
            # Therefore, kernel_size is computed in the forward function.
            # I then want to pass it to the max-pool layer. How can I implement that?
            nn.MaxPool1d(kernel_size=)
        )

karlTUM · March 15, 2018, 2:50pm

I have the same problem, since I would like to use different kernel sizes for different input data sizes. Did you solve your problem?
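One possible approach, sketched under the assumption that the pooling window should cover the whole sequence: pooling has no learnable parameters, so the functional form F.max_pool1d can be called in forward with a kernel size computed from the current input. The class name TitleEncoder and the channel and kernel sizes are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TitleEncoder(nn.Module):
    """Hypothetical encoder; channel and kernel sizes are illustrative."""

    def __init__(self, in_channels=8, out_channels=16):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size=3)

    def forward(self, x):
        h = F.relu(self.conv(x))
        # Pool over the whole remaining length, whatever it is for this input.
        return F.max_pool1d(h, kernel_size=h.size(-1))


encoder = TitleEncoder()
short = encoder(torch.randn(2, 8, 10))
long = encoder(torch.randn(2, 8, 25))
```

Both inputs end up with the same output shape, so later layers do not need to know the sequence length. When the goal is simply a fixed output length, nn.AdaptiveMaxPool1d may be a simpler fit.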

Dave_Kielpinski · August 13, 2018, 5:52pm

Here’s another way that seems to work. Like your example with register_parameter, a forward pass is required before initializing the optimizer. Do you see any clear advantages or disadvantages?

class DynamicLinear(nn.Module):
    def __init__(self, output_dim):
        super(DynamicLinear, self).__init__()
        self.output_dim = output_dim

    def forward(self, inputs):
        if not hasattr(self, '_linear'):
            input_dim = inputs.shape[-1]
            self._linear = nn.Linear(input_dim, self.output_dim)
        return self._linear(inputs)

OrNot · June 6, 2019, 2:46am

This won’t work properly. It has the same issue with model.cuda().

jinfagang · January 15, 2021, 12:12pm

How do I set the bias?

I mean, does F.conv2d use a bias by default or not? How can I give nn.Conv2d dynamic weights in forward and get the same effect as calling F.conv2d without a bias?
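As a small check of the bias question: the bias argument of F.conv2d defaults to None, so calling it with only an input and a weight applies no bias. The sketch below compares that call with an nn.Conv2d built with bias=False and given the same weights; the tensor shapes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
weight = torch.randn(4, 3, 3, 3)

# No bias argument: F.conv2d uses bias=None by default.
out_functional = F.conv2d(x, weight)

# Module form without a bias term, loaded with the same weights.
conv = nn.Conv2d(3, 4, kernel_size=3, bias=False)
with torch.no_grad():
    conv.weight.copy_(weight)
out_module = conv(x)
```

To use a bias with dynamically created weights, a bias tensor of shape (out_channels,) can be passed as the third argument to F.conv2d.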

ilanfri · June 22, 2022, 3:59pm

Apologies for reviving this. I’m looking to do precisely this, but it seems this no longer works. The current documentation for register_parameter() says of its second argument: "If None, the parameter is not included in the module’s state_dict."

I think the above therefore results in the parameter not being found in the .parameters() method when an object is instantiated from this class.

Instead of using None as the second argument to register_parameter() I could use nn.Parameter(), but then I am not sure how to check whether a parameter is uninitialised, so that I can write the correct if statement in the forward() method to avoid resetting the parameter at each iteration.

What is the up-to-date way to do this?

(Apologies if I am mistaken and have the wrong end of the stick here).
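One option worth checking, assuming a PyTorch version that ships lazy modules: nn.LazyLinear defers the choice of in_features to the first forward pass, and until then its weight is an UninitializedParameter, which can be tested with isinstance. This is a sketch of that built-in mechanism, not of the register_parameter(name, None) pattern discussed above.

```python
import torch
import torch.nn as nn
from torch.nn.parameter import UninitializedParameter

layer = nn.LazyLinear(out_features=4)

# Before the first forward pass the weight has no shape yet.
was_uninitialized = isinstance(layer.weight, UninitializedParameter)

# The first forward pass infers in_features from the input (3 here).
out = layer(torch.randn(2, 3))
```

After that first call the weight is an ordinary parameter of shape (4, 3). The same isinstance check could guard a hand-written forward() if a custom lazy parameter is needed.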