Customizing both the network architecture and the units: difficulty in feeding parameters to torch.optim

Hi all,

I’m very new to PyTorch but I’m trying to implement a network which is a bit tricky.

Concretely speaking, the network has these four particularities:

First, it consists of several similar units, where each unit takes an input and produces two outputs: one is fed to the next unit and the other goes directly into the network's final loss function. This means I cannot use the standard nn.Sequential to define my network architecture.

Second, it has a variable number of units, so I want to construct the network in a for-loop somewhere.

Third, each unit consists of operations that are not available as standard nn modules such as nn.Linear or nn.ReLU, so I also have to define the units myself.

Fourth, I need to initialize my parameters from some other functions, so the initial parameters of all the units are gathered in, e.g., a list and fed to the network in one place.

Suppose I want to use torch.optim. Then I need a good way to declare my gradient-requiring variables to it, and this is where things become problematic for me.

The way I see this network being implemented is to define one class for the units that inherits from nn.Module, and one class for the whole network that connects the units together.

My current code looks like this:

import torch
from torch.nn.parameter import Parameter
from torch.autograd import Variable
from torch import nn
#####

# initial parameter values (in practice produced by some other function)
myParametersList = [torch.randn(2,2),
                    torch.randn(2,2),
                    torch.randn(2,2)]
input = Variable(torch.randn(2,2))
########
class myUnit(nn.Module):
    """
    Defines a generic unit of the network
    """
    def __init__(self,myParameter):
        super(myUnit, self).__init__()
        self.myParameter = Parameter(myParameter,requires_grad=True)
    def forward(self,input):
        """
        Whatever operation. Just an example.
        """
        output_1 = self.myParameter * input - 1
        output_2 = output_1 - output_1.mean()
        return output_1,output_2
#######
class myNetwork(nn.Module):
    """
    Uses the myUnit class to build up the network.
    """
    def __init__(self,myParametersList,numUnits):
        super(myNetwork, self).__init__()
        self.myParametersList = myParametersList
        self.numUnits = numUnits
        assert numUnits == len(myParametersList)
    def forward(self,input):
        output_final = Variable(torch.zeros(2,2))
        for u in range(self.numUnits):
            myParameter = self.myParametersList[u]
            unitObj = myUnit(myParameter)
            output_1, output_2 = unitObj.forward(input)
            input = output_1.clone()  # to be fed to the next unit
            output_final.add_(output_2)

        return output_final

##################
#myModel = myUnit(myParametersList[0])
myModel = myNetwork(myParametersList,3)
myModel.forward(input) # I need this to create the list of my parameters.

optimizer = torch.optim.Adam(myModel.parameters(), lr=1e-2)


for t in range(50):
    output_final = myModel.forward(input)[0]

    loss = (input - output_final).pow(2).mean()
    print(t, loss.data[0])

    optimizer.zero_grad()

    loss.backward()
    optimizer.step()

But I get this error about the list of parameters.

ValueError: optimizer got an empty parameter list

However, when I build the model from just a single myUnit, I don’t get this error.

I am also aware of this post, but that didn’t help me.

Any thoughts? I’d appreciate it a lot!

The problem is that the params contained in self.myParametersList are not recognised as being params of myNetwork. There are several possible solutions…
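
You can see this directly by checking what .parameters() returns; a quick check using the class definitions from your post:

myModel = myNetwork(myParametersList, 3)
print(len(list(myModel.parameters())))   # 0 -> nothing is registered, hence the ValueError

# A single myUnit works because its __init__ assigns a Parameter to self:
singleUnit = myUnit(myParametersList[0])
print(len(list(singleUnit.parameters())))  # 1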

The cleanest solution, and I imagine the one most people would prefer, is to have myNetwork.__init__ initialise the list of myUnit submodules.
This way the myNetwork instance holds its submodules in an nn.ModuleList, so they are registered, each myUnit instance registers its own parameter tensor, and PyTorch automatically figures out which parameters need updating.

class myUnit(nn.Module):
    def __init__(self):
        super(myUnit, self).__init__()
        self.myParameter = Parameter(torch.randn(2,2), requires_grad=True)

    def forward(self, input):
        # same as your code

class myNetwork(nn.Module):
    def __init__(self, numUnits):
        super(myNetwork, self).__init__()
        self.mySubmodulesList = nn.ModuleList([myUnit() for _ in range(numUnits)])

    def forward(self, input):
        output_final = Variable(torch.zeros(2,2))
        for u in range(len(self.mySubmodulesList)):
            output_1, output_2 = self.mySubmodulesList[u](input)
            input = output_1 # cloning output_1 seems unnecessary as output_1 is never used elsewhere.
            output_final = output_final + output_2 # I am not certain that an inplace add_ will not cause errors during the backward pass
        return output_final
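
With this structure the optimizer sees every parameter. A minimal usage sketch (the learning rate and number of iterations are just the ones from your post):

myModel = myNetwork(numUnits=3)
print(len(list(myModel.parameters())))  # 3, one Parameter per myUnit

optimizer = torch.optim.Adam(myModel.parameters(), lr=1e-2)

for t in range(50):
    output_final = myModel(input)  # __call__ runs forward
    loss = (input - output_final).pow(2).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # note: step must be called, not just referenced

If the initial values really must come from some other function (your fourth point), myUnit.__init__ can take a tensor argument and wrap it in Parameter, exactly as your original myUnit does, and myNetwork.__init__ can pass the entries of myParametersList down to the units.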

A hackier solution is to loop over the list of parameters and register each one explicitly with register_parameter (which expects nn.Parameter objects, so the raw tensors need wrapping first).

# inside myNetwork.__init__; register_parameter expects nn.Parameter objects
self.myParametersList = [Parameter(p) for p in myParametersList]
for i, myParam in enumerate(self.myParametersList):
    self.register_parameter("myParam" + str(i), myParam)
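
With that in place the module sees the parameters and Adam no longer complains:

myModel = myNetwork(myParametersList, 3)
print(len(list(myModel.parameters())))  # 3

But note that forward would then have to use these registered Parameter objects directly in the computation; re-wrapping them in fresh Parameters inside myUnit, as your current forward does, would mean the gradients flow to different tensors than the ones the optimizer holds.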

@jpeg729
Problem solved! Thanks very much!

I used the first solution you proposed; nn.ModuleList does the trick.