PyTorch lets you create nets in a more dynamic way than most frameworks. Still, with the current API one needs to define the sub-modules (or their weights) in the __init__ call of the class and then use them later in the forward call. This creates two separate pieces of code that must both be changed whenever the structure of the net changes. Ideally I want a single line of code that both declares a needed structure and uses it for computation. This would be easy to implement with wrappers around the existing classes, and the resulting code would be easier to read and manage. The idea is very simple, and the three classes below illustrate it:
# The standard way of doing things now in pytorch examples:
import torch
import torch.nn as nn

class TestNet(nn.Module):
    def __init__(self):
        super(TestNet, self).__init__()
        self.net1 = nn.Linear(100, 200)
        self.net2 = nn.Linear(200, 1)
        self.sigmoid = nn.Sigmoid()
        self.ReLU = nn.ReLU(inplace=False)
        self.drop = nn.Dropout(0.5)

    def forward(self, V):
        return self.sigmoid(self.net2(self.drop(self.ReLU(self.net1(V))))).squeeze()
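For a quick sanity check (the batch of 4 random samples here is just an illustration):

    net = TestNet()
    out = net(torch.randn(4, 100))  # 100 input features per sample -> tensor of shape (4,)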
What I don't like in this standard way of doing things is that if I change the net, I need to manually adjust the __init__ call to match the forward call, keeping track of the layer sizes and structures. This is inconvenient and error-prone for bigger networks.
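To make that concrete, here is a hypothetical edit (TestNetWider is just an illustrative name): widening the hidden layer means changing two numbers in __init__ that must agree with each other, while forward stays exactly the same even though it silently depends on those sizes:

    class TestNetWider(nn.Module):
        def __init__(self):
            super(TestNetWider, self).__init__()
            self.net1 = nn.Linear(100, 300)  # changed 200 -> 300 here ...
            self.net2 = nn.Linear(300, 1)    # ... so this line must be kept in sync by hand
            self.sigmoid = nn.Sigmoid()
            self.ReLU = nn.ReLU(inplace=False)
            self.drop = nn.Dropout(0.5)

        def forward(self, V):  # unchanged, yet it depends on the sizes above
            return self.sigmoid(self.net2(self.drop(self.ReLU(self.net1(V))))).squeeze()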
So I create my nets in a more dynamic way, where both the structure and the computation are defined in the same place, the forward call, while a sample of the data is passed to the __init__ call. I then call forward from __init__, so the net is created with layers that match the input sizes (or perhaps with more complex submodules):
class DynamicTestNet(nn.Module):
    def __init__(self, datasample):
        super(DynamicTestNet, self).__init__()
        self.ml = nn.ModuleList()
        self.Make = True
        self.forward(datasample)
        self.Make = False

    def forward(self, V):
        # notice that this whole block of code that creates the nets can be
        # hidden from the user with small API changes
        self.mlindex = 0
        if self.Make:  # when called from __init__, we create the net based on
                       # the sizes of the data sample and intermediate results
            self.ml.append(nn.Linear(V.size(1), 200))
            self.ml.append(nn.ReLU(inplace=False))
            self.ml.append(nn.Dropout(0.5))
        net1 = self.ml[self.mlindex]
        self.mlindex += 1
        ReLU = self.ml[self.mlindex]
        self.mlindex += 1
        drop = self.ml[self.mlindex]
        self.mlindex += 1
        # end of net creation block
        result = drop(ReLU(net1(V)))
        # another subnet creation block that can be hidden
        if self.Make:
            self.ml.append(nn.Linear(result.size(1), 1))
            self.ml.append(nn.Sigmoid())
        net2 = self.ml[self.mlindex]
        self.mlindex += 1
        sigmoid = self.ml[self.mlindex]
        self.mlindex += 1
        # end
        return sigmoid(net2(result)).squeeze()
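For reference, a quick usage sketch (the shapes are hypothetical): the same class adapts to a different input width without any code edits, because the layer sizes are read off the data sample:

    net_a = DynamicTestNet(torch.randn(4, 100))  # first Linear is built as 100 -> 200
    net_b = DynamicTestNet(torch.randn(4, 321))  # first Linear is built as 321 -> 200
    out = net_b(torch.randn(8, 321))             # -> tensor of shape (8,)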
While this second, dynamic net looks like it uses more code, it is actually a more convenient and manageable way of doing things for big nets that change their structure often and need to adapt to new types of data.
The idea: we could remove that extra code if we had another set of subclasses/wrappers of the nn classes, call it "autonet", that takes over the net memorization/creation/autosizing work shown in the DynamicTestNet class. Its __init__ call would take a new parameter, AutoCreate:
import torch.autonet as ann

class DynamicTestNet(ann.Module):
    def __init__(self, datasample):
        # here it just calls forward with the datasample and remembers all of
        # the nets created in a hidden ml list, as illustrated in the class above
        super(DynamicTestNet, self).__init__(datasample, AutoCreate=True)

    def forward(self, V):
        return ann.Sigmoid(ann.Linear(ann.Dropout(0.5, ann.ReLU(ann.Linear(V, 200))), 1)).squeeze()
Notice that we now have far fewer lines than in the original TestNet class. When forward is called from the __init__ call, it would create the hidden ModuleList ml as in the dynamic example above, initialize/create each submodule of the net so that it matches the output size of the previous layer, and remember it in the internal ml list automatically. In subsequent forward calls it would just reuse the ann.* submodules stored in the hidden ml list.
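torch.autonet does not exist today, but to show the mechanics, here is a minimal sketch of how such a wrapper could be built on top of the current nn.Module (the names AutoModule, auto, _auto and _make are all hypothetical):

    import torch
    import torch.nn as nn

    class AutoModule(nn.Module):
        def __init__(self, datasample):
            super(AutoModule, self).__init__()
            self._auto = nn.ModuleList()   # hidden list of memorized submodules
            self._index = 0
            self._make = True
            self.forward(datasample)       # trace once to build all the layers
            self._make = False

        def auto(self, factory):
            # factory is a zero-argument callable that builds a submodule from
            # the live tensor sizes, e.g. lambda: nn.Linear(V.size(1), 200).
            # On the tracing pass we build and memorize; later we just replay.
            if self._make:
                self._auto.append(factory())
            module = self._auto[self._index]
            self._index += 1
            return module

    class AutoTestNet(AutoModule):
        def forward(self, V):
            self._index = 0  # replay the hidden list from the start on every call
            V = self.auto(lambda: nn.Linear(V.size(1), 200))(V)
            V = self.auto(lambda: nn.ReLU(inplace=False))(V)
            V = self.auto(lambda: nn.Dropout(0.5))(V)
            V = self.auto(lambda: nn.Linear(V.size(1), 1))(V)
            return self.auto(lambda: nn.Sigmoid())(V).squeeze()

    sample = torch.randn(4, 100)
    net = AutoTestNet(sample)  # builds Linear(100, 200) and Linear(200, 1)
    out = net(sample)          # -> tensor of shape (4,)

The factory lambdas are evaluated only on the tracing pass, while the tensor whose size they read is live, which is what lets each layer size itself off the previous layer's output.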
I think this could make PyTorch nets even more readable and easier to manage, while strengthening the main PyTorch advantage: removing the decoupling of the declaration and computation parts that exists in TensorFlow, Theano, and other frameworks. PyTorch wins in my use cases because it produces far more readable and debuggable code that runs at the same speed or faster than the compiled/decoupled frameworks.