Hi! I’m looking for guidance on defining a network with modules inside functions
Beginner tutorials say to do (1):
class Network(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.network = torch.nn.Sequential(
            # layer 1
            # layer 2
        )

    def forward(self, x):
        return self.network(x)

# and to train this model, do
model = Network()
out = model(x)
# and then loss.backward(), etc.
But I found another way (2) on GitHub:
def calculate(x):
    # either:
    return torch.nn.somelayer()(x)
    # or:
    return torch.nn.functional.thatlayer(x)

# and then
x = calculate(x)
# and then the loss.backward() stuff
In this simple example, are (1) and (2) doing the same thing? Is (2) going to train and learn just like (1)?
In particular, I'm worried/wondering: if a layer L doesn't follow the 'keep layers in a subclass of torch.nn.Module' and 'call that subclass's forward(x)' pattern, will it not be part of the network (i.e. will the parameters of L only be used as a calculator and never get trained/updated)?
It depends on the used module. If it's stateless (i.e. the module does not contain trainable parameters or buffers, such as nn.ReLU()), both approaches will produce the same output. Creating the module only to call it once via nn.ReLU()(x) looks wasteful, though, so use the functional API via F.relu(x) instead.
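For example, both calls compute the same thing for a stateless op (a minimal check, assuming a random input):

import torch
import torch.nn.functional as F

x = torch.randn(4, 10)
out1 = torch.nn.ReLU()(x)       # creates a throwaway module object, then calls it
out2 = F.relu(x)                # functional API, no module object created
print(torch.equal(out1, out2))  # True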
However, if the module contains parameters or buffers, you would either recreate the module in the second approach via nn.Linear(10, 10)(x) and thus never train it, which is wrong, or you would need to explicitly pass the registered parameters and buffers to the functional API via F.linear(x, weight, bias).
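A minimal sketch of the difference (the shapes are made up):

import torch
import torch.nn.functional as F

x = torch.randn(4, 10)

# Wrong: a brand-new nn.Linear with fresh random weights is created on every
# call, so nothing persists between iterations and nothing can be trained.
out = torch.nn.Linear(10, 10)(x)

# Correct functional usage: create the parameters once, keep them around, and
# pass them explicitly; these same tensors can then be given to an optimizer.
weight = torch.nn.Parameter(torch.randn(10, 10))
bias = torch.nn.Parameter(torch.zeros(10))
out = F.linear(x, weight, bias)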
Thanks very much for your help, ptrblck! I was reading up on the documentation.
To make sure I understand, is there any chance you could review my understanding of the following?
My motivation comes from looking at this PyTorch implementation of the MAML algorithm, where the author writes something like this:
# layers.py helper file
import torch.nn.functional as F

def conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1):
    bias = bias.cuda() if bias is not None else None  # guard against bias=None
    return F.conv2d(input, weight.cuda(), bias, stride, padding, dilation, groups)

# omnigolot.py network object file
import torch
from layers import conv2d

class Omnigolot(torch.nn.Module):
    # something something
    def __init__(self):
        super().__init__()
        self.network = torch.nn.Sequential()  # "something": the actual layers go here

    def forward(self, x, weights=None):
        if weights is None:
            return self.network(x)  # if there aren't any weights
        return conv2d(x, weights)   # if there are weights
So in this case, when we do loss.backward():
result = self.network(x) # <- parameters in the layers are being updated (there's learning going on)
result = conv2d(x, weights) # <- this is just a one-time calculation. The layer just holds the weights to calculate 'result'
Yes, your interpretation is not wrong, but let me clarify a few things.
result = self.network(x) # <- parameters in the layers are being updated (there's learning going on)
Using the self.network module does not automatically guarantee its parameters will be updated.
You would still need to calculate the gradients w.r.t. the used parameters and also pass these parameters to an optimizer, which can then update them in its optimizer.step() call.
The nn.Module just holds the parameters and uses them in its forward method in the same way your custom Omnigolot module creates submodules, parameters, etc. and uses these in its forward.
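To make that concrete, here is a minimal sketch of such a training loop (the layer, shapes, and loss are illustrative, not from the MAML repo):

import torch

model = torch.nn.Linear(10, 2)  # stands in for self.network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # the parameters must be registered here

x = torch.randn(4, 10)
target = torch.randn(4, 2)

optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), target)
loss.backward()   # computes gradients w.r.t. model.parameters()
optimizer.step()  # only now are the parameters actually updated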
result = conv2d(x, weights) # <- this is just a one-time calculation. The layer just holds the weights to calculate 'result'
conv2d is not a "layer" but just a function in your example. The weights are passed in from the outside, and you would need to make sure they were also passed to an optimizer so that they will be updated.
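As a sketch of what that would look like (not the actual MAML code), the explicitly managed weights could be created and registered like this:

import torch
import torch.nn.functional as F

# parameters created and owned outside of any nn.Module
weight = torch.nn.Parameter(torch.randn(8, 3, 3, 3))  # (out_channels, in_channels, kH, kW)
bias = torch.nn.Parameter(torch.zeros(8))
optimizer = torch.optim.SGD([weight, bias], lr=0.01)   # they must be handed over explicitly

x = torch.randn(1, 3, 32, 32)
out = F.conv2d(x, weight, bias, stride=1, padding=1)

loss = out.mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()  # weight and bias are updated because they were registered above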
I’m grateful for your detailed reply and explanation! I will read the tutorial to learn more.
Also, I want to say that your replies to many other questions on this forum are really helpful as well; they've gotten me unstuck many times.