Can I instantiate layers in the forward function?

Normally we use PyTorch like the following code:

import torch.nn as nn

class MyNet(nn.Module):
  def __init__(self):
    super().__init__()
    self.layer1 = nn.Conv2d(3, 32, 3)
    self.layer2 = nn.Conv2d(32, 64, 3)
    self.layer3 = nn.Conv2d(64, 2, 3)
  def forward(self, x):
    out = self.layer1(x)
    out = self.layer2(out)
    out = self.layer3(out)
    return out

My question is:
Do I have to instantiate layers in the __init__() function? Are the following two code segments also okay?
Code_1

class MyNet(nn.Module):
  def __init__(self):
    super().__init__()
    self.layer1 = nn.Conv2d(3, 32, 3)

  def forward(self, x):
    out = self.layer1(x)

    self.layer2 = nn.Conv2d(32, 64, 3)
    self.layer3 = nn.Conv2d(64, 2, 3)
    out = self.layer2(out)
    out = self.layer3(out)
    return out

Code_2

layer2 = nn.Conv2d(32, 64, 3)
layer3 = nn.Conv2d(64, 2, 3)

class MyNet(nn.Module):
  def __init__(self):
    super().__init__()
    self.layer1 = nn.Conv2d(3, 32, 3)

  def forward(self, x):
    out = self.layer1(x)

    out = layer2(out)
    out = layer3(out)
    return out

PyTorch uses that template because layers assigned in __init__ are registered as submodules of the network; that registration is how nn.Module keeps track of the parameters (and their gradients) that are used in the forward and backward passes. You can check how these inner workings come together in the source code. Also look into TensorBoard to visualize the graph that you build and how it is updated in each forward and backward pass.
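For instance (a quick sketch, assuming the original MyNet from your question is already defined), you can list the registered parameters:

net = MyNet()
for name, param in net.named_parameters():
  print(name, tuple(param.shape))

# layer1.weight (32, 3, 3, 3)
# layer1.bias (32,)
# layer2.weight (64, 32, 3, 3)
# ... and so on for layer2.bias, layer3.weight, layer3.bias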
For example, in your Code_1 you instantiate two of the convolution layers inside the forward method. This creates two brand-new convolution layers, with freshly initialized weights, on every forward pass, so whatever those layers learned in earlier iterations is thrown away, and an optimizer built before the first forward call never sees their parameters. The bookkeeping that registers these parameters is defined in the nn.Module class that we inherit from each time we create a new network class.
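A quick check (a sketch, assuming the Code_1 version of MyNet and a dummy input of shape (1, 3, 32, 32)) makes the problem visible:

import torch
from torch.optim import Adam

net = MyNet()  # the Code_1 version
optimizer = Adam(net.parameters(), lr=0.001)

# Only layer1 exists so far, so the optimizer is tracking just its weight and bias.
print(len(list(net.parameters())))   # 2

out = net(torch.randn(1, 3, 32, 32))
# layer2 and layer3 now exist, but the optimizer created above still only knows
# about layer1, and the next forward call will replace layer2 and layer3 with
# freshly initialized layers anyway.
print(len(list(net.parameters())))   # 6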

In the second case, the layers are defined outside the class, so they are never registered as submodules of MyNet. When you create your optimizer and pass it the network parameters to be updated, e.g. optimizer = Adam(net.parameters(), lr=0.001), the weights of those convolutions living outside the network are not passed to the optimizer, so they never get updated.
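You can verify this (again a sketch, assuming the Code_2 version of MyNet): only layer1's parameters are visible to the model, and therefore to the optimizer:

from torch.optim import Adam

net = MyNet()  # the Code_2 version
print([name for name, _ in net.named_parameters()])   # ['layer1.weight', 'layer1.bias']

optimizer = Adam(net.parameters(), lr=0.001)
# layer2 and layer3 live outside the module, so the optimizer will update
# layer1 but leave layer2 and layer3 at their random initialization.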