Detailed explanation of nn creation

Hi, I started learning PyTorch after doing the Machine Learning course on Coursera.
Reading the docs, I find that I'm unfamiliar with the terms and the logic of nn creation.

For example, this code:

    def forward(self, x):
        # convolution -> ReLU -> 2x2 max pooling, twice
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        # flatten the feature maps for the fully connected layers
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # no activation on the final output
        return x

What is relu? Why do we call it twice? Why do we need x.view after that? Why do we call fc1/2/3? I don't get the intuition behind this forward function.
I found an example of a "simple" nn design with a forward function like this, which is how I was taught an nn is built:

    # activation function ==> S(x) = 1 / (1 + e^(-x))
    def sigmoid(self, x, deriv=False):
        if deriv:
            return x * (1 - x)
        return 1 / (1 + np.exp(-x))

    # data will flow through the neural network
    def feed_forward(self):
        self.hidden = self.sigmoid(np.dot(self.inputs, self.weights))
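(For context, in the article those two functions are methods of a small class; a minimal self-contained version would look roughly like this. The class name and example data below are my own, not the article's exact code.)

    import numpy as np

    class SimpleNN:
        def __init__(self, inputs):
            self.inputs = inputs
            # one weight per input feature, a single output unit (assumed)
            self.weights = np.random.rand(inputs.shape[1], 1)

        # activation function ==> S(x) = 1 / (1 + e^(-x))
        def sigmoid(self, x, deriv=False):
            if deriv:
                return x * (1 - x)
            return 1 / (1 + np.exp(-x))

        # data flows forward through the network
        def feed_forward(self):
            self.hidden = self.sigmoid(np.dot(self.inputs, self.weights))
            return self.hidden

    net = SimpleNN(np.array([[0., 1.], [1., 0.]]))
    print(net.feed_forward())  # one sigmoid output per input row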

I also don't understand the __init__ function:

    def __init__(self):
        super(Net, self).__init__()
        # 3 input channels (RGB), 6 output channels, 5x5 convolution kernel
        self.conv1 = nn.Conv2d(3, 6, 5)
        # 2x2 max pooling with stride 2
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

I tried to go through the docs, but I'm still clueless about the logic flow of the __init__ function. Why do we need to call Conv2d, then MaxPool2d, and then Linear?

What do channels mean? Why do we need channels? I only know the simple design of an input layer, hidden layers, and an output layer.


The simple nn code, as I learned it, can be found here:
https://towardsdatascience.com/inroduction-to-neural-networks-in-python-7e0b422e6c24

Does PyTorch have a detailed explanation for each line of the nn design, aimed at a total PyTorch newbie?

Thanks.

ReLU: an activation function; usually we apply an activation after each layer.
.view: this changes the shape of a tensor without copying its data.
Linear layer: also called a Fully Connected (FC) layer. In PyTorch, an FC layer here takes a 2D input of shape (batch_size, in_features), which is why the tensor is flattened with .view first; see the shape sketch after this list.
fc1/2/3: those are just variable names.
"Calling MaxPool2d after Conv2d": take a look at the common structure of a conv net: convolution, activation, pooling, repeated, then fully connected layers at the end.
channels: channels originally describe the colors of an image; for example, RGB images have 3 channels. Here, "channel" is just a dimension of the tensor; for Conv2d it is the second dimension of the input, which has shape (N, C, H, W).
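Here is a minimal sketch of how the shapes flow through the first net you posted (assuming 3-channel 32x32 input images, the CIFAR-10 setting that architecture is usually used with):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    x = torch.randn(1, 3, 32, 32)   # (batch, channels, height, width)
    conv1 = nn.Conv2d(3, 6, 5)      # 3 in-channels -> 6 out-channels, 5x5 kernel
    pool = nn.MaxPool2d(2, 2)       # halves height and width
    conv2 = nn.Conv2d(6, 16, 5)

    x = pool(F.relu(conv1(x)))      # -> (1, 6, 14, 14)
    x = pool(F.relu(conv2(x)))      # -> (1, 16, 5, 5)
    x = x.view(-1, 16 * 5 * 5)      # flatten to (1, 400) so nn.Linear can consume it
    print(x.shape)                  # torch.Size([1, 400])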

Taking a look at this page would probably help you.

Thanks for answering. I'm afraid I still don't understand enough; the docs are not detailed enough for me.
I looked at the NN tutorial:

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

I'm still not sure what Conv2d does. From the docs:

Applies a 2D convolution over an input signal composed of several input planes.

Does Conv2d take a high-dimensional input and put it into a 2D matrix? But why would we want to do that? And why did they choose the arguments (6, 16, 3) in the second line, self.conv2 = nn.Conv2d(6, 16, 3)? Also, what does an output channel mean for Conv2d? Why do we need to define output channels here when we already do it in nn.Linear?
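To make my question concrete, I tried printing shapes (a quick sketch; the 1-channel 32x32 input size is my assumption, taken from the tutorial):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 1, 32, 32)  # (batch, channels, height, width)
    conv1 = nn.Conv2d(1, 6, 3)     # 1 in-channel -> 6 out-channels, 3x3 kernel
    conv2 = nn.Conv2d(6, 16, 3)    # conv2's in-channels must match conv1's out-channels
    print(conv1(x).shape)          # torch.Size([1, 6, 30, 30])
    print(conv2(conv1(x)).shape)   # torch.Size([1, 16, 28, 28])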

In these lines of code:

        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

Why did they choose the hidden layers to have 120 and 84 units?

In the forward function:

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
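(For reference, num_flat_features is defined a bit further down in the same tutorial; it just multiplies together all dimensions except the batch dimension:)

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features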

I'm not sure why we need the max_pool2d function. From the docs:

Applies a 2D max pooling over an input signal composed of several input planes.

What is max pooling and why do we need it?
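To make the question concrete, here is a tiny experiment I ran; I can see what it computes (the maximum of each 2x2 window), but not why that is useful:

    import torch
    import torch.nn.functional as F

    x = torch.tensor([[[[ 1.,  2.,  3.,  4.],
                        [ 5.,  6.,  7.,  8.],
                        [ 9., 10., 11., 12.],
                        [13., 14., 15., 16.]]]])  # shape (1, 1, 4, 4)

    print(F.max_pool2d(x, 2))  # kernel size 2 -> stride defaults to 2
    # tensor([[[[ 6.,  8.],
    #           [14., 16.]]]])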

Thanks.