# Detailed explanation of nn creation

Hi, I started learning PyTorch after taking a machine learning course on Coursera.
Reading the docs, I find I'm unfamiliar with the terms and the logic of building an nn.

For example, this code:

```python
def forward(self, x):
    x = self.pool(F.relu(self.conv1(x)))
    x = self.pool(F.relu(self.conv2(x)))
    x = x.view(-1, 16 * 5 * 5)
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x
```

What is relu? Why do we need to call it twice? Why do we need to use x.view after this? Why do we need to call fc1/2/3? I don't get the intuition of this forward function.
I found an example of a "simple" nn design with a forward function like this, which is how I was taught an nn is built:

```python
# activation function ==> S(x) = 1 / (1 + e^(-x))
def sigmoid(self, x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

# data will flow through the neural network
def feed_forward(self):
    self.hidden = self.sigmoid(np.dot(self.inputs, self.weights))
```

I also don't understand the `__init__` function:

```python
def __init__(self):
    super(Net, self).__init__()
    self.conv1 = nn.Conv2d(3, 6, 5)
    self.pool = nn.MaxPool2d(2, 2)
    self.conv2 = nn.Conv2d(6, 16, 5)
    self.fc1 = nn.Linear(16 * 5 * 5, 120)
    self.fc2 = nn.Linear(120, 84)
    self.fc3 = nn.Linear(84, 10)
```

I tried to go through the docs, but I'm still clueless about the logic flow of the `__init__` function. Why do we need to call Conv2d, then MaxPool2d, and then Linear?

What do channels mean? Why do we need channels? I only know the simple design of an input layer, hidden layers, and an output layer.

The simple code for an nn as I learned it is found here:

Does PyTorch have a detailed explanation for each line of the nn design, suitable for a total newbie to PyTorch?

Thanks.

ReLU: an activation function; usually we apply an activation after each layer.
`.view`: changes the shape of a tensor without copying its data.
Linear layer: also called a fully connected (FC) layer. In PyTorch, an FC layer takes a 2D input.
fc1/2/3: those are just variable names.
"Calling MaxPool2d after Conv2d": take a look at the common structure of a conv net.
channels: channels were originally used to describe the colors of an image; for example, RGB images have 3 channels. Here, "channel" refers to a dimension of the tensor; it is usually the second dimension of an input tensor for Conv2d.
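
The points above are easy to check interactively. Here is a minimal sketch (the shapes are my own dummy example, not from your code):

```python
import torch
import torch.nn.functional as F

# Dummy batch of one RGB image: (batch, channels, height, width)
x = torch.randn(1, 3, 32, 32)

# ReLU zeroes out negative values elementwise; the shape is unchanged.
y = F.relu(x)
print(y.shape)         # torch.Size([1, 3, 32, 32])
print((y >= 0).all())  # tensor(True)

# .view reshapes without copying the data: flatten each image into one
# row so a Linear (fully connected) layer can take it as 2D input.
flat = y.view(1, -1)
print(flat.shape)      # torch.Size([1, 3072])  (3 * 32 * 32)
```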

Taking a look at this page would probably help you.

Thanks for answering. I'm afraid I still don't understand enough; the docs are not detailed enough for me.
I looked in the NN tutorial:

```python
def __init__(self):
    super(Net, self).__init__()
    # 1 input image channel, 6 output channels, 3x3 square convolution kernel
    self.conv1 = nn.Conv2d(1, 6, 3)
    self.conv2 = nn.Conv2d(6, 16, 3)
    # an affine operation: y = Wx + b
    self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
    self.fc2 = nn.Linear(120, 84)
    self.fc3 = nn.Linear(84, 10)
```
<imports>
</imports>
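
I did trace the shapes with a dummy input to see where `16 * 6 * 6` comes from (I'm assuming a 32x32 input, which I believe is what the tutorial uses):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 1, 32, 32)                         # 1 grayscale 32x32 image
x = F.max_pool2d(F.relu(nn.Conv2d(1, 6, 3)(x)), 2)    # conv: 32 -> 30, pool: -> 15
x = F.max_pool2d(F.relu(nn.Conv2d(6, 16, 3)(x)), 2)   # conv: 15 -> 13, pool: -> 6 (floor)
print(x.shape)  # torch.Size([1, 16, 6, 6]) -> flattened: 16 * 6 * 6
```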

I'm still not sure what Conv2d does. From the docs:

Applies a 2D convolution over an input signal composed of several input planes.

Does Conv2d take a high-dimensional input and put it in a 2D matrix? But why would we want to do that? And why did they choose the input to be (6, 16, 3) in the second line, `self.conv2 = nn.Conv2d(6, 16, 3)`? In addition, what does "output channels" mean for Conv2d? Why do we need to define output channels here when we already do that in `nn.Linear`?
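
For what it's worth, I tried a quick shape check (the 14x14 input size is just my assumption for what the earlier layers would produce):

```python
import torch
import torch.nn as nn

# Conv2d(6, 16, 3): expects 6 input channels, produces 16 output
# channels by sliding 16 learned 3x3 filters over the spatial dims.
conv2 = nn.Conv2d(6, 16, 3)

x = torch.randn(1, 6, 14, 14)  # (batch, channels, height, width)
out = conv2(x)
print(out.shape)  # torch.Size([1, 16, 12, 12]) -- 14 - 3 + 1 = 12 per side
```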

In these code lines:

```python
self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
self.fc2 = nn.Linear(120, 84)
self.fc3 = nn.Linear(84, 10)
```

Why did they choose 120 and 84 units for the hidden layers?

In the forward function:

```python
def forward(self, x):
    # Max pooling over a (2, 2) window
    x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
    # If the size is a square you can only specify a single number
    x = F.max_pool2d(F.relu(self.conv2(x)), 2)
    x = x.view(-1, self.num_flat_features(x))
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x
```

I'm not sure why we need the max_pool2d function. From the docs:

Applies a 2D max pooling over an input signal composed of several input planes.

What is max pooling and why do we need it?
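
Running it on a tiny tensor at least shows me what it computes (the values are made up by me):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[[1., 2., 5., 6.],
                    [3., 4., 7., 8.],
                    [9., 1., 2., 3.],
                    [4., 5., 6., 7.]]]])  # shape (1, 1, 4, 4)

# Max pooling over (2, 2) windows keeps only the largest value in each
# window, halving the spatial size: (1, 1, 4, 4) -> (1, 1, 2, 2)
out = F.max_pool2d(x, 2)
print(out)  # tensor([[[[4., 8.], [9., 7.]]]])
```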

Thanks.