Application of activation functions (not yet fully answered)

Edit: Current question in my last post down below!

Is this function correct, or does it apply the ReLU to the input layer and is therefore wrong?

def forward(self, input):
    x = self.fc1(input)                 # first linear layer
    x = nn.functional.relu(x)
    x = self.fc2(x)                     # second linear layer
    x = nn.functional.relu(x)
    x = self.fc3(x)                     # third/final linear layer
    output = nn.functional.relu(x)      # ReLU applied to the output as well
    return output

Sorry if this is too basic, but I'm still new to machine learning.
Any advice would be much appreciated!

The input part is OK, but the final ReLU may be a problem: it means you won't produce negative outputs, which some losses/targets need.
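
For example, if the targets can be negative (say, a regression trained with nn.MSELoss; that is an assumption, not something stated in your post), a minimal sketch of the fix is to drop the last activation and return the raw output of fc3:

def forward(self, input):
    x = nn.functional.relu(self.fc1(input))   # same hidden activations as before
    x = nn.functional.relu(self.fc2(x))
    return self.fc3(x)                        # no final ReLU: outputs may now be negative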

Thank you very much.
I think what confuses me is the concept of a layer. When I implement the first layer of neurons, I set a number for its inputs and its outputs. So is the input layer only the first part of it, and do the outputs already belong to the hidden layer and not to the input layer anymore? That's the point that made me think I might have applied the activation function to the input layer.

Edit: So my code example above has in fact two hidden layers, not one, am I right? The self.fc() calls just implement the connections between two layers?

You have 3 trainable transformations (fcN). If this function defines a whole network, you may call them the input layer, hidden layer and output layer; otherwise you may call this a block to disambiguate. You can set the fc1 and fc2 output sizes independently and arbitrarily, as they correspond to internal data representations. The two ReLU activations in between can be viewed as separators that provide non-linearity.
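
For illustration, a minimal sketch of such a network in PyTorch (the sizes 10, 64, 32 and 1 are arbitrary placeholders, not taken from your post):

import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # only the input size (10) and output size (1) are fixed by the data/task;
        # the hidden sizes (64, 32) are free choices for the internal representations
        self.fc1 = nn.Linear(10, 64)   # "input layer": first trainable transformation
        self.fc2 = nn.Linear(64, 32)   # hidden layer
        self.fc3 = nn.Linear(32, 1)    # output layer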

Ok, thank you. To come back to my initial question about the activation function:
if fc1 is indeed my input layer, why is the application of ReLU still fine? I've read that input layers don't have an activation function, only hidden and output layers do. I hope it's clear now what my point of confusion is.

Edit: So basically this is the part I don't get, i.e. why this should still be fine:

x = self.fc1(input)
x = nn.functional.relu(x)

That would be the case if the author logically groups fc transformations with the preceding activations; there is some ambiguity about treating activations as belonging to one of their neighbours.
Basically, the rule about the input is: don't distort the input with activations before you apply the first trainable transformation.
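
In other words (just a sketch to illustrate the rule, not code from your network):

# what the rule forbids: an activation applied before any trainable layer
x = nn.functional.relu(input)   # distorts the raw input (clips its negative values)
x = self.fc1(x)

# what your code actually does: the activation comes after fc1
x = self.fc1(input)             # fc1 sees the undistorted input
x = nn.functional.relu(x)       # ReLU only acts on an internal representation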

Great thanks to you, this gave me clarity! :+1: