I’m trying to wrap my head around how to use nn.LayerNorm(). As I understand it, Layer Normalization takes the weights of a hidden layer and rescales them around the mean and standard deviation. Correct so far?
For example, let’s assume a simple plain vanilla feed-forward network.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralNet(nn.Module):
    def __init__(self, input_size, neurons, num_classes):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(25, 10, bias=True)  # input layer with 25 features
        self.fc2 = nn.Linear(10, 10, bias=True)  # hidden layer with 10 neurons
        self.fc3 = nn.Linear(10, 2, bias=True)   # output layer with 2 outputs

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return torch.sigmoid(self.fc3(x))
```
I want to add nn.LayerNorm to the layers (fc1 & fc2). Should I add it in the forward method, something like this?
```python
def forward(self, x):
    x = F.relu(self.fc1(x))
    x = self.fc1.nn.LayerNorm(?????)  # ← is this even the right way to call it?
    x = F.relu(self.fc2(x))
    x = self.fc2.nn.LayerNorm(?????)
    return torch.sigmoid(self.fc3(x))
```
What I don’t understand are the arguments/parameters needed. I’ve read the documentation:

```python
torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True, device=None, dtype=None)
```

Using my example, what is the `normalized_shape` for fc1? For fc2?
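If I'm reading the docs right, my best guess is that each `nn.LayerNorm` goes in `__init__` as its own module (here named `ln1`/`ln2`, my own names), with `normalized_shape` equal to the number of output features of the linear layer before it — 10 for both fc1 and fc2. Is something like this sketch what's intended?

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(25, 10, bias=True)
        self.ln1 = nn.LayerNorm(10)  # my guess: fc1 outputs 10 features
        self.fc2 = nn.Linear(10, 10, bias=True)
        self.ln2 = nn.LayerNorm(10)  # my guess: fc2 outputs 10 features
        self.fc3 = nn.Linear(10, 2, bias=True)

    def forward(self, x):
        # normalize each layer's output before (or after?) the ReLU — not sure which
        x = F.relu(self.ln1(self.fc1(x)))
        x = F.relu(self.ln2(self.fc2(x)))
        return torch.sigmoid(self.fc3(x))

net = NeuralNet()
out = net(torch.randn(4, 25))  # batch of 4 samples, 25 features each
print(out.shape)               # torch.Size([4, 2])
```

At least the shapes work out when I run it, but I don't know if normalizing before vs. after the activation matters.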
be gentle, I’m a newbie THANK YOU for taking the time to read this and to HELP ME!