I’m trying to wrap my head around how to use nn.LayerNorm(). As I understand it, Layer Normalization takes the weights of a hidden layer and rescales them around the mean and standard deviation. Correct so far?
For example, let’s assume a simple plain vanilla feed-forward network.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralNet(nn.Module):
    def __init__(self, input_size, neurons, num_classes):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(25, 10, bias=True)  # input layer with 25 features
        self.fc2 = nn.Linear(10, 10, bias=True)  # hidden layer with 10 neurons
        self.fc3 = nn.Linear(10, 2, bias=True)   # output layer with 2 outputs

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return torch.sigmoid(self.fc3(x))
```
I want to add nn.LayerNorm to the layers (fc1 & fc2). Should I add it in the forward method, something like this?
```python
def forward(self, x):
    x = F.relu(self.fc1(x))
    x = self.fc1.nn.LayerNorm(?????)  # ← is this even the right way to call it?
    x = F.relu(self.fc2(x))
    x = self.fc2.nn.LayerNorm(?????)
    return torch.sigmoid(self.fc3(x))
```
What I don’t understand are the arguments/parameters needed. I’ve read the documentation:

```python
torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True, device=None, dtype=None)
```

Using my example, what is the `normalized_shape` for fc1? For fc2?
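If I'm reading the docs right, my best guess is that each `nn.LayerNorm` goes in `__init__` as its own module (here named `ln1`/`ln2`, my own names), with `normalized_shape` equal to the number of output features of the linear layer before it — 10 for both fc1 and fc2. Is something like this sketch what's intended?

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(25, 10, bias=True)
        self.ln1 = nn.LayerNorm(10)  # my guess: fc1 outputs 10 features
        self.fc2 = nn.Linear(10, 10, bias=True)
        self.ln2 = nn.LayerNorm(10)  # my guess: fc2 outputs 10 features
        self.fc3 = nn.Linear(10, 2, bias=True)

    def forward(self, x):
        # normalize each layer's output before (or after?) the ReLU — not sure which
        x = F.relu(self.ln1(self.fc1(x)))
        x = F.relu(self.ln2(self.fc2(x)))
        return torch.sigmoid(self.fc3(x))

net = NeuralNet()
out = net(torch.randn(4, 25))  # batch of 4 samples, 25 features each
print(out.shape)               # torch.Size([4, 2])
```

At least the shapes work out when I run it, but I don't know if normalizing before vs. after the activation matters.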
be gentle, I’m a newbie THANK YOU for taking the time to read this and to HELP ME!