Hi, I was wondering whether it could be useful or harmful to apply batch normalization directly to the input of a neural network. Here is an example of a simple network:
import torch.nn as nn

class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.bn0 = nn.BatchNorm1d(128)
        self.fc1 = nn.Linear(128, 4096)
        self.bn1 = nn.BatchNorm1d(4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.bn2 = nn.BatchNorm1d(4096)
        self.fc3 = nn.Linear(4096, 10)
        self.relu = nn.ReLU()

    def forward(self, x):
        normalized_input = self.bn0(x)
        h = self.relu(self.bn1(self.fc1(normalized_input)))
        h = self.relu(self.bn2(self.fc2(h)))
        h = self.fc3(h)
        return h
Suppose I have a huge training set whose mean and std are unknown (for whatever reason), so I cannot normalize the dataset beforehand. By applying batch norm before the first layer, I should be able to normalize my data anyway. It sounds plausible to me, but I have never seen a network built like this. What are the possible implications of such an approach?
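For reference, here is a minimal standalone sketch of what I mean, assuming a 128-feature input like in my network above. Passing affine=False makes the layer a pure standardizer (no learnable scale/shift), so it only normalizes with batch statistics during training and running estimates at eval time:

import torch
import torch.nn as nn

# BatchNorm1d with affine=False has no learnable gamma/beta,
# so it acts purely as a per-feature standardizer.
input_norm = nn.BatchNorm1d(128, affine=False)

x = torch.randn(32, 128) * 5.0 + 3.0  # batch with nonzero mean, non-unit std
x_normalized = input_norm(x)          # ~zero mean, ~unit std per feature

print(x_normalized.mean(dim=0).abs().max())  # close to 0
print(x_normalized.std(dim=0).mean())        # close to 1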